Will we be like ants to AGI?
Designing LLMs and learning algorithms is like evolution: we still don't understand what it creates.
Adversarial AI beating AI at Go: https://twitter.com/ARGleave/status/1587875100732182530?t=n9onDW0z2c5JslTWTw9HrA&s=19
Decentralized open-source LLMs: https://fxtwitter.com/markopolojarvi/status/1727143362082504771?t=0goBesh4M888THcsQK_Kvg&s=19 https://fxtwitter.com/markopolojarvi/status/1727126385511350476?t=91QRmV7dwbkmJScdfN3lQQ&s=19 Decentralized minds want decentralized societies. Centralized minds want centralized societies.
Shallow brain hypothesis: [How deep is the brain? The shallow brain hypothesis | Nature Reviews Neuroscience](https://www.nature.com/articles/s41583-023-00756-z)
[[2311.10770] Exponentially Faster Language Modelling](https://arxiv.org/abs/2311.10770): 0.3% of neurons needed for the same performance.
GPTs as graph neural networks.
AI healthcare news: https://www.cell.com/cell-systems/fulltext/S2405-4712(23)00298-3 [Oxford-led study shows how AI can detect antibiotic resistance in as little as 30 minutes | University of Oxford](https://www.ox.ac.uk/news/2023-11-21-oxford-led-study-shows-how-ai-can-detect-antibiotic-resistance-little-30-minutes) [Prospective implementation of AI-assisted screen reading to improve early detection of breast cancer | Nature Medicine](https://www.nature.com/articles/s41591-023-02625-9)
[[2311.00117] BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B](https://arxiv.org/abs/2311.00117)
[[2311.12315] AcademicGPT: Empowering Academic Research](https://arxiv.org/abs/2311.12315)
[Are quantum mechanics, classical physics and relativistic physics mutually incompatible? 
- Quora](https://www.quora.com/Are-quantum-mechanics-classical-physics-and-relativistic-physics-mutually-incompatible) https://qph.cf2.quoracdn.net/main-qimg-d0d19ddcb3e36335773fc2aa586bb468 bridges between QM, GR, CM
The Physics of Immortality: life has to be present all the way to the final singularity, the omega point. [TEDx Brussels 2010 - Frank Tipler - The Ultimate Future - YouTube](https://www.youtube.com/watch?v=tNkuJvhyfP0)
The universe doesn't expand forever but ends in a final singularity: unitarity holds, and black holes exist. If black holes exist and the universe expands forever, then unitarity would be violated; since unitarity cannot be violated, the universe cannot expand forever.
"Barriers to unlimited communication back and forth across the universe cannot exist" mathematically means that event horizons cannot exist: if event horizons existed and relativistic quantum mechanics holds, the second law of thermodynamics would be violated, which is not possible, therefore event horizons cannot exist. If the universe has no event horizons, it has to be spatially closed, and the final singularity has to be a special type of singularity: a single point-like structure, the very end of time, the ultimate future, the omega point.
Life has to be present all the way into the omega point, and life's power and knowledge must increase without limit as the omega point is approached. The absence of event horizons would in itself be a violation of the second law of thermodynamics, unless the universe is guided through an infinite number of very special states. This is possible only if life is present and the knowledge of life approaches infinity. Knowledge is information stored in computer memory; life is a special type of computer, a finite state machine. Since knowledge becomes infinite, the information we store in our computers diverges to infinity. 
Quantum mechanics proves that humans, and the entire visible universe, are finite state machines, and that the upper bound on the complexity of the visible universe is the finite number 10^10^123, which lets the computers of the far future encode all possible visible universes. Life in the far future could code a perfect simulation of the entire universe using an insignificant fraction of its total computer memory, which would resurrect us. After the resurrection we could have an infinite number of new experiences, all of them, as computer emulations, never to die again. Eternal life wins it all. It is likely that future life would resurrect us, because we are already trying to resurrect our own ultimate ancestor, the single living cell.
The omega point is a state outside of the universe, of infinite power and knowledge, equivalent to the Judeo-Christian God.
Relativity, quantum mechanics, and the Standard Model are all special cases of classical mechanics. The intersection of all of these gives us reality. There are no more unknown laws; we already have a theory of everything, and quantum gravity wouldn't change it.
We can test that we will live eternally, that all these claims hold, e.g. that the acceleration of the universe will not continue forever (which would wipe us all out), with special kinds of measurements of the cosmic microwave background radiation. Then we will know we can trust these laws of physics proving the immortality of life; we already have no experimental evidence that anything is wrong with them, but let's have some more evidence.
Would the increase of entropy stop if we turned everything in the universe into ideal reversible computers? [Reversible computing - Wikipedia](https://en.wikipedia.org/wiki/Reversible_computing)
Reconstruction of everything from quantum fluctuations at the end: [We Did The Math - You Are Dead! 
- YouTube](https://www.youtube.com/watch?v=4Stzj2_Rlo4) [Conformal cyclic cosmology - Wikipedia](https://en.wikipedia.org/wiki/Conformal_cyclic_cosmology)
Paths to survival and immortality:
Adaptation and resistance acceleration
Safe growth to mitigate not only natural existential risks
Low-cost computing technology, conserving energy
Energy from stars, from black holes
Hibernating during periods of low energy availability
Cracking sentience, longevity, and immortality
Time travelling
Manipulating physical constants
Finding a loophole in the second law of thermodynamics
Maybe stopping (ideal reversible computing) or reversing entropy through new physics
Being reborn (conformal cyclic cosmology, quantum fluctuations at the end reconstructing everything, omega point cosmology)
Quantum immortality
Accessing different multiverses or dimensions, if they exist
A nonphysical Maxwell's demon
Different models of identity: closed (nothing after death, heaven after death, merging with a Universe hyperorganism/God/AGI, reincarnation), empty (there is only now), open individualism (there is no individuality), and their mergings
Space and time were born "at" the "beginning" of the universe and collapse "at" the "end" of the universe. Space, time, local identity: all illusions, immortality by default. It stops being philosophy once it works.
https://twitter.com/omarsar0/status/1727358484360945750 [[2311.12351] Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey](https://arxiv.org/abs/2311.12351): upgrading attention, memory, ... 
Visualizing quantum superposition and decoherence: https://upload.wikimedia.org/wikipedia/commons/transcoded/8/8b/Quantum_superposition_of_states_and_decoherence.ogv/Quantum_superposition_of_states_and_decoherence.ogv.1080p.vp9.webm
Since complexity only increases by the second law of thermodynamics, more and more complex forms of life will emerge.
[A social path to human-like artificial intelligence | Nature Machine Intelligence](https://www.nature.com/articles/s42256-023-00754-x) For true superintelligence you need flexibility. Combining the machinery of general and narrow intelligence might be the path to a flexible, both general and narrow, superintelligence!
Gemini uses AlphaZero-based MCTS through chains of thought; GPT-5 supposedly uses Q* similarly (speculation): https://www.lesswrong.com/posts/JnM3EHegiBePeKkLc/possible-openai-s-q-breakthrough-and-deepmind-s-alphago-type
[[2311.13110] White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?](https://arxiv.org/abs/2311.13110) [White-Box Transformers via Sparse Rate Reduction](https://ma-lab-berkeley.github.io/CRATE/)
Approximation of Solomonoff induction is all you need for AGI: [Shane Legg (DeepMind Founder) - 2028 AGI, Superhuman Alignment, New Architectures - YouTube](https://youtu.be/Kc1atfJkiJU?si=-3MG4pHRUDWY0xe7) 14:00 [Solomonoff's theory of inductive inference - Wikipedia](https://en.m.wikipedia.org/wiki/Solomonoff%27s_theory_of_inductive_inference)
https://societylibrary.medium.com/so-what-should-we-do-about-ai-lets-count-the-ways-c383cefe0c55
https://twitter.com/ninamiolane/status/1727456767049887969 🌐 Information geometry (IG) uses differential geometry to study probability theory, statistics, and machine learning. IG explores statistical manifolds, whose points are probability distributions. 
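A toy sketch of the Solomonoff induction idea above: weight every hypothesis by 2^(-description length), keep only those consistent with the data so far, and predict the next bit with the weighted mixture. The hypothesis class here is just repeating binary patterns, an assumption for illustration; real Solomonoff induction ranges over all computable programs and is itself noncomputable.

```python
from itertools import product

def solomonoff_mixture(observed, max_len=8):
    """Toy Solomonoff-style predictor. Hypotheses are repeating binary
    patterns with prior 2^-length (shorter = simpler = more weight),
    kept only if they reproduce the observed prefix.
    Returns P(next bit == "1") under the posterior mixture."""
    weight_1 = total = 0.0
    for length in range(1, max_len + 1):
        for pattern in product("01", repeat=length):
            prior = 2.0 ** (-length)
            predict = lambda i: pattern[i % length]  # repeat the pattern
            if all(predict(i) == bit for i, bit in enumerate(observed)):
                total += prior
                if predict(len(observed)) == "1":
                    weight_1 += prior
    return weight_1 / total

# after seeing "010101", the shortest consistent pattern "01" dominates
# the posterior, so the mixture strongly predicts "0" next
print(solomonoff_mixture("010101"))
```

The simplest consistent hypothesis carries most of the mass, which is exactly the Occam's-razor behavior the formalism is built around.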
Jailbreaks on SOTA LLMs: https://twitter.com/soroushjp/status/1721950722626077067 https://imgur.com/RT6BZKI
AGI IS A BUNCH OF IF STATEMENTS, IT'S 100% JUST A STOCHASTIC PARROT
Gary Marcus: NOOO, YOU NEED COMPLEX HIERARCHIES OF ABSTRACTION, A HETEROGENEOUS SYMBOLIC COGNITIVE ARCHITECTURE, A WORLD MODEL, FINE-TUNED OPTIMIZATION ALGORITHMS, BRAIN-LIKE SOFTWARE AND HARDWARE TO ACHIEVE AGI (Yann LeCun's architecture, maybe Friston's FEP, neuromorphic engineering, maybe Qualia Research Institute, which might be the opposite)
Ilya: TRANSFORMERS ARE ALL YOU NEED FOR AGI https://twitter.com/burny_tech/status/1725578088392573038, IT'S GROKKING CIRCUITS [A Walkthrough of Reverse-Engineering Modular Addition: Model Training (Part 1/3) - YouTube](https://www.youtube.com/watch?v=ob4vuiqG2Go), SCALING LAWS ARE EXPLAINED BY DATA MANIFOLD FRACTAL DIMENSIONS [Scaling Laws from the Data Manifold Dimension](https://jmlr.org/papers/v23/20-1111.html) and [Towards Monosemanticity: Decomposing Language Models With Dictionary Learning](https://transformer-circuits.pub/2023/monosemantic-features/index.html)
TRANSFORMERS WITH Q* OR MCTS ARE ALL YOU NEED FOR AGI https://www.lesswrong.com/posts/JnM3EHegiBePeKkLc/possible-openai-s-q-breakthrough-and-deepmind-s-alphago-type https://twitter.com/ylecun/status/1727736289103880522
NEW ARCHITECTURES OR HYBRID APPROACHES CAN GIVE GENERAL OR NARROW ALGORITHMIC SPEEDUPS; WE STILL DON'T KNOW HOW IT WORKS, EVERYONE WHO CLAIMS THEY KNOW HOW IT WORKS IS LYING, WE DON'T KNOW THE LIMITS OF EMERGENT CAPABILITIES AS WE SCALE TO INFINITY https://twitter.com/NPCollapse/status/1726998388821115350 (also synthetic data)
Transcending the transcendental: the whole universe is just a stochastic parrot hyperorganism made of if statements, or Fourier transforms. 
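A formal counterpoint to the slogan war above is the Legg–Hutter universal intelligence definition (the paper is linked further below): score an agent $\pi$ by its expected cumulative reward $V^{\pi}_{\mu}$ across all computable environments $\mu$, weighted by simplicity via Kolmogorov complexity $K(\mu)$:

```latex
\Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}
```

Since $K$ is noncomputable, $\Upsilon$ can only ever be approximated in practice.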
[Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models | Papers With Code](https://paperswithcode.com/paper/language-agent-tree-search-unifies-reasoning) https://twitter.com/finbarrtimbers/status/1727741345983574439
Accelerating inference with transformers:
- the KV cache
- speculative decoding
- Jacobi decoding
- lookahead decoding https://twitter.com/lmsysorg/status/1727056892671950887
- activation sparsity
- quantization
https://www.lesswrong.com/tag/aixi The AIXI formalism says roughly to consider all possible computable models of the environment, Bayes-update them on past experiences, and use the resulting updated predictions to model the expected sensory reward of all possible strategies. This is an application of Solomonoff induction.
Working memory: recent things. Cortical memory: cortex, knowledge. Episodic memory (hippocampus): learning specific things very rapidly; missing in current AGI systems and benchmarks, related to sample efficiency. [Shane Legg (DeepMind Founder) - 2028 AGI, Superhuman Alignment, New Architectures - YouTube](https://youtu.be/Kc1atfJkiJU?si=2MB_QsOYMbVZNY3k) Measuring video comprehension is also missing in AGI benchmarks.
Universal intelligence definition: [Universal Intelligence: A Definition of Machine Intelligence | Minds and Machines](https://link.springer.com/article/10.1007/s11023-007-9079-x) [Shane Legg (DeepMind Founder) - 2028 AGI, Superhuman Alignment, New Architectures - YouTube](https://youtu.be/Kc1atfJkiJU?si=A0sfkdDv99X0wyjn) 8:00 Kolmogorov complexity (which is noncomputable: [ChatGPT](https://chat.openai.com/share/da013049-2556-4408-9642-c1f200bc3bcb)) on tasks under a particular reference machine (ideally a humanlike task environment), which you can guide using reinforcement learning by searching in the state space of possible programs. [AIXI - Wikipedia](https://en.m.wikipedia.org/wiki/AIXI)
https://twitter.com/johnjnay/status/1727815271459803270
- Biology, physics, chemistry experts wrote 448 questions 
- Highly skilled non-expert humans spending 30+ mins with unrestricted web search: 34% accuracy
- GPT-4: 39%
- PhDs in corresponding domains: 65%
[[2303.14151] Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle](https://arxiv.org/abs/2303.14151)
Generate different responses, apply reflection tokens [[2310.11511] Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection](https://arxiv.org/abs/2310.11511) (trainable by self-evaluation, or feedback from human or artificial agents), reprompt or vector-steer at inference [Steering GPT-2-XL by adding an activation vector — AI Alignment Forum](https://www.alignmentforum.org/posts/5spBue2z2tw4JuDCx/steering-gpt-2-xl-by-adding-an-activation-vector) [Representation Engineering: A Top-Down Approach to AI Transparency](https://www.ai-transparency.org/) according to their score (creating a giant database of them), or change the weights, and try again until the reflection tokens are as happy as possible. This sounds like a more abstract gradient descent: reflection tokens being a more abstract cost function, vector steering being the update.
https://fxtwitter.com/_akhaliq/status/1727530418168025337 "GAIA, an alternative benchmark for General AI Assistants, human respondents obtain 92% vs. 
15% for GPT-4 equipped with plugins"
Under many of the AGI definitions and benchmarks flying around a few years ago, today's systems would already count as AGI. It seems to me that the more these systems improve, the more the definitions of AGI shift; soon "AGI" will only mean whatever can solve the Riemann hypothesis and quantum gravity in a single equation.
LLMs are more statistically efficient, even if they are less size- and energy-efficient. [CBMM10 Panel: Research on Intelligence in the Age of AI - YouTube](https://www.youtube.com/watch?v=Gg-w_n9NJIE)
There's no reason AI can't have our strong Bayesian priors. [CBMM10 Panel: Research on Intelligence in the Age of AI - YouTube](https://www.youtube.com/watch?v=Gg-w_n9NJIE)
Is embodiment needed for AGI? Are current AI systems missing causation, for which we need experimenting? [CBMM10 Panel: Research on Intelligence in the Age of AI - YouTube](https://youtu.be/Gg-w_n9NJIE?si=nZAetoV0pnEK4Ko7&t=659)
Are planning and factuality needed for AGI? [CBMM10 Panel: Research on Intelligence in the Age of AI - YouTube](https://youtu.be/Gg-w_n9NJIE?si=aGS41rLVV1QAHYZ1&t=1574)
The degree of agency of an agent is determined by the size of the space of actions it can pursue to achieve its goals that have causal power over itself and the environment.
AI is glorified compression: https://twitter.com/ChombaBupe/status/1727713732359229676 I appreciate this, and I think it can give us tons of insight and more predictive power about how machine learning works and how to design more efficient machine learning architectures. But saying "it's just glorified statistics, compression, stochastic parrots, probability, multivariable calculus, linear algebra, Hopfield networks, ..." and not caring about other levels of abstraction is not right; it is like saying all of physics or biology is "just evolution", "just interacting atoms", "just statistics", "just quantum field theory", etc. 
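Returning to the vector steering / activation addition linked above: the trick can be sketched with toy numbers by taking the activation difference between a contrasting prompt pair and adding a scaled copy of it to a hidden state at inference time. All the 4-dimensional vectors below are made up for illustration, not any real model's activations.

```python
import numpy as np

# hypothetical hidden states for a contrasting prompt pair
h_positive = np.array([0.9, 0.1, 0.0, 0.2])   # e.g. activations for "Love"
h_negative = np.array([0.1, 0.8, 0.3, 0.2])   # e.g. activations for "Hate"
steering_vector = h_positive - h_negative      # "Love minus Hate" direction

def steer(hidden_state, vector, coeff=2.0):
    """Add a scaled steering vector to a hidden state during the forward
    pass (in a real model: at a chosen layer's residual stream)."""
    return hidden_state + coeff * vector

h = np.array([0.5, 0.5, 0.1, 0.0])             # some intermediate activation
h_steered = steer(h, steering_vector)
# the steered state has moved toward the "positive" direction
print(np.dot(h_steered, steering_vector) - np.dot(h, steering_vector))
```

No retraining, no weight changes: the intervention happens purely at inference, which is what makes it feel like a cheap, more abstract update step.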
In practice we need to predict and explain why LLMs, or machine learning architectures in general, gain emergent capabilities with scale: some degree of theory of mind, mostly consistent syntax, some degree of chain of thought, weaker mathematical skills, connecting concepts somewhat sensibly, etc., or, for example, why they can write Star Trek in the style of Shakespeare, and all the other unpredictable emergent capabilities of big models!
In small models we have some mechanistic interpretability work, like Anthropic's decomposition of features in a one-layer transformer using sparse autoencoders, finding concrete finite state automata for composing HTML, or Neel Nanda's reverse engineering of a one-layer transformer's learned, grokked "discrete Fourier transforms and trigonometric identities to convert addition to rotation about a circle" circuit for computing modular addition, but we have nothing of this type for models the size of GPT-4, for so many so-far-mysterious capabilities!
Saying that the brain is statistics, evolution, and atoms doesn't help us predict the human capabilities we are looking for in language models. It also doesn't help us predict things like depression in brains: we still struggle to find depression there, and we are trying tons of tools (identifying neurotransmitters, analyzing the structure and activity of the various subnetworks with harmonics, topological analysis or network and graph theory, statistics, analyzing psychological and external factors, etc.), and we still know so little! Similarly in machine learning, we don't have predictive models for so many of the capabilities, or for the overall high-level patterns learned in the networks by the learning algorithm, and therefore we can't predict which other grokked circuits or emergent capabilities we will get as we scale further or modify the architecture! Machine learning is still mostly empirical alchemy! 
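The grokked modular-addition circuit mentioned above is easy to illustrate directly: map each residue to a rotation on the unit circle, multiply the phases (which adds the angles), and read the answer off the resulting angle. This is a minimal numpy sketch of the trigonometric identity the network learned, not the learned network itself.

```python
import numpy as np

def modadd_via_rotation(a, b, p):
    """Compute (a + b) mod p by converting addition to rotation about a
    circle: e^(2*pi*i*a/p) * e^(2*pi*i*b/p) = e^(2*pi*i*(a+b)/p)."""
    phase = np.exp(2j * np.pi * a / p) * np.exp(2j * np.pi * b / p)
    angle = np.angle(phase) % (2 * np.pi)   # the angle encodes (a+b) mod p
    return int(round(angle * p / (2 * np.pi))) % p

print(modadd_via_rotation(87, 62, 113))  # (87 + 62) % 113 == 36
```

The mod-p wraparound comes for free from the periodicity of the circle, which is why the network's Fourier solution handles it so cleanly.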
Experience is the universe looking at its own eyeballs.
I would argue that GPT-4 is already better than the 50th percentile of people at many concrete tasks, when you take a random sample of the human population. But there are still lots of tasks at which it's worse than children. I feel like this line is slowly blurring and will blur faster and faster. We're still better at mathematics or walking. It's way smarter somewhere, but way dumber elsewhere, so far.
[Constitutional AI: Harmlessness from AI Feedback \ Anthropic](https://www.anthropic.com/index/constitutional-ai-harmlessness-from-ai-feedback)
Safety: the methods are mechanistic interpretability, red-teaming, evaluating dangerous capabilities, process supervision https://openai.com/research/improving-mathematical-reasoning-with-process-supervision; ideal would be evaluating all consequences of actions in a world model and robustly hardwiring ethics from the start, then, more generally, institutions and governance.
ASI:
👉 essentially never hallucinates
👉 reliably reasons over abstractions
👉 can form long-term plans
👉 understands causality
👉 reliably maintains models of the world
👉 reliably handles outliers
Chain of Hindsight Aligns Language Models with Feedback [[2302.02676] Chain of Hindsight Aligns Language Models with Feedback](https://arxiv.org/abs/2302.02676)
[Q* - Clues to the Puzzle? - YouTube](https://youtu.be/ARf0WyFau0A?si=cfp17bD5xgl8ZaF0&t=891) feature selection [Q* - Clues to the Puzzle? - YouTube](https://youtu.be/ARf0WyFau0A?si=cfp17bD5xgl8ZaF0&t=888)
Transformer System 2 attention https://twitter.com/jaseweston/status/1726784511357157618?t=Pee9O1P4NCN_rwENzd_rhw&s=19
Transformers plus GNNs
Transformers plus recurrence in depth [Q* - Clues to the Puzzle? 
- YouTube](https://youtu.be/ARf0WyFau0A?si=cfp17bD5xgl8ZaF0&t=888) [[2310.04406] Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models](https://arxiv.org/abs/2310.04406)
Philosophers/cognitive scientists: "Nooo, current AIs won't do X because they can't Y!" AGI scientists: "Transformers with Q* go brrrt."
[[2206.01078] Deep Transformer Q-Networks for Partially Observable Reinforcement Learning](https://arxiv.org/abs/2206.01078) [OpenAI's Q* is the BIGGEST thing since Word2Vec... and possibly MUCH bigger - AGI is definitely near - YouTube](https://youtu.be/3d0kk88IE8c?si=QcCQRELEJaLHchCL&t=630) [[2109.08236] Reinforcement Learning on Encrypted Data](https://arxiv.org/abs/2109.08236)
Self-pruning of irrelevant connections.
Could you implement recursive self-improvement by recursively creating a simulation of your architecture plus a modification of it, and, if the modification is better at some task, using it to overwrite yourself?
In chaos theory, probably the biggest result is that a small change in initial conditions can give drastically different outcomes, and since you have finite precision and finite compute when modeling, many things cannot be predicted. But there are many laws that compress this chaos and represent patterns that let us predict various things with various accuracy. That we have various differential equations in chemistry or biology, as laws at a higher level of abstraction above the extremely complex, chaotic fundamental physics, is mega cool. Finding laws in society, weather, and climate is quite hard, but we have something; various numerical and statistical methods are often used there, and fluid dynamics is great. AI is being thrown at these problems and tends to be better than the classical models. 
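The sensitivity to initial conditions described above fits in a few lines with the logistic map x_{n+1} = r x_n (1 - x_n) at r = 4, a standard chaotic regime: a perturbation in the tenth decimal place stays tiny at first, then grows until the two trajectories are completely decorrelated.

```python
def logistic_trajectory(x0, r=4.0, steps=50):
    """Iterate the logistic map x_{n+1} = r * x_n * (1 - x_n)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-10)  # perturb the 10th decimal place
divergence = [abs(x - y) for x, y in zip(a, b)]
# tiny at step 5, order-one by the end of 50 steps
print(divergence[5], max(divergence))
```

This is exactly why finite measurement precision caps the prediction horizon: the error roughly doubles per step, so each extra digit of precision buys only a few more steps.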
https://ct24admin.ceskatelevize.cz/veda/3631915-umela-inteligence-uz-umi-predpovidat-pocasi-dela-skvele-ale-dopousti-se-i-hrubych-chyb [ECMWF | Charts](https://charts.ecmwf.int/?facets=%7B%22Product%20type%22%3A%5B%22Experimental%3A%20Machine%20learning%20models%22%5D%7D) https://phys.org/news/2023-09-artificial-intelligence-climate.html
AI computing Feynman diagrams: [François Charton | Transformers for maths, and maths for transformers - YouTube](https://youtu.be/Sc6k06wVX3s?si=Oz545XT5qX_rq5aM)
complex systems, information theory, cybernetics, cognitive manifolds, teal spiral dynamics, embodiment, collective consciousness, machine learning, sentience, living systems theory, somatic experiencing, thermodynamic evolution, chaos magic, computer science, decomposition, strategy, active inference, surprisal, polycomputing, sociotechnology, psychotechnology, technological evolution
"Quadratic attention has been indispensable for information-dense modalities such as language... until now. Announcing Mamba: a new SSM arch. that has linear-time scaling, ultra long context, and most importantly--outperforms Transformers everywhere we've tried." [GitHub - state-spaces/mamba: Mamba SSM architecture](https://github.com/state-spaces/mamba)
Scott Aaronson: [Scott Aaronson: The Greatest Unsolved Problem in Math - YouTube](https://www.youtube.com/watch?v=1ZpGCQoL2Rk)
I have become mathematics, the structure of all possible realities.
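The linear-time scaling in the Mamba quote above comes from the state-space recurrence at the core of SSM layers, h_t = A h_{t-1} + B x_t with y_t = C h_t, computed in one pass over the sequence instead of attention's quadratic all-pairs comparison. A minimal non-selective sketch with toy, made-up matrices (Mamba itself makes A, B, C input-dependent and uses a hardware-aware parallel scan):

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Run the linear recurrence h_t = A h_{t-1} + B * x_t, y_t = C h_t
    over a scalar input sequence x. One pass => O(T) in sequence length."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                  # linear-time scan over the sequence
        h = A @ h + B * x_t
        ys.append(C @ h)
    return np.array(ys)

d = 3                              # toy state dimension (assumption)
A = 0.5 * np.eye(d)                # stable, made-up state matrix
B = np.ones(d)
C = np.ones(d) / d
y = ssm_scan(np.arange(6, dtype=float), A, B, C)
print(y)
```

The fixed-size state h is the whole memory of the past, which is what buys ultra-long context: cost per token doesn't grow with how much has already been seen.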