Large language models use variants of gradient descent to minimize a loss function for predicting enormous amounts of text, built from artificial neural networks, attention, and other tricks in the Transformer architecture [Transformer (deep learning architecture) - Wikipedia](https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)) (now often with better routing of information thanks to the mixture-of-experts architecture [Mixture of experts - Wikipedia](https://en.wikipedia.org/wiki/Mixture_of_experts)), which means tons of inscrutable matrices with trillions of emergent patterns in their dynamics that we still understand mathematically insufficiently. Reverse engineering what happens inside, and controllability, are being worked on by the whole mechanistic interpretability field ([GitHub - JShollaj/awesome-llm-interpretability: A curated list of Large Language Model (LLM) Interpretability resources.](<https://github.com/JShollaj/awesome-llm-interpretability>), [Concrete open problems in mechanistic interpretability | Neel Nanda | EAG London 23 - YouTube](https://www.youtube.com/watch?v=7t9umZ1tFso), [Chris Olah - Looking Inside Neural Networks with Mechanistic Interpretability - YouTube](https://www.youtube.com/watch?v=2Rdp9GvcYOE)), by [statistical learning theory](<https://en.wikipedia.org/wiki/Statistical_learning_theory>) and the deep learning theory field ([[2106.10165] The Principles of Deep Learning Theory](<https://arxiv.org/abs/2106.10165>), [A New Physics-Inspired Theory of Deep Learning | Optimal initialization of Neural Nets - YouTube](https://www.youtube.com/watch?v=m2bXL5Z5CBM)), or by other alignment and empirical alchemical methods ([[2309.15025] Large Language Model Alignment: A Survey](<https://arxiv.org/abs/2309.15025>))
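To make the "gradient descent on next-token prediction" point concrete, here is a minimal sketch in PyTorch: a toy vocabulary, one causal self-attention layer, and random tokens standing in for real text. This is my illustration of the general recipe, not any lab's actual training code.

```python
import torch
import torch.nn as nn

# Toy next-token prediction: embed tokens, one self-attention layer, project to vocab logits.
vocab_size, d_model, seq_len = 100, 32, 16

class TinyTransformerLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        x = self.embed(tokens)
        # Causal mask: each position may only attend to earlier positions.
        mask = torch.triu(torch.ones(tokens.size(1), tokens.size(1)), diagonal=1).bool()
        x, _ = self.attn(x, x, x, attn_mask=mask)
        return self.lm_head(x)  # logits over the vocabulary at every position

model = TinyTransformerLM()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    batch = torch.randint(0, vocab_size, (8, seq_len + 1))  # stand-in for tokenized text
    inputs, targets = batch[:, :-1], batch[:, 1:]            # targets = inputs shifted by one token
    logits = model(inputs)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()   # backpropagation computes gradients of the loss
    optimizer.step()  # one gradient descent step updates the weights
```

Everything a frontier model "knows" emerges from repeating this loop at enormous scale on real text; the inscrutable matrices are the learned weights.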
The biggest limitations of current AI systems are probably getting more complex systematic coherent reasoning, planning, generalization, search, agency (autonomy), memory, factual groundedness, online/continuous learning, software and hardware energy and algorithmic efficiency, human-like ethical reasoning, and controllability into them, all of which they are still relatively weak at on more complex tasks. But we are making progress, whether through composing LLMs into multiagent systems, scaling, higher-quality data and training, poking around how they work inside and thereby controlling them, better mathematical models of how learning works and using those insights, modified or overhauled architectures, and so on; embodied robotics is also getting attention recently, and all top AGI labs are working on and investing in these things to varying degrees. Here are some works:
Survey of LLMs: [[2312.03863] Efficient Large Language Models: A Survey](<https://arxiv.org/abs/2312.03863>), [[2311.10215] Predictive Minds: LLMs As Atypical Active Inference Agents](<https://arxiv.org/abs/2311.10215>), [A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications](<https://arxiv.org/abs/2402.07927>)
Reasoning: [Human-like systematic generalization through a meta-learning neural network | Nature](<https://www.nature.com/articles/s41586-023-06668-3>), [[2305.20050] Let's Verify Step by Step](<https://arxiv.org/abs/2305.20050>), [[2302.00923] Multimodal Chain-of-Thought Reasoning in Language Models](<https://arxiv.org/abs/2302.00923>), [[2310.09158] Learning To Teach Large Language Models Logical Reasoning](<https://arxiv.org/abs/2310.09158>), [[2303.09014] ART: Automatic multi-step reasoning and tool-use for large language models](<https://arxiv.org/abs/2303.09014>), [AlphaGeometry: An Olympiad-level AI system for geometry - Google DeepMind](<https://deepmind.google/discover/blog/alphageometry-an-olympiad-level-ai-system-for-geometry/>), (Devin AI programmer: [Cognition | Introducing Devin, the first AI software engineer](https://www.cognition-labs.com/introducing-devin)), ([[2402.09171] Automated Unit Test Improvement using Large Language Models at Meta](https://arxiv.org/abs/2402.09171)), ([GPT-5: Everything You Need to Know So Far - YouTube](https://www.youtube.com/watch?v=Zc03IYnnuIA)), ([[2402.03620] Self-Discover: Large Language Models Self-Compose Reasoning Structures](https://arxiv.org/abs/2402.03620), [x.com](https://twitter.com/ecardenas300/status/1769396057002082410)), ([[2402.18312] How to think step-by-step: A mechanistic understanding of chain-of-thought reasoning](https://arxiv.org/abs/2402.18312), [x.com](https://twitter.com/fly51fly/status/1764279536794169768?t=up6d06PPGeCE5fvIlE418Q&s=19)), [Magic](http://magic.dev), (The power of prompting: [Microsoft Research blog](https://www.microsoft.com/en-us/research/blog/the-power-of-prompting/)), Flow engineering ([AlphaCodium: State-of-the-art code generation for code contests | CodiumAI](https://www.codium.ai/blog/alphacodium-state-of-the-art-code-generation-for-code-contests/)), Stable Cascade ([Introducing Stable Cascade — Stability AI](https://stability.ai/news/introducing-stable-cascade)), ([[2403.12373] RankPrompt: Step-by-Step Comparisons Make Language Models Better Reasoners](https://arxiv.org/abs/2403.12373))
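Several of the links above (Let's Verify Step by Step, chain-of-thought, RankPrompt) revolve around sampling step-by-step reasoning and then checking or voting over it. A minimal, hedged sketch of self-consistency-style voting; `call_llm` is a hypothetical stand-in for whatever chat-completion API you use, here returning canned completions so the snippet runs on its own:

```python
import random
from collections import Counter

def call_llm(prompt: str, temperature: float = 0.8) -> str:
    """Hypothetical stand-in for a chat-completion API; returns a canned chain of thought."""
    answer = random.choice(["12", "12", "12", "13"])  # simulate mostly-correct sampling
    return f"Step 1: reason about the problem.\nStep 2: compute.\nAnswer: {answer}"

def extract_final_answer(completion: str) -> str:
    # Assumes the prompt asked the model to end with a line like "Answer: <x>".
    for line in reversed(completion.strip().splitlines()):
        if line.lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return completion.strip().splitlines()[-1]

def self_consistency(question: str, n_samples: int = 8) -> str:
    """Sample several chain-of-thought completions and majority-vote the final answers."""
    prompt = (
        "Think step by step, then give your final answer on a line starting with 'Answer:'.\n\n"
        f"Question: {question}"
    )
    answers = [extract_final_answer(call_llm(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("What is 3 * 4?"))
```

The design choice is simple: individual reasoning chains are unreliable, but errors tend not to agree with each other, so aggregating many chains filters a lot of them out.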
Robotics: [Mobile ALOHA - A Smart Home Robot - Compilation of Autonomous Skills - YouTube](<https://www.youtube.com/watch?v=zMNumQ45pJ8>), [Eureka! Extreme Robot Dexterity with LLMs | NVIDIA Research Paper - YouTube](<https://youtu.be/sDFAWnrCqKc?si=LEhIqEIeHCuQ0W2p>), [Shaping the future of advanced robotics - Google DeepMind](<https://deepmind.google/discover/blog/shaping-the-future-of-advanced-robotics/>), [Optimus - Gen 2 | Tesla - YouTube](<https://www.youtube.com/watch?v=cpraXaw7dyc>), [Atlas Struts - YouTube](<https://www.youtube.com/shorts/SFKM-Rxiqzg>), [Figure Status Update - AI Trained Coffee Demo - YouTube](<https://www.youtube.com/watch?v=Q5MKo7Idsok>), [Curiosity-Driven Learning of Joint Locomotion and Manipulation Tasks - YouTube](<https://www.youtube.com/watch?v=Qob2k_ldLuw>)
Multiagent systems: [[2402.01680] Large Language Model based Multi-Agents: A Survey of Progress and Challenges](<https://arxiv.org/abs/2402.01680>) (AutoDev: Automated AI-Driven Development [[2403.08299] AutoDev: Automated AI-Driven Development](https://arxiv.org/abs/2403.08299) )
Modified/alternative architectures: [Mamba (deep learning architecture) - Wikipedia](<https://en.wikipedia.org/wiki/Mamba_(deep_learning_architecture)>), [[2305.13048] RWKV: Reinventing RNNs for the Transformer Era](<https://arxiv.org/abs/2305.13048>), [V-JEPA: The next step toward advanced machine intelligence](<https://ai.meta.com/blog/v-jepa-yann-lecun-ai-model-video-joint-embedding-predictive-architecture/>), [Active Inference](<https://mitpress.mit.edu/9780262045353/active-inference/>)
Agency: [[2305.16291] Voyager: An Open-Ended Embodied Agent with Large Language Models](<https://arxiv.org/abs/2305.16291>), [[2309.07864] The Rise and Potential of Large Language Model Based Agents: A Survey](<https://arxiv.org/abs/2309.07864>), [Agents | Langchain](<https://python.langchain.com/docs/modules/agents/>), [GitHub - THUDM/AgentBench: A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)](<https://github.com/THUDM/AgentBench>), [[2401.12917] Active Inference as a Model of Agency](<https://arxiv.org/abs/2401.12917>), [CAN AI THINK ON ITS OWN? (The Free Energy Principle approach to Agency) - YouTube](<https://www.youtube.com/watch?v=zMDSMqtjays>), [Artificial Curiosity Since 1990](<https://people.idsia.ch/~juergen/artificial-curiosity-since-1990.html>)
Factual groundedness: [[2312.10997] Retrieval-Augmented Generation for Large Language Models: A Survey](<https://arxiv.org/abs/2312.10997>), [Perplexity](<https://www.perplexity.ai/>), [ChatGPT - Consensus](<https://chat.openai.com/g/g-bo0FiWLY7-consensus>)
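Retrieval-augmented generation, the subject of the survey above, is the main current technique for factual grounding: retrieve relevant passages and put them in the prompt. A minimal, hedged sketch, where a bag-of-words cosine similarity stands in for a real embedding model and the tiny `corpus` is made up for illustration:

```python
import math
from collections import Counter

# Tiny RAG pipeline: retrieve the most similar documents, then build a grounded prompt.
corpus = [
    "The Transformer architecture was introduced in the paper Attention Is All You Need (2017).",
    "Mixture-of-experts models route each token to a small subset of expert subnetworks.",
    "Retrieval-augmented generation grounds model outputs in retrieved documents.",
]

def embed(text: str) -> Counter:
    # Stand-in "embedding": bag-of-words counts (a real system would use a learned embedding model).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    return sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]

question = "What does mixture of experts do?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
print(prompt)  # this grounded prompt would then be sent to an LLM
```

In production systems the corpus lives in a vector database and the retrieved passages are cited back to the user, which is roughly what Perplexity and Consensus do.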
Memory: larger context windows ([Gemini 10 million token context window - x.com](https://twitter.com/mattshumer_/status/1759804492919275555)) or [vector databases](<https://en.wikipedia.org/wiki/Vector_database>) ([[2403.11901] Larimar: Large Language Models with Episodic Memory Control](https://arxiv.org/abs/2403.11901))
Hardware efficiency: Extropic ([Ushering in the Thermodynamic Future - Litepaper](https://www.extropic.ai/future)), tinygrad, Groq ([x.com](https://twitter.com/__tinygrad__/status/1769388346948853839)), ['A single chip to outperform a small GPU data center': Yet another AI chip firm wants to challenge Nvidia's GPU-centric world — Taalas wants to have super specialized AI chips | TechRadar](https://www.techradar.com/pro/a-single-chip-to-outperform-a-small-gpu-data-center-yet-another-ai-chip-firm-wants-to-challenge-nvidias-gpu-centric-world-taalas-wants-to-have-super-specialized-ai-chips), new Nvidia GPUs ([NVIDIA Just Started A New Era of Supercomputing... GTC2024 Highlight - YouTube](https://www.youtube.com/watch?v=GkBX9bTlNQA)), Etched ([Etched | The World's First Transformer ASIC](https://www.etched.com/)), https://techxplore.com/news/2023-12-ultra-high-processor-advance-ai-driverless.html , ([[2302.06584] Thermodynamic AI and the fluctuation frontier](https://arxiv.org/abs/2302.06584)), analog computing ([x.com](https://twitter.com/dmvaldman/status/1767745899407753718?t=Xe5sDPbrBVayUaAGX4ikmw&s=19)), neuromorphics ([Neuromorphic engineering - Wikipedia](https://en.wikipedia.org/wiki/Neuromorphic_engineering)), [Homepage | Cerebras](https://www.cerebras.net/)
Online/continuous learning: [Online machine learning - Wikipedia](https://en.wikipedia.org/wiki/Online_machine_learning) (A Comprehensive Survey of Continual Learning: Theory, Method and Application [[2302.00487] A Comprehensive Survey of Continual Learning: Theory, Method and Application](https://arxiv.org/abs/2302.00487) )
Meta learning: [Meta-learning (computer science) - Wikipedia](https://en.wikipedia.org/wiki/Meta-learning_(computer_science)) (Paired open-ended trailblazer (POET) [Paired open-ended trailblazer (POET) - Alper Ahmetoglu](https://alpera.xyz/blog/1/) )
Planning: [[2402.01817] LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks](<https://arxiv.org/abs/2402.01817>), [[2401.11708v1] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs](<https://arxiv.org/abs/2401.11708v1>), [[2305.16151] Understanding the Capabilities of Large Language Models for Automated Planning](<https://arxiv.org/abs/2305.16151>)
Generalizing: [[2402.10891] Instruction Diversity Drives Generalization To Unseen Tasks](<https://arxiv.org/abs/2402.10891>), [Automated discovery of algorithms from data | Nature Computational Science](<https://www.nature.com/articles/s43588-024-00593-9>), [[2402.09371] Transformers Can Achieve Length Generalization But Not Robustly](<https://arxiv.org/abs/2402.09371>), [[2310.16028] What Algorithms can Transformers Learn? A Study in Length Generalization](<https://arxiv.org/abs/2310.16028>), [[2307.04721] Large Language Models as General Pattern Machines](<https://arxiv.org/abs/2307.04721>), [A Tutorial on Domain Generalization | Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining](<https://dl.acm.org/doi/10.1145/3539597.3572722>), [[2311.06545] Understanding Generalization via Set Theory](<https://arxiv.org/abs/2311.06545>), [[2310.08661] Counting and Algorithmic Generalization with Transformers](<https://arxiv.org/abs/2310.08661>), [Neural Networks on the Brink of Universal Prediction with DeepMind's Cutting-Edge Approach | Synced](<https://syncedreview.com/2024/01/31/neural-networks-on-the-brink-of-universal-prediction-with-deepminds-cutting-edge-approach/>), [[2401.14953] Learning Universal Predictors](<https://arxiv.org/abs/2401.14953>), [Spectral bias and task-model alignment explain generalization in kernel regression and infinitely wide neural networks | Nature Communications](<https://www.nature.com/articles/s41467-021-23103-1>) (Natural language instructions induce compositional generalization in networks of neurons [Natural language instructions induce compositional generalization in networks of neurons | Nature Neuroscience](https://www.nature.com/articles/s41593-024-01607-5) ) (FRANCOIS CHOLLET - measuring intelligence and generalisation [[1911.01547] On the Measure of Intelligence](https://arxiv.org/abs/1911.01547) [x.com](https://twitter.com/fchollet/status/1763692655408779455) [#51 FRANCOIS CHOLLET - Intelligence and Generalisation - YouTube](https://youtu.be/J0p_thJJnoo) ) (Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking [[2403.09629] Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking](https://arxiv.org/abs/2403.09629) )
Search: AlphaGo ( [x.com](https://twitter.com/polynoamial/status/1766616044838236507) ), AlphaCode 2 Technical Report ( https://storage.googleapis.com/deepmind-media/AlphaCode2/AlphaCode2_Tech_Report.pdf )
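AlphaCode-style systems lean heavily on search over model outputs: sample a large number of candidate solutions and filter or rerank them with a verifier (for code, the verifier can simply be running tests). A minimal, hedged sketch of that loop; `generate_candidate` is a hypothetical stand-in for sampling a program from an LLM:

```python
import random

def generate_candidate(problem: str) -> str:
    """Hypothetical stand-in for sampling one candidate program from an LLM."""
    return random.choice([
        "def solve(x): return x + 1",
        "def solve(x): return x * 2",
        "def solve(x): return x - 1",
    ])

def passes_tests(candidate: str, tests: list[tuple[int, int]]) -> bool:
    namespace: dict = {}
    try:
        exec(candidate, namespace)  # run the candidate code to define solve()
        return all(namespace["solve"](x) == y for x, y in tests)
    except Exception:
        return False

def search(problem: str, tests: list[tuple[int, int]], n_samples: int = 50):
    # Sample many candidates and keep the first one the verifier (the tests) accepts.
    for _ in range(n_samples):
        candidate = generate_candidate(problem)
        if passes_tests(candidate, tests):
            return candidate
    return None

print(search("add one to x", tests=[(1, 2), (5, 6)]))
```

The general pattern is the same as in AlphaGo: a cheap generator proposes, an explicit search plus an evaluator decides, and most of the extra capability comes from spending more compute at inference time.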
It is quite possible (and a large percentage of researchers think) that research on controlling these crazy inscrutable matrices is not developing fast enough compared to capabilities research (increasing the number of things these systems are capable of), and we may see more and more cases where AI systems do fairly random things we did not intend.
Then we have no idea how to turn off unwanted behaviors with existing methods [Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training \ Anthropic](<https://www.anthropic.com/news/sleeper-agents-training-deceptive-llms-that-persist-through-safety-training>), which could be seen recently when GPT-4 started outputting total chaos after an update [OpenAI's ChatGPT Went Completely Off the Rails for Hours](<https://www.thedailybeast.com/openais-chatgpt-went-completely-off-the-rails-for-hours>), when Gemini was more woke than intended ([Google Has a New 'Woke' AI Problem With Gemini - Business Insider](https://www.businessinsider.com/google-gemini-woke-images-ai-chatbot-criticism-controversy-2024-2), [The self-unalignment problem — AI Alignment Forum](https://www.alignmentforum.org/posts/9GyniEBaN3YYTqZXn/the-self-unalignment-problem)), or every time I see a new jailbreak that bypasses the guardrails [[2307.15043] Universal and Transferable Adversarial Attacks on Aligned Language Models](<https://arxiv.org/abs/2307.15043>).
Regarding definitions of AGI, this one from DeepMind is good: [Levels of AGI: Operationalizing Progress on the Path to AGI](https://arxiv.org/abs/2311.02462). I also like OpenAI's definition, although it is quite vague: highly autonomous systems that outperform humans at most economically valuable work. There is also a nice thread of various definitions and their pros and cons [9 definitions of Artificial General Intelligence (AGI) and why they are flawed - x.com](https://twitter.com/IntuitMachine/status/1721845203030470956), as well as [Universal Intelligence: A Definition of Machine Intelligence](<https://arxiv.org/abs/0712.3329>), and Karl Friston has good definitions too [KARL FRISTON - INTELLIGENCE 3.0 - YouTube](https://youtu.be/V_VXOdf1NMw?si=8sOkRmbgzjrkvkif&t=1898).
In terms of predictions of when AGI arrives, people around effective accelerationism, the Singularity community, Metaculus, LessWrong/effective altruism, and various influential people in top AGI labs have very short timelines, often possibly in the 2020s: [Singularity Predictions 2024 by some people big in the field](https://www.reddit.com/r/singularity/comments/18vawje/singularity_predictions_2024/kfpntso/), [Date Weakly General AI is Publicly Known | Metaculus](https://www.metaculus.com/questions/3479/date-weakly-general-ai-is-publicly-known/). There is also a survey of priorities and predictions from AI researchers, whose predicted intervals have been shrinking by roughly half each year across these questionnaires: [AI experts make predictions for 2040. I was a little surprised. | Science News - YouTube](https://www.youtube.com/watch?v=g7TghURVC6Y), [Thousands of AI Authors on the Future of AI](https://arxiv.org/abs/2401.02843)
When someone calls LLMs "just statistics", you can similarly reductively say that humans are "just autocompleting predictions about input signals that are compared to actual signals" (using a version of Bayesian inference): [Predictive coding - Wikipedia](https://en.wikipedia.org/wiki/Predictive_coding), [Visual processing - Wikipedia](https://en.wikipedia.org/wiki/Visual_processing), [Free energy principle - Wikipedia](https://en.wikipedia.org/wiki/Free_energy_principle), (Inner screen model of consciousness: applying the free energy principle to the study of conscious experience [YouTube](https://www.youtube.com/watch?v=yZWjjDT5rGU&pp=ygUzZnJlZSBlbmVyZ3kgcHJpbmNpcGxlIGFwcGxpZWQgdG8gdGhlIGJyYWluIHJhbXN0ZWFk)), (global neuronal workspace theory + integrated information theory + recurrent processing theory + predictive processing theory + neurorepresentationalism + dendritic integration theory: An integrative, multiscale view on neural theories of consciousness https://www.cell.com/neuron/fulltext/S0896-6273%2824%2900088-6 ), ([Models of consciousness - Wikipedia](https://en.wikipedia.org/wiki/Models_of_consciousness?wprov=sfla1)), (more models: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8146510/ ); or that humans are "just bioelectricity and biochemistry" ([Bioelectric networks: the cognitive glue enabling evolutionary scaling from physiology to mind | Animal Cognition](https://link.springer.com/article/10.1007/s10071-023-01780-3)); or "just particles" ( https://en.wikipedia.org/wiki/Electromagnetic_theories_of_consciousness ), (On Connectome and Geometric Eigenmodes of Brain Activity: The Eigenbasis of the Mind? [qri.org](https://qri.org/blog/eigenbasis-of-the-mind)), (Integrated world modeling theory: [Frontiers | An Integrated World Modeling Theory (IWMT) of Consciousness: Combining Integrated Information and Global Neuronal Workspace Theories With the Free Energy Principle and Active Inference Framework; Toward Solving the Hard Problem and Characterizing Agentic Causation](https://www.frontiersin.org/articles/10.3389/frai.2020.00030/full), [Integrated world modeling theory expanded: Implications for the future of consciousness - PubMed](https://pubmed.ncbi.nlm.nih.gov/36507308/)), (Can AI think on its own? [The Free Energy Principle approach to Agency - YouTube](https://youtu.be/zMDSMqtjays?si=MRXTcQ6s8o_KwNXd)), (Synthetic Sentience: Can Artificial Intelligence become conscious? | Joscha Bach | CCC #37c3 [YouTube](https://youtu.be/Ms96Py8p8Jg?si=HYx2lf8DrCkMcf8b)). Or you can say that the whole universe is just a big differential equation. None of these framings really tell you the concrete implementation details or the dynamics that are actually happening there!
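To make the predictive-processing analogy slightly more concrete, here is a minimal, hedged sketch of a toy predictive coding loop (compare prediction to input signal, use the prediction error to update beliefs and, more slowly, the generative model). The specific update rules, learning rates, and noise model are my own illustrative choices, not taken from any of the cited papers:

```python
import random

# Toy predictive coding: a latent belief mu generates a prediction via weight w,
# and the prediction error against the observed signal drives both fast inference
# (updating mu) and slow learning (updating w).

w = 0.5             # generative model: prediction = w * mu
mu = 0.0            # current belief about the hidden cause
lr_mu, lr_w = 0.1, 0.01

def observe() -> float:
    # Stand-in sensory input: a noisy signal whose true value is about 3.0.
    return 3.0 + random.gauss(0.0, 0.1)

for step in range(1000):
    signal = observe()
    prediction = w * mu
    error = signal - prediction      # prediction error ("surprise")
    mu += lr_mu * w * error          # fast inference: revise the belief
    w += lr_w * mu * error           # slow learning: revise the generative model

print(f"prediction w*mu ≈ {w * mu:.2f} (signal ≈ 3.00)")
```

The loose analogy to LLM training is that both are error-driven optimization of a predictive model; the point of the paragraph above is just that "it's only prediction" is no more informative about brains than "it's only statistics" is about LLMs.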
[The Mystery of Spinors - YouTube](https://www.youtube.com/watch?v=b7OIbMCIfs4)
[Building a GENERAL AI agent with reinforcement learning - YouTube](https://youtu.be/s3C0sEwixkQ?si=XQ0-QqEU8OTuS6fA)
[The Butterfly Effect is Much Worse Than We Thought - YouTube](https://youtu.be/V5R6VLUUHRs?si=DTuPdAsbFEdmX-rZ)
[What Is a Quantum Field Theory?](https://www.cambridge.org/core/books/what-is-a-quantum-field-theory/899688E515D7E05AAA88DB08325E6EAE)
[[2403.12021] A tweezer array with 6100 highly coherent atomic qubits](https://arxiv.org/abs/2403.12021)
[Michel Talagrand Wins Abel Prize for Work Wrangling Randomness | Quanta Magazine](https://www.quantamagazine.org/michel-talagrand-wins-abel-prize-for-work-wrangling-randomness-20240320/)
[Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity | OpenReview](https://openreview.net/forum?id=ux9BrxPCl8)
[x.com](https://twitter.com/jackm2003/status/1770221903661437086?t=qLZMcVHcXsEuePja5wP6vw&s=19)
https://openreview.net/pdf?id=U_T8-5hClV
[Artificial intelligence and illusions of understanding in scientific research | Nature](https://www.nature.com/articles/s41586-024-07146-0)
[The case for open source AI](https://press.airstreet.com/p/the-case-for-open-source-ai)
Tool Use in LLMs survey [x.com](https://twitter.com/omarsar0/status/1770497515898433896?t=cZLFtMkVTSK9iwRrE4PEmw&s=19)
[Connor <> Beff Debates — The Society Library](https://www.societylibrary.org/connor-beff-debates)
[Accelerating toward the speed of light - YouTube](https://youtube.com/watch?v=0uunSMipnxA)
https://arxiv.org/abs/2403.09613
Science is about finding models that are as mutually compatible as possible and that, with as few bits as possible, compress as much information as possible in order to predict as many phenomena as possible.
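One way to write this compression view down, in the spirit of minimum description length (my hedged framing, not a claim from the links above): prefer the model $M$ of data $D$ that minimizes the total description length

```latex
\min_{M} \; \underbrace{L(M)}_{\text{bits to describe the model}} \; + \; \underbrace{L(D \mid M)}_{\text{bits to describe the data given the model}}
```

A model that is too simple pays in the second term, a model that merely memorizes pays in the first; good theories keep both small while covering many phenomena.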
I have chronic hypercuriousitia