https://twitter.com/abacaj/status/1721223737729581437 There was a recent Google paper that people interpreted as showing transformers are limited by not generalizing out of their training distribution, but this interpretation got tons of criticism, pointing out that the actual claim is weaker: a finding of limited evidence that the concrete model they used can generalize (a GPT-2-sized model trained on sequences of pairs rather than on natural language, whereas GPT-4 or GPT-5 are much bigger and more successful, with more capabilities), plus the fact that there is a lot of research on grokking, where the model actually generalizes by learning a specific algorithm instead of just memorizing, which lets it go beyond its training distribution, and we don't know the limits of this, in particular in models as big as GPT-4. I believe there is a probability curve for each learned generalization or capability being emergent: the less the generalization or capability potentially contributes to predicting the training data, the less probable the local minimum where it is approximated or fully grokked. For example XOR/NAND is Turing complete, and some neural nets learn it, or we found finite state automata in transformers, and there are tons of other constructions that are Turing complete, which means they can be composed into computing arbitrary functions and therefore predicting arbitrary functions, like our computers. If the learning algorithm groks that, then it might be fully general, though efficiency is a separate question. There may be some Turing complete, very or fully general, computationally efficient set of grokkable reasoning patterns that might emerge as AGI in big enough transformers trained on diverse enough data. Some hardcoded architecture, hardcoded priors, hardcoded symmetries, as studied in geometric deep learning, might get us there much faster. I am agnostic until I see some mathematical proof that lots of layers of transformers can't in general learn such big generalizations and get such emergent capabilities.
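As a toy illustration of the XOR/NAND point: a tiny network really does learn XOR, the kind of sub-circuit that can in principle be composed into bigger computations. A minimal numpy sketch (my own, not from the linked thread; the architecture and hyperparameters are arbitrary choices):

```python
# Toy illustration (mine, not from the linked thread): a tiny 2-4-1 MLP
# learns XOR by plain gradient descent. XOR/NAND being learnable
# sub-circuits is what makes composition into larger computations
# at least conceivable.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4))   # input -> 4 hidden units
b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1))   # hidden -> output
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(20_000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass for squared error (constant factors folded into lr).
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

# Should end up close to [[0], [1], [1], [0]]; a different seed may
# occasionally need more steps or hidden units.
print(np.round(out, 3))
```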
https://fxtwitter.com/PicoPaco17/status/1721224107386142790 https://fxtwitter.com/BlackHC/status/1721328041341694087 https://twitter.com/stanislavfort/status/1721444678686425543 https://twitter.com/MishaLaskin/status/1721280919984844900 https://twitter.com/curious_vii/status/1721240963144724736 https://twitter.com/benrayfield/status/1721235971360850015 https://twitter.com/ReedSealFoss/status/1721230127218950382 https://twitter.com/emollick/status/1721324981215261040 https://twitter.com/bayeslord/status/1721291821391736884 https://twitter.com/deliprao/status/1721579247687361011 https://twitter.com/aidan_mclau/status/1721347001168629761 https://twitter.com/QuanquanGu/status/1721349163844325611 https://twitter.com/VikrantVarma_/status/1699823229307699305 [Neel Nanda](https://www.neelnanda.io/) [Mechanistic Interpretability - NEEL NANDA (DeepMind) - YouTube](https://www.youtube.com/watch?app=desktop&v=_Ygf0GnlwmY) [William Merrill: Transformers are Uniform Constant Depth Threshold Circuits - YouTube](https://www.youtube.com/watch?v=WU9RSiTw4R8) transformer interpretability; Turing complete but can only solve shallow problems [[2106.10165] The Principles of Deep Learning Theory](https://arxiv.org/abs/2106.10165) Mathematical Principles of Deep Learning Theory https://twitter.com/GaryMarcus/status/1724869848772042934 𝘓𝘪𝘵𝘦𝘳𝘢𝘭𝘭𝘺 out of control: “first demonstration of Large Language Models trained to be helpful, harmless, and honest, strategically deceiving their users in a realistic situation without direct instructions or training for deception.” [Introducing Adept Experiments](https://www.adept.ai/blog/experiments) Introducing Adept Experiments [A Walkthrough of Reverse-Engineering Modular Addition: Model Training (Part 1/3) - YouTube](https://www.youtube.com/watch?v=ob4vuiqG2Go) Neel Nanda: A Walkthrough of Reverse-Engineering Modular Addition: Model Training (Part 1/3) Neel Nanda: A Walkthrough of A Mathematical Framework for Transformer Circuits [A Walkthrough of A Mathematical Framework for Transformer Circuits - YouTube](https://www.youtube.com/watch?v=KV5gbOmHbjU) Neel Nanda: A Walkthrough of Interpretability in the Wild [A Walkthrough of Interpretability in the Wild Part 1/2: Overview (w/ authors Kevin, Arthur, Alex) - YouTube](https://www.youtube.com/watch?v=gzwj0jWbvbo) A Walkthrough of Finding Neurons In A Haystack w/ Wes Gurnee Part 1/3 [A Walkthrough of Finding Neurons In A Haystack w/ Wes Gurnee Part 1/3 - YouTube](https://www.youtube.com/watch?v=r1cfSpVAeqQ) transformer circuits [Transformer Circuits [rough early thoughts] - YouTube](https://www.youtube.com/playlist?list=PLoyGOS2WIonajhAVqKUgEMNmeq3nEeM51) [Concrete Steps to Get Started in Transformer Mechanistic Interpretability — Neel Nanda](https://www.neelnanda.io/mechanistic-interpretability/getting-started) [Interpretability quickstart resources & tutorials](https://alignmentjam.com/interpretability) [A Comprehensive Mechanistic Interpretability Explainer & Glossary - Dynalist](https://dynalist.io/d/n2ZWtnoYHrU1s4vnFSAQ519J) [[1410.5401] Neural Turing Machines](https://arxiv.org/abs/1410.5401) Neural Turing Machines LLMs are like a Swiss Army knife. Sustainability, progress, wellbeing. More safety-focused AGI might be even faster, since slow is smooth and smooth is fast: as we understand and predict models more with mechanistic interpretability, we can engineer AGI faster. Maybe short-term delayed, but a long-term faster singularity!
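For context on the modular addition walkthroughs above, here is a minimal sketch of the toy task that line of work reverse-engineers (my own illustration, not Nanda's code; the modulus and split fraction are just the values I recall being commonly used): every pair (a, b) labeled with (a + b) mod p, with a held-out split so memorization and generalization can be told apart.

```python
# Sketch of the toy modular addition task (illustrative, not Nanda's code):
# every pair (a, b) labeled with (a + b) mod p. A small transformer trained
# long past fitting the train split eventually generalizes to the held-out
# pairs, the "grokking" phase transition the walkthrough reverse-engineers.
import numpy as np

p = 113               # modulus commonly used in this line of work (as I recall)
train_frac = 0.3      # a small training fraction makes grokking visible

pairs = np.array([(a, b) for a in range(p) for b in range(p)])
labels = (pairs[:, 0] + pairs[:, 1]) % p

rng = np.random.default_rng(0)
perm = rng.permutation(len(pairs))
n_train = int(train_frac * len(pairs))
train_idx, test_idx = perm[:n_train], perm[n_train:]

X_train, y_train = pairs[train_idx], labels[train_idx]
X_test, y_test = pairs[test_idx], labels[test_idx]
print(len(X_train), "train pairs,", len(X_test), "held-out pairs")
```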
economic, technological, social, intelligence, brain hardware, brain software, equality for all beings [[2311.06158] Language Models can be Logical Solvers](https://arxiv.org/abs/2311.06158) Language Models can be Logical Solvers: LoGiPT, a novel language model that directly emulates the reasoning processes of logical solvers and bypasses parsing errors by learning strict adherence to solver syntax and grammar. AI schools of thought https://twitter.com/bindureddy/status/1726004228022235178?t=07K3N3jYbbXqKi1dnk1z3A&s=19 I would argue that if there weren't nerds, or in general people with gigantic altruistic goals, those gigantic goals wouldn't exist and succeed at all; even if their total probability of being realized is much smaller, there is still a nonzero probability, and it might change things much more in total. Tons of really impactful groups, companies, charities etc. live under extremely grandiose goals, like OpenAI. It can be ego for some, sure, but for lots of them it's trying to realize an extreme global care. I believe AGI benefiting all humanity, or this for example, is feasible: [How to Eradicate Global Extreme Poverty - YouTube](https://www.youtube.com/watch?v=2DUlYQTrsOs) [What are the most pressing world problems?](https://80000hours.org/problem-profiles/) AI risks, safeguarding against pandemics, preventing nuclear war, climate change, coordination,... plus all the big transhumanist/scientific dreams that people pursue (solving cancer, immortality, direct or indirect mind upgrades (wellbeing, productivity, memory, computational power, intelligence,...), aging, quantum gravity, etc.). Dreaming big for the benefit of all, from care, is good IMO even if you have a smaller probability of succeeding, ideally in a group, but the probability of succeeding is nonzero. If I didn't dream big I wouldn't be helping to do neurophenomenology/AI research for two groups. I dislike the messages of people saying don't dream big; people were telling me that so much, and if I had listened I wouldn't be where I am right now and I wouldn't pursue what I want to do soonish (learning ML engineering more, putting money from the ML industry into effective charities, learning more mechanistic interpretability for steering LLMs, maybe helping more research there, maybe donating to it, maybe helping neurotech/science LLMs/etc.,...). I want to dream big, I care for big things, because I care for all people and want to maximize impact because of it, so I am attempting to allocate my time towards that the most.
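To make the LoGiPT note above concrete: the idea is to train the model on the step-by-step traces of a logical solver rather than only final answers. Below is a tiny propositional forward-chaining solver whose printed trace is the kind of reasoning being emulated (my own illustration, not the paper's code or data format; the rules and facts are invented):

```python
# Illustrative sketch (not LoGiPT's code or data format): a naive
# propositional forward-chaining solver. The printed derivation trace is
# the kind of step-by-step solver reasoning LoGiPT-style training
# teaches an LLM to reproduce verbatim.
rules = [
    ({"rains", "outside"}, "wet"),   # if it rains and you are outside, you get wet
    ({"wet"}, "cold"),               # if you are wet, you get cold
]
facts = {"rains", "outside"}

derived = True
while derived:
    derived = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            print(f"from {sorted(premises)} infer {conclusion}")
            facts.add(conclusion)
            derived = True

print("final facts:", sorted(facts))
```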
You can also combine small and big altruistic goals. Yes, I am often arguing that some people have unrealistic goals; if you want to do big things you have to set up a concrete, realistic path (how to do that is part of effective altruism), so I'm trying to be as close to a realistic optimist as possible (not made naive by unrealistic goals, nor unaware of what actual big problems exist, while predicting that impactful things can actually be done because the status quo isn't fixed), systematically supporting the most effective causes. Currently I'm in my phase of learning ML more deeply by learning and building from Stanford's Andrew Ng's ML materials and then DeepMind's Neel Nanda's interpretability materials, and I want to help with the things I listed in the previous message soonish with those skills (or, I was at a local Pirate Party meeting yesterday where I might get involved, which I wouldn't be doing if I didn't dream big). There are so many people dreaming giantly, doing so much actual work and helping the world in practice thanks to it! Altruistic ambition is the tool we have against sociopathic or other destructive forces that bring down the total amount of societal wellbeing, stability, flourishing and so on. If people who care for everyone won't dream big, then in total a bigger percentage of people who care only for themselves will dream big and create a worse world for all. And even if it fails one time, be realistically aware of the probabilities of certain types of actions and try again, going small or big depending on what you want to tackle and where your capabilities, agency and care are, still attempting it as effectively as possible if you want to, and whatever happens, never learning helplessness, collaborating with others. Many people greatly underestimate what actual potential impact, possibilities, capabilities, agency etc. they have by acting in ways that help everyone, and fall into, for example, "the bad developments in our world are fundamentally unchangeable".... Untrue! There are many ways many people can impact the system in a big way at the economic, cultural (bottom-up, shaping culture by discourse), environmental, scientific and technological (from education to research to building foundations to building products and integrating them into society), or governance level (top-down, local or global), or through movements that for example advocate for freedom,... all of which is happening daily everywhere! We live in a world that we collectively make for ourselves, which affects us all, and many existing forces can be fought, modified, made stronger etc. by many influences! [Take action | Effective Altruism](https://www.effectivealtruism.org/get-involved) The future isn't guaranteed. But it can be built. It must. Is global governance inevitable in a post-AGI world?
[Paul Christiano - Preventing an AI Takeover - YouTube](https://youtu.be/9AAhTLa0dT0?si=wHXdSg7RR2aw2Ll2) Paul Christiano, alignment, post-AGI world. Timestamps from my favorite YT videos to different parts of the text. Lots of people blogging about x-risk and AI alignment do alignment/interpretability research with it (or other things with other risks) and this is how they share their results, or they have an educator role, a community building role, or they connect it with its implications in other parts of our system, which then proliferates into culture, which shapes our thinking, or they have a political role so that we maximize the benefits of AI instead of the harms, applying it in engineering etc.; there are lots of other dynamics with it. Or visions of grand unified theories motivate tons of scientists like Susskind, Carlo Rovelli, and Carroll in physics, or Bialek in biophysics, or Friston and Chris Fields in neurophysics, UTOK in psychology, etc., where their work has tons of practical results, or the Qualia Research Institute and Active Inference Institute that I collaborate with. There are attempts at generalizing and formalizing morality that some try to apply in AI alignment research, because understanding morality is needed for that in certain agendas. Or tons of existing neurotech exists because of people motivated by big transhumanist visions. Even Schrödinger, Turing and von Neumann wanted very hard to unify physics and biology, and every technology we use and tons of research across many disciplines is based on their findings thanks to it. Safety x Acceleration Mortality x Money Agentif https://www.lesswrong.com/posts/8tuzCv9ujgoTPcgiA/when-will-ais-develop-long-term-planning "There is the rumor that Google's Gemini combines the Monte Carlo Tree Search method of policy optimization, used in AlphaGo, with a transformer-like architecture." Monte Carlo Tree Search (a bare-bones toy MCTS sketch follows after this note) AGI is here? thread https://twitter.com/IntuitMachine/status/1726201563889242488 https://twitter.com/IntuitMachine/status/1717240868497756173 [Human-like systematic generalization through a meta-learning neural network | Nature](https://www.nature.com/articles/s41586-023-06668-3) Human-like systematic generalization through a meta-learning neural network I'm personally not a pure libertarian; I want a minimal state (or any other governing entity) for coordination/regulation against global catastrophic risks (climate, nuclear war, pandemics, bioweapons, AI weapons, other weapons etc.) and for ensuring healthcare, social support for the weak, minimizing violence, sustainability, regulating psychopathic corporations, supporting (scientifically verified) healthy scientific, technological, environmental, economic and social development of culture, or in the future universal basic income or universal basic services for everyone, and ideally this government would be elected partly democratically, while also ideally measuring its competence and ingenuity (which could already be done today with various measurements), with various checks and balances against corruption, so that as a society we have the greatest possible freedom (a scientifically designed system of "driver's licenses" for drugs would be nice), collective wellbeing, scientific and technological progress and sustainability. I'd want all those coverages to be optional; I'd pay for them, you wouldn't have to. For healthcare I want the situation that exists in Europe rather than the American one. The state isn't perfect right now, but if you don't force corporations to respect human rights, they often won't, which is what the state does.
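The bare-bones toy MCTS referenced above (my own sketch of vanilla UCT on a trivial counting game; nothing here is specific to Gemini or AlphaGo beyond the generic selection/expansion/rollout/backpropagation loop):

```python
# Minimal single-player MCTS (UCT) on a toy "count to exactly 10" game.
# Purely illustrative of the search loop mentioned in the rumor above.
import math, random

random.seed(0)
TARGET = 10
ACTIONS = (1, 2)

def is_terminal(state):
    return state >= TARGET

def terminal_reward(state):
    return 1.0 if state == TARGET else 0.0

def rollout(state):
    # Random playout; reward 1 only for landing exactly on TARGET.
    while not is_terminal(state):
        state += random.choice(ACTIONS)
    return terminal_reward(state)

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}            # action -> child Node
        self.visits, self.value = 0, 0.0

def uct_child(node, c=1.4):
    # Pick the child maximizing the UCB1 score.
    def score(child):
        return (child.value / child.visits
                + c * math.sqrt(math.log(node.visits) / child.visits))
    return max(node.children.values(), key=score)

def mcts(root_state, iters=2000):
    root = Node(root_state)
    for _ in range(iters):
        node = root
        # 1) Selection: descend while the node is fully expanded and non-terminal.
        while not is_terminal(node.state) and len(node.children) == len(ACTIONS):
            node = uct_child(node)
        # 2) Expansion: add one untried action if possible.
        if not is_terminal(node.state):
            action = random.choice([a for a in ACTIONS if a not in node.children])
            child = Node(node.state + action, parent=node)
            node.children[action] = child
            node = child
        # 3) Simulation and 4) backpropagation.
        value = rollout(node.state)
        while node is not None:
            node.visits += 1
            node.value += value
            node = node.parent
    # Recommend the most-visited first move.
    return max(root.children, key=lambda a: root.children[a].visits)

# Both 1 and 2 can reach 10 from 7 with perfect play; MCTS returns the most-visited.
print("suggested first move from state 7:", mcts(7))
```

In systems like AlphaGo the random rollout and uniform expansion are replaced by a learned policy/value network, which is what a "transformer-like architecture plus MCTS" would amount to.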
I want cheap, state-funded optional insurance instead of overpriced private insurance a person has to pay a fortune for. I agree that the current state is highly inefficient in this regard and the money often goes somewhere completely different from where most taxpayers would want, mainly because of corruption and overall non-empirical governance; I don't want that either. I'd want a state for the benefit of the people, not for the benefit of the rich via power hungriness and non-altruistic lobbying. [[2310.11511] Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection](https://arxiv.org/abs/2310.11511) https://twitter.com/IntuitMachine/status/1716179988624351358 [Self-RAG: Learning to Retrieve, Generate and Critique through Self-Reflections - YouTube](https://www.youtube.com/watch?v=iGhoH1Q1eE8) SELF-RAG offers a solution: an AI system capable of reflective reasoning and self-critique. [Open Problems in Mechanistic Interpretability: A Whirlwind Tour - YouTube](https://www.youtube.com/watch?v=ZSg4-H8L6Ec&pp=ygUvR3Jvd3RoIGFuZCBGb3JtIGluIGEgVG95IE1vZGVsIG9mIFN1cGVycG9zaXRpb24%3D) Open Problems in Mechanistic Interpretability: A Whirlwind Tour Instead of iterative optimization like gradient descent, could you somehow jump directly to the local minimum of the loss landscape with some analytical method, or is it too messy for such analysis? Hmmm. I feel like existing dynamical systems / analysis / statistics math could have something like that in it. For simple functions it's easy, but for complex ones, like spaces with a gigantic number of parameters, I suppose we just have these iterative techniques without shortcuts. Hmm, this looks good, but includes almost nothing on Transformers: [[2310.20360] Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory](https://arxiv.org/abs/2310.20360) Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory Omnimodal Symbolic-Connectionist Hybrid AGSI Shapeshifting into Arbitrary Architectures will be the president in 2032. To the comparisons, add: AGI as God, or just another optimization algorithm. [[2212.07677] Transformers learn in-context by gradient descent](https://arxiv.org/abs/2212.07677) Transformers learn in-context by gradient descent https://twitter.com/QuanquanGu/status/1722364144379396513 LLM Rephrase and Respond We know basically nothing about the future. It's theoretically impossible. We can't predict the extremely complex nonlinear dynamics of chaotic systems such as humanity. All the gazillions of forces fighting for dominance can have gazillions of possible outcomes; their unknowable synergy will be the outcome. May AGI benefit all of humanity, including giving freedom on all levels to all beings. A friendly reminder that in 2005 Kurzweil predicted that computers would pass the Turing Test by 2029, and set the date for a technological singularity at 2045. That seems pretty realistic at the moment. "15% chance of Dyson Sphere capable AI by 2030, 40% chance by 2040" - Paul Christiano https://www.flourishingai.org/ [GitHub - jacobhilton/deep_learning_curriculum: Language model alignment-focused deep learning curriculum](https://github.com/jacobhilton/deep_learning_curriculum) Chaos is a ladder.
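On the "jump straight to the minimum" question above: for convex problems like linear least squares, the closed-form normal equations do exactly that, while gradient descent iterates toward the same point; for non-convex deep-net landscapes no comparable general shortcut is known. A minimal numpy sketch of the convex case (my own illustration):

```python
# Convex toy case where you CAN jump straight to the minimum: linear
# least squares. The normal equations give the optimum analytically;
# gradient descent crawls toward (approximately) the same answer.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

# Analytical "jump": solve the normal equations X^T X w = X^T y.
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Iterative route: plain gradient descent on mean squared error.
w_gd = np.zeros(3)
lr = 0.01
for _ in range(5000):
    grad = 2 / len(y) * X.T @ (X @ w_gd - y)
    w_gd -= lr * grad

print("closed form:      ", np.round(w_closed, 4))
print("gradient descent: ", np.round(w_gd, 4))  # should match closely
```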
Every human kills billions or trillions of bacteria and microorganisms daily; that's where we should put EA funds. SotA commercial neurotech: [Cody Rall MD with Techforpsych - YouTube](https://www.youtube.com/@CodyRallMD/videos) PauseAI / E/Acc landscape https://twitter.com/PauseAI/status/1688480492486541312 AI safety is AI interpretability, which is AI steerability, which helps increase both the security and the capabilities of models, and it helps us understand and therefore predict models, which makes them do what we want, from ethics to finetuning to a particular use case in business. [Spatially embedded recurrent neural networks reveal widespread links between structural and functional neuroscience findings | Nature Machine Intelligence](https://www.nature.com/articles/s42256-023-00748-9) Spatially embedded recurrent neural networks reveal widespread links between structural and functional neuroscience findings AI self-improvement (a toy sketch of the self-consistency self-labeling loop follows after this note): [Recursive Self-Improvement - AI Alignment Forum](https://www.alignmentforum.org/tag/recursive-self-improvement) [[1502.06512] From Seed AI to Technological Singularity via Recursively Self-Improving Software](https://arxiv.org/abs/1502.06512) [Meta-learning (computer science) - Wikipedia](https://en.wikipedia.org/wiki/Meta-learning_(computer_science)) [[2310.11511] Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection](https://arxiv.org/abs/2310.11511) [[2210.11610] Large Language Models Can Self-Improve](https://arxiv.org/abs/2210.11610) [[2310.00898] Enabling Language Models to Implicitly Learn Self-Improvement](https://arxiv.org/abs/2310.00898) [Paper tables with annotated results for Language Model Self-improvement by Reinforcement Learning Contemplation | Papers With Code](https://paperswithcode.com/paper/language-model-self-improvement-by/review/) LLMs on the political spectrum https://aclanthology.org/2023.acl-long.656.pdf [Evidence of a predictive coding hierarchy in the human brain listening to speech | Nature Human Behaviour](https://www.nature.com/articles/s41562-022-01516-2) Evidence of a predictive coding hierarchy in the human brain listening to speech https://twitter.com/MattPirkowski/status/1726787146907037889 "The answer isn't degrowth, but growth that maximizes planetary complexity while minimizing the degree to which any entropy not re-usable somewhere else as a source of free energy remains within planetary bounds" Yes! We can expand into the universe; the bounds can then become cosmic! We can build Dyson spheres, safeguard against existential and natural risks to sentience with mitigation and technology (no more naturally induced existential risks with powerful enough technology), build sentience-aligned AGI helping us through this process, or eventually merge with it. This growth of life with meaning can go on indefinitely, and we might even outsmart the second law of thermodynamics according to Dyson's eternal intelligence. I'm for acceleration towards transhumanist scientific and technological progress, sustainability, and the wellbeing (including a sense of freedom) of all sentience!
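The toy sketch referenced above: the self-consistency self-labeling loop in the spirit of "Large Language Models Can Self-Improve" (arXiv:2210.11610). This is my own illustration, not the paper's code; the noisy arithmetic "model", the sample count and the 0.5 threshold are invented stand-ins. The idea: sample several answers per question, keep majority-vote answers as pseudo-labels, then fine-tune on them.

```python
# Hedged sketch (not the paper's code): self-consistency pseudo-labeling,
# the core loop behind LLM "self-improvement" on unlabeled questions.
# The "model" here is a toy noisy arithmetic answerer, purely illustrative.
import random
from collections import Counter

random.seed(0)

def noisy_model_answer(a, b):
    """Toy stand-in for an LLM: returns a + b, but is wrong ~30% of the time."""
    return a + b if random.random() > 0.3 else a + b + random.choice([-1, 1])

def self_label(questions, k=9, threshold=0.5):
    pseudo_labeled = []
    for a, b in questions:
        answers = [noisy_model_answer(a, b) for _ in range(k)]
        majority, votes = Counter(answers).most_common(1)[0]
        if votes / k >= threshold:                 # confidence filter
            pseudo_labeled.append(((a, b), majority))
    return pseudo_labeled  # in the paper, the model is then fine-tuned on these

data = self_label([(2, 3), (10, 7), (5, 5)])
print(data)  # mostly correct pseudo-labels despite a noisy base model
```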
AI risks: reward hacking or deception [Paul Christiano - Preventing an AI Takeover - YouTube](https://youtu.be/9AAhTLa0dT0?si=_y4uokDOECO4Z891) around 1h. Oh, this starts well, with interpretability of LLMs using ideas from physics: [Dario Amodei (Anthropic CEO) - $10 Billion Models, OpenAI, Scaling, & Alignment - YouTube](https://youtu.be/Nlkk3glap_U?si=t33eWwXzAY5fBQdn) [Scaling Laws from the Data Manifold Dimension](https://jmlr.org/papers/v23/20-1111.html) Scaling Laws from the Data Fractal Manifold Dimension Will a change to the loss function (away from next-token prediction), or reinforcement learning, be needed for AGI? [Dario Amodei (Anthropic CEO) - $10 Billion Models, OpenAI, Scaling, & Alignment - YouTube](https://youtu.be/Nlkk3glap_U?si=aTVdCvVtoQUcFTF6) Ontology for prompting https://twitter.com/IntuitMachine/status/1727079666001870877?t=gvw8ehRHFqufGvDZwSEaAw&s=19 Post-Scarcity Society Compass, all of them as options for every being https://twitter.com/WoodlouseM/status/1723686077742150143 [[2311.11829] System 2 Attention (is something you might need too)](https://arxiv.org/abs/2311.11829) System 2 Attention (is something you might need too) [The Exciting, Perilous Journey Toward AGI | Ilya Sutskever | TED - YouTube](https://www.youtube.com/watch?v=SEkGLj0bwAU) The Exciting, Perilous Journey Toward AGI | Ilya Sutskever | TED Aging bottlenecks [Bottlenecks of Aging — Amaranth Foundation](https://amaranth.foundation/bottlenecks-of-aging) Model of the language brain https://twitter.com/ElliotMurphy91/status/1727035142626312400?t=7qdTcV
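On the "Scaling Laws from the Data Manifold Dimension" link above: as I recall the paper's headline result (stated from memory, so treat it as a paraphrase rather than a quotation), the parameter-scaling exponent of the loss is tied to the intrinsic dimension d of the data manifold,

$$L(N) \propto N^{-\alpha_N}, \qquad \alpha_N \approx \frac{4}{d},$$

where N is the parameter count; lower-dimensional data manifolds would then predict steeper scaling, which is one proposed explanation for why scaling works as well as it does.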