Slowly but surely we are finding the algorithms that networks learn when they aren't just memorizing. Memorization is often inefficient and doesn't generalize, and against overfitting there are a bazillion methods of prevention (though a lookup table is also just one concrete algorithm a network can learn).
But we have to speed that reverse engineering up.
Many AGI labs have high hopes that AGI is near.
So do I, and I also have high hopes that automated mechanistic interpretability of AGI is near, so we can actually steer AIs in the ways we want.
The more we hope, the more effort we tend to put in, which on average increases the chances of success.
Maybe we can map out the whole phase space of the algorithms neural nets learn; characterizing that phase space, and possibly finding its building blocks, might give us the tools for exponential automation of interpretability.
https://fxtwitter.com/jacobandreas/status/1734239534978945371
[[2306.17844] The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks](https://arxiv.org/abs/2306.17844)
The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks
"Do neural networks, trained on well-understood algorithmic tasks, reliably rediscover known algorithms for solving those tasks? Several recent studies, on tasks ranging from group arithmetic to in-context linear regression, have suggested that the answer is yes. Using modular addition as a prototypical problem, we show that algorithm discovery in neural networks is sometimes more complex. Small changes to model hyperparameters and initializations can induce the discovery of qualitatively different algorithms from a fixed training set, and even parallel implementations of multiple such algorithms. Some networks trained to perform modular addition implement a familiar Clock algorithm; others implement a previously undescribed, less intuitive, but comprehensible procedure which we term the Pizza algorithm, or a variety of even more complex procedures. Our results show that even simple learning problems can admit a surprising diversity of solutions, motivating the development of new tools for characterizing the behavior of neural networks across their algorithmic phase space."
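The Clock algorithm from the paper above has a tidy core idea that can be sketched in a few lines. This is a toy illustration of the readout, not the trained network: embed each residue as an angle on the unit circle, add the angles (a rotation), and pick the candidate answer whose angle matches. The single frequency `k = 1` is a simplification; trained networks use several frequencies in superposition.

```python
import numpy as np

def clock_add(a, b, p=59):
    """Modular addition a + b (mod p) via the 'Clock' trick:
    tokens live as angles on the unit circle, addition is rotation,
    and the answer is read off by cosine similarity."""
    k = 1  # one Fourier frequency; real trained nets combine several
    angle = 2 * np.pi * k * (a + b) / p  # composing the two rotations
    candidates = np.arange(p)
    # logit for each candidate c peaks when c == (a + b) mod p
    logits = np.cos(angle - 2 * np.pi * k * candidates / p)
    return int(np.argmax(logits))
```

The Pizza algorithm uses the same circular embeddings but a different (averaging-based) readout, which is why small hyperparameter changes can flip between them.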
[What a Contest of Consciousness Theories Really Proved | Quanta Magazine](https://www.quantamagazine.org/what-a-contest-of-consciousness-theories-really-proved-20230824/)
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9916582/
What a Contest of Consciousness Theories Really Proved
An adversarial collaboration protocol for testing contrasting predictions of global neuronal workspace and integrated information theory
The first hurdle checked how well each theory decoded the categories of the objects that the subjects saw in the presented images. Both theories performed well here, but IIT was better at identifying the orientation of objects.
The second hurdle tested the timing of the signals. IIT predicted sustained, synchronous firing in the hot zone for the duration of the conscious state. While the signal was sustained, it did not remain synchronous. GNWT predicted an “ignition” of the workspace followed by a second spike when the stimulus disappeared. Only the initial spike was detected. In the on-screen scoring for the NYU audience, IIT pulled ahead.
The third hurdle concerned overall connectivity across the brain. GNWT scored better than IIT here, largely because some analyses of the results supported GNWT predictions while the signals across the hot zone were not synchronous.
[Longevity Startup Retro Biosciences Is Sam Altman's Shot at Life Extension - Bloomberg](https://www.bloomberg.com/news/features/2023-12-19/longevity-startup-retro-biosciences-is-sam-altman-s-shot-at-life-extension) Sam Altman-backed longevity research facility opens its doors
https://www.lesswrong.com/posts/rNFzvii8LtCL5joJo/dark-matters
Quantum databases https://www.cs.cornell.edu/~sudip/QuantumDB.pdf
"Quantum Computing without Quantum Computers: Database Search and Data Processing Using Classical Wave Superposition" [[2012.08401] Quantum Computing without Quantum Computers: Database Search and Data Processing Using Classical Wave Superposition](https://arxiv.org/abs/2012.08401)
https://fxtwitter.com/emollick/status/1737573429153354212
https://fxtwitter.com/EricTopol/status/1737508532583604552
"Coscientist"—a GPT-4 based autonomous LLM system that demonstrates appreciable reasoning capabilities, ... solving of multiple problems and generation of code for experimental design"
The authors got GPT-4 to autonomously research, plan, and conduct chemical experiments, including learning how to use lab equipment by reading documentation (most were operated by code, but one task had to be done by humans)
Survey of mathematical LLMs [[2312.07622] Mathematical Language Models: A Survey](https://arxiv.org/abs/2312.07622)
[[2312.10794] A mathematical perspective on Transformers](https://arxiv.org/abs/2312.10794) A mathematical perspective on Transformers
Attachments are tools you can bend in any way you want https://twitter.com/burny_tech/status/1737833915933683824
[[2312.09979v2] LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment](https://arxiv.org/abs/2312.09979v2) LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment
We are identifying generalizing circuits inside smaller models that can be directed!
Saying they are just "autocomplete databases", that they don't generalize at all, or that they fully generalize, is misleading! Not to mention there are various tricks to get LLMs to do these tasks: multi-shot prompting, chain of thought, tree of thoughts, self-correction, search, planning, generating millions of mutated candidates and choosing the best with some heuristic, and other kinds and combinations of these (that's still expensive for now, but smaller models might give better performance), retrieval-augmented generation (from libraries or code snippets), multiagent systems (expert ecosystems à la AutoGen), finetuned experts (for different languages, frameworks, etc.), internet access, OS access via Python, a console, scripts, simple actions, or multimodality, and so on.
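One of those tricks, "generate many candidates and pick the best by some heuristic", reduces to a very small loop. This is a hedged sketch: `generate` and `score` are hypothetical placeholders standing in for an LLM sampler and a task-specific heuristic (a verifier, unit tests, a reward model), not a real API.

```python
import random

def best_of_n(generate, score, prompt, n=8, seed=0):
    """Sample n candidate completions and keep the best one
    under the given scoring heuristic (best-of-n selection)."""
    rng = random.Random(seed)  # seeded so reruns are reproducible
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=score)
```

The expense mentioned above is exactly this `n`-fold multiplier on inference cost, which is why cheaper small models can change the economics.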
One piece of strong evidence against the "database autocomplete" or "stochastic parrot" hypothesis is Othello-GPT.
Othello-GPT learned an emergent nonlinear internal representation of the Othello board state!
>Interventional experiments indicate this representation can be used to control the output of the network and create "latent saliency maps" that can help explain predictions in human terms.
>The investigation reveals that Othello-GPT encapsulates a linear representation of opposing pieces, a factor that causally steers its decision-making process
In other small models we found various detectors of edges, color, fur, parts of a car, etc. that compose into circuits in image classifiers, simple world models (like representations of board state in board games), representation of space and time, finite state automata (for example for language models composing html), modular addition using trigonometric composition, group theoretic operations using representation theory etc.
We still know almost nothing about the internal dynamics of gigantic models like GPT-4! Overly confident claims of understanding their mechanistic internal dynamics are not yet grounded in empirical evidence.
The best introductory mechanistic interpretability lecture, which I really recommend, is by Neel Nanda, who did mechinterp at Anthropic and now at DeepMind.
Othello-GPT paper 1, Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task: [[2210.13382] Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task](https://arxiv.org/abs/2210.13382)
Othello-GPT paper 2, Linear Latent World Models in Simple Transformers: A Case Study on Othello-GPT: [[2310.07582] Linear Latent World Models in Simple Transformers: A Case Study on Othello-GPT](https://arxiv.org/abs/2310.07582)
Neel Nanda lecture, Concrete open problems in mechanistic interpretability | Neel Nanda | EAG London 23: [Concrete open problems in mechanistic interpretability | Neel Nanda | EAG London 23 - YouTube](https://www.youtube.com/watch?v=7t9umZ1tFso)
Neel Nanda on podcast, Mechanistic Interpretability - NEEL NANDA (DeepMind): [Mechanistic Interpretability - NEEL NANDA (DeepMind) - YouTube](https://www.youtube.com/watch?v=_Ygf0GnlwmY)
[Synchronization of chaos - Wikipedia](https://en.wikipedia.org/wiki/Synchronization_of_chaos) Synchronization of chaos is a phenomenon that may occur when two or more dissipative chaotic systems are coupled. [The Surprising Secret of Synchronization - YouTube](https://www.youtube.com/watch?v=t-_VPRCtiUg) [Topological synchronization of chaotic systems | Scientific Reports](https://www.nature.com/articles/s41598-022-06262-z) [The hidden synchronicity in chaos: topological synchronization between chaotic systems - YouTube](https://www.youtube.com/watch?v=iE-f0S4Oug8) https://phys.org/news/2022-04-topological-synchronization-chaotic.html [Phys. Rev. E 98, 052204 (2018) - Synchronization of chaotic systems: A microscopic description](https://journals.aps.org/pre/abstract/10.1103/PhysRevE.98.052204) [(PDF) The hidden synchronicity in chaos | Nir Lahav - Academia.edu](https://www.academia.edu/38114281/The_hidden_synchronicity_in_chaos%C3%AF%C2%BB%C2%BF)
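The synchronization-of-chaos phenomenon from those links can be demonstrated numerically in a few lines. A minimal sketch, assuming simple diffusive (mutual) coupling and forward-Euler integration: two Lorenz systems started far apart converge onto the same chaotic trajectory once the coupling strength exceeds a threshold.

```python
import numpy as np

def lorenz(v, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Classic Lorenz vector field."""
    x, y, z = v
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def coupled_run(k=20.0, dt=0.002, steps=30000):
    """Two Lorenz systems with diffusive coupling of strength k.
    Above a critical k, the difference between them decays to zero
    even though each trajectory individually remains chaotic."""
    a = np.array([1.0, 1.0, 1.0])
    b = np.array([-8.0, 7.0, 27.0])
    for _ in range(steps):
        da = lorenz(a) + k * (b - a)
        db = lorenz(b) + k * (a - b)
        a, b = a + dt * da, b + dt * db
    return np.linalg.norm(a - b)
```

With `k = 0` the two copies wander independently on the attractor; with strong coupling the separation shrinks exponentially, which is the "identical synchronization" regime described in the papers above.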
[Unlocking the Secrets of Self-Organization: From Ising Models to Transformers - YouTube](https://www.youtube.com/watch?v=cGcY-ReeGDU) Unlocking the Secrets of Self-Organization: From Ising Models to Transformers
A new ultra-high-speed processor to advance AI, driverless vehicles, and more operates more than 10,000 times faster than typical electronic processors, which work at gigabits per second, reaching a record 17 terabits per second. The system processes 400,000 video signals concurrently, performing 34 functions simultaneously that are key to object edge detection, edge enhancement, and motion blur.
Photonic signal processor based on a Kerr microcomb for real-time video image processing https://techxplore.com/news/2023-12-ultra-high-processor-advance-ai-driverless.html [Photonic signal processor based on a Kerr microcomb for real-time video image processing | Communications Engineering](https://www.nature.com/articles/s44172-023-00135-7)
ML prediction of chaos https://twitter.com/wgilpin0/status/1737934963365056543
[Quanta Magazine - YouTube](https://www.youtube.com/@QuantaScienceChannel/videos) summaries of breakthroughs
[Meet 'Coscientist,' your AI lab partner | NSF - National Science Foundation](https://new.nsf.gov/science-matters/meet-coscientist-your-ai-lab-partner)
animation vs physics [Animation vs. Physics - YouTube](https://www.youtube.com/watch?v=ErMSHiQRnc8)
[[2312.06937] Can a Transformer Represent a Kalman Filter?](https://arxiv.org/abs/2312.06937) can transformer implement kalman filter
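For reference, the Kalman filter the paper asks transformers to represent is itself short. A standard textbook predict/update loop in numpy, with a 1-D constant-signal tracking example tacked on (the matrices and noise levels are illustrative choices, not from the paper):

```python
import numpy as np

def kalman_filter(zs, F, H, Q, R, x0, P0):
    """Linear Kalman filter: predict with dynamics F, then correct
    with each measurement z through observation model H."""
    x, P = x0, P0
    for z in zs:
        # predict step: propagate state and uncertainty
        x = F @ x
        P = F @ P @ F.T + Q
        # update step: blend prediction with measurement via gain K
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (z - H @ x)
        P = (np.eye(len(x)) - K @ H) @ P
    return x

# usage: estimate a constant value from noisy 1-D measurements
rng = np.random.default_rng(1)
zs = 5.0 + rng.normal(0.0, 0.5, size=(50, 1))
est = kalman_filter(zs, F=np.eye(1), H=np.eye(1),
                    Q=1e-4 * np.eye(1), R=0.25 * np.eye(1),
                    x0=np.zeros(1), P0=np.eye(1))
```

Since each update is a fixed linear-algebraic map of the running state, it is a natural candidate for what a transformer's layers might implement in-context.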
simulated annealing animated [Simulated Annealing Explained By Solving Sudoku - Artificial Intelligence - YouTube](https://www.youtube.com/watch?v=FyyVbuLZav8)
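The core of simulated annealing fits in one function. A minimal continuous-variable sketch (the video above applies the same idea to Sudoku's discrete search space): propose random moves, always accept improvements, accept worsenings with probability exp(-delta/T), and cool the temperature T.

```python
import math
import random

def simulated_annealing(f, x0, step=0.5, t0=1.0, cooling=0.995,
                        iters=5000, seed=0):
    """Minimize f over the reals by random local proposals, accepting
    uphill moves with probability exp(-delta/T) as T cools."""
    rng = random.Random(seed)
    x, fx, t = x0, f(x0), t0
    best, fbest = x, fx
    for _ in range(iters):
        y = x + rng.uniform(-step, step)   # random local proposal
        fy = f(y)
        # accept if better, or probabilistically if worse (escapes local minima)
        if fy < fx or rng.random() < math.exp((fx - fy) / t):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x, fx
        t *= cooling                        # geometric cooling schedule
    return best
```

Early on, high T makes it nearly a random walk; late on, low T makes it nearly greedy hill descent.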
Mathematics of physics and AI is all you need
The cost function of life is genetic fitness. Do humans themselves have one concrete cost function, is it a competition of multiple interacting cost functions in a complex nonlinear chaotic dynamical system, or is there any cost function at all in the first place?
Autism on acid, empathy acceleration
>LSD helped him to feel empathy.
>It reduced the activation energy needed to overcome inhibitions, giving him a wider range of options in social situations.
>LSD is like an extrovert switch.
Same for me, and combined with MDMA, nondual deconstructive meditation, and love and kindness constructive meditation
https://twitter.com/carobcircuit/status/1737862030143455644
Psychedelics reopen the social reward learning critical period [Psychedelics reopen the social reward learning critical period | Nature](https://www.nature.com/articles/s41586-023-06204-3)
u-net [The U-Net (actually) explained in 10 minutes - YouTube](https://www.youtube.com/watch?v=NhdzGfB1q74)
Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models https://twitter.com/_akhaliq/status/1737906823082967403
Slowly replacing neurons with artificial ones until the entire brain is synthetic, which might allow continuity of your consciousness.
diffusion model math [What are Diffusion Models? - YouTube](https://www.youtube.com/watch?v=fbLgFrlTnGU&t=262s&pp=ygUaZGlmZnVzaW9uIG1hY2hpbmUgbGVhcm5pbmc%3D) [Diffusion Models | Paper Explanation | Math Explained - YouTube](https://www.youtube.com/watch?v=HoKDTa5jHvg&t=478s&pp=ygUaZGlmZnVzaW9uIG1hY2hpbmUgbGVhcm5pbmc%3D)
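The central equation those videos derive, the DDPM forward (noising) process, is simple enough to write out: x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps, where abar_t is the cumulative product of (1 - beta_t). A minimal sketch with the commonly used linear beta schedule (the specific schedule values are an illustrative assumption):

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t from the DDPM forward process in closed form:
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    abar = np.cumprod(1.0 - betas)[t]   # cumulative signal retention
    eps = rng.normal(size=x0.shape)     # fresh Gaussian noise
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)   # linear schedule, T = 1000
x0 = np.ones(4)
x_early = forward_diffuse(x0, 10, betas, rng)   # still close to x0
x_late = forward_diffuse(x0, 999, betas, rng)   # nearly pure noise
```

The trained model then learns the reverse direction: predicting the noise eps from x_t so the process can be run backwards from pure noise to data.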
10 weird algorithms [10 weird algorithms - YouTube](https://www.youtube.com/watch?v=SmyPTnlqhlk)
What if merging of AdamW and Simulated Annealing is all we need for AGI ( [Hamiltonian Monte Carlo - Wikipedia](https://en.wikipedia.org/wiki/Hamiltonian_Monte_Carlo) [Stochastic gradient Langevin dynamics - Wikipedia](https://en.m.wikipedia.org/wiki/Stochastic_gradient_Langevin_dynamics) ? )
AdamW, the optimizer used for LLMs, is a variant of gradient descent whose math adds momentum, per-parameter step sizes scaled by an estimate of each gradient's reliability (its second moment), and decoupled weight decay.
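Those three ingredients are visible directly in the update rule. A single AdamW step in numpy, with a tiny 1-D quadratic as a usage example (hyperparameters are the usual defaults, not anything model-specific):

```python
import numpy as np

def adamw_step(w, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update: momentum (first moment), adaptive scaling by
    the second-moment estimate, and decoupled weight decay."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad      # momentum
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2  # reliability
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)     # adaptive step
    w = w - lr * weight_decay * w                   # decoupled decay
    return w

# usage: minimize f(w) = (w - 2)^2
w = np.array([0.0])
state = {"t": 0, "m": np.zeros(1), "v": np.zeros(1)}
for _ in range(5000):
    grad = 2 * (w - 2.0)
    w = adamw_step(w, grad, state, lr=0.01)
```

The "decoupled" part is the key difference from plain Adam: weight decay is applied directly to the weights rather than folded into the gradient, so it isn't distorted by the adaptive scaling.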
The fact that the Quantum Fourier Transform and Quantum Field Theory are both abbreviated QFT just made me slightly confused.
[A 30-Year-Old Cryptographic Challenge Is About To Be Solved | Discover Magazine](https://www.discovermagazine.com/technology/a-30-year-old-cryptographic-challenge-is-about-to-be-solved) quantum approximate optimization algorithm (an alternative to Shor's algorithm for factoring on quantum computers, though it doesn't scale)
https://twitter.com/AndrewLampinen/status/1738014840080506978 Research in mechanistic interpretability and neuroscience often relies on interpreting internal representations to understand systems, or manipulating representations to improve models. I gave a talk at @unireps at NeurIPS on a few challenges for this area.
https://twitter.com/clusteredbytes/status/1737605003320111342
FLARE - Forward-Looking Active Retrieval augmented generation
[Paper page - AppAgent: Multimodal Agents as Smartphone Users](https://huggingface.co/papers/2312.13771) AppAgent: Multimodal Agents as Smartphone Users
https://twitter.com/MITCoCoSci/status/1737948183689629721?t=lavy4njOfN-a26M4JEqsmA&s=19 Ada, integrates LLMs + formal planning to learn libraries of composable skills adapted to individual planning domains:
Are you an entropist or negentropist?
Veritasium entropy [The Most Misunderstood Concept in Physics - YouTube](https://youtu.be/DxL2HoqLbyA?si=fi1h_jCE5OSi6Lnj)
What would the universe look like if energy didn't have the tendency to spread out, if entropy weren't increasing on average, if the second law of thermodynamics didn't hold?
The Sun gives us little bursts of negentropy that we convert to entropy; Earth still receives about as much energy as it radiates back into the universe, but the Sun's energy is more useful (lower entropy) and is what powers life.
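That claim has a quick back-of-the-envelope check: entropy flux of radiation scales like power divided by its temperature, so the same power arriving at solar temperature and leaving at Earth temperature carries far more entropy out than in. A rough sketch (the absorbed-power figure and the neglect of O(1) radiation prefactors are simplifying assumptions):

```python
# Entropy flux ~ P / T for thermal radiation (ignoring the 4/3 factor).
P = 1.22e17            # watts absorbed by Earth from the Sun, approx.
T_sun = 5800.0         # effective solar radiation temperature, K
T_earth = 255.0        # Earth's effective emission temperature, K

s_in = P / T_sun       # entropy flux arriving, W/K
s_out = P / T_earth    # entropy flux radiated away, W/K
ratio = s_out / s_in   # how much more entropy leaves than arrives
```

With these numbers the outgoing entropy flux is roughly 20x the incoming one: few high-energy photons in, many low-energy photons out, and life lives off the difference.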
category theory: lets abstract everything into oblivion, nothing is true anymore and everything is true at the same time https://twitter.com/burny_tech/status/1738226703435178268?t=saGYXd_l5XRl7qrpkCEmgg&s=19
Ontological algebra reducibility
Animated math lecture tag
https://www.andriushchenko.me/gpt4adv.pdf Adversarial Attacks on GPT-4 via Simple Random Search
Optimism, obsession, self-belief, raw horsepower and personal connections are how things get started. [What I Wish Someone Had Told Me - Sam Altman](https://blog.samaltman.com/what-i-wish-someone-had-told-me)
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9300149/ Insular Stimulation Produces Mental Clarity and Bliss
[The Most Secretive Longevity Lab Finally Opens Its Doors - Bloomberg](https://www.bloomberg.com/news/features/2023-12-19/longevity-startup-retro-biosciences-is-sam-altman-s-shot-at-life-extension) [Midjourney v6, Altman 'Age Reversal' and Gemini 2 - Christmas Edition - YouTube](https://www.youtube.com/watch?v=ZewqcbEXWqs)
Make sure to take your daily doses of hopium
The quantum fields that are the basis of our universe in quantum field theory are actually fields of hopium and love
Survey on making LLMs faster [These AI Glasses are Crazy! - YouTube](https://www.youtube.com/shorts/XKGJTMJVRBs)
[Arousal as a universal embedding for spatiotemporal brain dynamics - YouTube](https://www.youtube.com/watch?v=IoCK8d75R9g) Arousal as a universal embedding for spatiotemporal brain dynamics
I will huff all the hopium there is and noone can stop me
fast quantized mistral llm https://twitter.com/reach_vb/status/1738281313377960398
Summary of current state of RAG landscape [[2312.10997v1] Retrieval-Augmented Generation for Large Language Models: A Survey](https://arxiv.org/abs/2312.10997v1)
Summary of 2023 math [Biggest Breakthroughs in Math: 2023 - YouTube](https://www.youtube.com/watch?v=4HHUGnHcDQw)
Epistemology is the study of cost functions of humanity's classifiers
These methods are being crammed into everything; in a lot of places they haven't landed yet, and we're waiting for someone to put the puzzle pieces together: multi-shot prompting, chain of thought, tree of thoughts, self-correction, search, planning, generating millions of mutated candidates and choosing the best with some heuristic, and other kinds and combinations of these (that's still expensive for now, but smaller models might give better performance), retrieval-augmented generation (from libraries or code snippets), multiagent systems (expert ecosystems à la AutoGen), finetuned experts (for different languages, frameworks, etc.), internet access, Python access, OS access via a console, scripts, simple actions, or multimodality, etc., and hooking it all up to robotics that can already move solidly, recognize objects, plan with LLMs, manipulate objects, and so on.
Efficient RAG https://twitter.com/bindureddy/status/1738367792729264207?t=SncAFD5Gh4dUcqJTzYcufg&s=19
Depending on which neural configuration or part of my brain (which sucks in and organizes correlations from sensory data) I currently inhabit, my p(doom), p(utopia), and everything else on that spectrum oscillates from 0% to 100%, while the definitions of p(doom), AGI, and everything else shapeshift from one extreme to the other and back to normalcy. Omniperspectivity!
How the singularity will turn out in reality, nobody really knows.
Let's not just hope, but also do our best for the most beneficial outcome for all of sentience. https://twitter.com/burny_tech/status/1738596969038143951
AGI mechanistic interpretability accelerationism
LLM could store extra knowledge and memory in an external symbolic world model
Encoding semantic and lexical meaning hybrid search RAG https://twitter.com/jerryjliu0/status/1738583302481842339?t=etS6zqQ8dKOvMLMZSCLzvQ&s=19
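The hybrid-search idea from that thread is just a weighted blend of two relevance signals. A minimal sketch with deliberately toy scorers: a term-overlap count standing in for BM25 on the lexical side, and cosine similarity over precomputed embeddings on the semantic side (both stand-ins are my simplifications, not a real search stack).

```python
import math
from collections import Counter

def lexical_score(query, doc):
    """Toy lexical relevance (stand-in for BM25): shared term count."""
    q, d = Counter(query.split()), Counter(doc.split())
    return sum(min(q[t], d[t]) for t in q)

def semantic_score(q_vec, d_vec):
    """Cosine similarity between (assumed precomputed) embeddings."""
    dot = sum(a * b for a, b in zip(q_vec, d_vec))
    nq = math.sqrt(sum(a * a for a in q_vec))
    nd = math.sqrt(sum(b * b for b in d_vec))
    return dot / (nq * nd)

def hybrid_rank(query, q_vec, docs, alpha=0.5):
    """Rank (text, embedding) docs by a blend of lexical and semantic
    relevance; alpha controls the mix between the two signals."""
    scored = [(alpha * lexical_score(query, text)
               + (1 - alpha) * semantic_score(q_vec, vec), text)
              for text, vec in docs]
    return [text for _, text in sorted(scored, reverse=True)]
```

The lexical side catches exact identifiers and rare terms that embeddings smear out; the semantic side catches paraphrases the lexical match misses, which is why RAG pipelines increasingly combine both.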
Suppose we gradually replaced the neurons in your brain with artificial neurons. Is there a point at which your consciousness would wink out?
[NeurIPS 2023 Recap — Best Papers - Latent Space](https://www.latent.space/p/neurips-2023-papers)