"bayesian methods
Bayesian methods are a powerful approach to statistical inference that allow combining prior knowledge or beliefs with observed data to update probabilities and make decisions under uncertainty. Some key points about Bayesian methods:
- Bayesian inference is based on Bayes' theorem, which provides a way to calculate the conditional probability of a hypothesis given the observed data. It combines the prior probability of the hypothesis before seeing the data with the likelihood of the data under that hypothesis[1].
- A key concept is the use of probability distributions to represent the uncertainty in model parameters. In Bayesian analysis, parameters are treated as random variables with prior distributions that are updated to posterior distributions after observing data[2][3].
- Bayesian methods allow incorporating prior information, which can come from expert opinion, previous studies, or other sources. The prior can be informative or non-informative. The posterior distribution combines the prior and the data[1].
- Compared to frequentist methods, Bayesian approaches can handle small sample sizes better, express uncertainty more directly, and provide a natural way to update knowledge as new data arrives[3][4].
- Bayesian networks and dynamic Bayesian networks are graphical models that leverage Bayesian principles for reasoning under uncertainty and learning from data[4][5].
- While theoretically attractive, Bayesian methods can be computationally intensive. Markov chain Monte Carlo (MCMC) sampling techniques are commonly used for approximation[3].
In summary, Bayesian methods provide a principled framework for probabilistic inference that is increasingly used in statistics, machine learning, and artificial intelligence to model and reason about complex, uncertain systems[2][4][5].
Citations:
[1] [Bayesian inference - Wikipedia](https://en.wikipedia.org/wiki/Bayesian_inference)
[2] [Bayesian analysis | Probability Theory, Statistical Inference | Britannica](https://www.britannica.com/science/Bayesian-analysis)
[3] [A Gentle Introduction to Bayesian Analysis - NCBI](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4158865/)
[4] [Bayesian Methods in Artificial Intelligence (PDF)](https://physics.mff.cuni.cz/wds/proc/pdf10/WDS10_104_i1_Kukacka.pdf)
[5] [Chapter 1 The Basics of Bayesian Statistics | An Introduction to Bayesian Thinking](https://statswithr.github.io/book/the-basics-of-bayesian-statistics.html)
"
Llama 3: pretraining on more than 15 trillion quality tokens and finetuning on quality instruction datasets; 10 million quality human-annotated examples is all you need
[x.com](https://twitter.com/yacineMTB/status/1781299113243369691)
[x.com](https://twitter.com/Teknium1/status/1781345814633390579)
[AI More Energy Efficient than Humans, New Study Finds - YouTube](https://www.youtube.com/watch?v=5ayaXOVb9IU)
Stuart Russell [Professor Stuart Russell on AI, AGI, Education, Consciousness & The Future of Humanity - YouTube](https://www.youtube.com/watch?v=ZQuXtONnnKU&pp=ygUOc3R1YXJ0IFJ1c3NlbGw%3D)
How did consciousness evolve? - with Nicholas Humphrey [How did consciousness evolve? - with Nicholas Humphrey - YouTube](https://www.youtube.com/watch?v=9QWaZp_2I1k)
Miles Cranmer - The Next Great Scientific Theory is Hiding Inside a Neural Network
[Miles Cranmer - The Next Great Scientific Theory is Hiding Inside a Neural Network (April 3, 2024) - YouTube](https://www.youtube.com/watch?v=fk2r8y5TfNY)
[Polymathic](https://polymathic-ai.org/)
"Machine learning methods such as neural networks are quickly finding uses in everything from text generation to construction cranes. Excitingly, those same tools also promise a new paradigm for scientific discovery. In this Presidential Lecture, Miles Cranmer will outline an innovative approach that leverages neural networks in the scientific process. Rather than directly modeling data, the approach interprets neural networks trained using the data. Through training, the neural networks can capture the physics underlying the system being studied. By extracting what the neural networks have learned, scientists can improve their theories. He will also discuss the Polymathic AI initiative, a collaboration between researchers at the Flatiron Institute and scientists around the world. Polymathic AI is designed to spur scientific discovery using similar technology to that powering ChatGPT. Using Polymathic AI, scientists will be able to model a broad range of physical systems across different scales."
Qualia Takeoff: The process of human beings increasing the total number of qualia or conscious experiences they can experience.
Qualia Singularity: A hypothetical future point in time after the discovery of a theory of consciousness, where the number of qualia or conscious experiences increases exponentially, resulting in unforeseeable consequences for human civilization.
[Qualia Takeoff in The Age of Spiritual Machines — Prophetic](https://www.blog.propheticai.co/blog/cac5ipk3midzzjy8vnyp8zh2umyzup)
There is a sweet spot between delving too much into your (our) past and building your (our) future without considering the past at all. Both must be done.
If only we had a better mathematical model of why deep learning works in the first place; nobody knows! It's weird that the fitting algorithm often doesn't get stuck in local minima or saddle points, and that it can efficiently recruit spare model capacity to fit unexplained training data wherever it lies. I mean, the universal approximation theorem is good, statistical learning theory is nice, and there's singular learning theory, the principles of deep learning (an effective theory of DL), geometric deep learning, categorical deep learning, the spline theory of deep learning, shard theory,... But there are still so many unknowns. And that these models weakly generalize at all is also fascinating and largely unexplained; people mostly thought that wouldn't be the case. We might get stronger generalization with neurosymbolic methods, such as adding Bayesian program synthesis, if someone tries to scale these methods that seem to work well at small scale!
Lack of consensus seems to imply a lot of fluidity, which is nice.
I'm trying to find and collect as many definitions of intelligence as possible that different researchers actually use, ones you can work with in practice and ideally measure.
The most mainstream way of measuring intelligence is scoring on popular benchmarks, which is great for certain use cases, but many of these popular benchmarks are flawed in various ways; new and better ones are coming out all the time.
Or this one, which is very old-school: https://arxiv.org/abs/0712.3329
Informally: Intelligence measures an agent’s ability to achieve goals in a wide range of environments.
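Legg and Hutter's formalization of that sentence (sketching the paper's notation) weights an agent's expected performance in each computable environment by the environment's simplicity:

$$\Upsilon(\pi) := \sum_{\mu \in E} 2^{-K(\mu)} \, V_\mu^\pi$$

where $E$ is the set of computable reward-bounded environments, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$, and $V_\mu^\pi$ is the expected total reward of agent $\pi$ in $\mu$.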
Or if one defines intelligence as generalization capability, then you have to define generalization power, for example as the ability to mine previous experience to make sense of future novel situations. To what extent can we analogize the knowledge we already have into simulacrums that apply widely across the experience space?
And put that into math https://arxiv.org/abs/1911.01547
using algorithmic information theory, describing intelligence as skill-acquisition efficiency and highlighting the concepts of scope, generalization difficulty, priors, and experience
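Schematically (my compression, not the paper's exact notation), the definition has the shape

$$\text{Intelligence} \;\sim\; \underset{T \,\in\, \text{scope}}{\text{Avg}} \left[ \frac{\text{generalization difficulty}_T \cdot \text{skill}_T}{\text{priors}_T + \text{experience}_T} \right]$$

i.e. skill acquired per unit of priors plus experience, weighted by how hard each task is to generalize to.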
Chollet made a benchmark (ARC) to measure intelligence as generalization
https://arxiv.org/abs/2403.01267
[Moving the Eiffel Tower to ROME: Tracing and Editing Facts in GPT | OpenReview](https://openreview.net/forum?id=mMECu_poAs)
"*why* it formed certain relationships"
that's what the explainable AI and mechanistic interpretability fields, along with other mathematical methods from theoretical deep learning, try to figure out in deep neural nets and other ML architectures.
i like explainability from the start too, but it seems like in some problem domains doing black-box magic and reverse-engineering the result gives nice results.
i'm for trying as many methods as possible, both classical statistics and symbolic/neurosymbolic/neural methods, as the results often break our intuition.
for example AlphaFold has a deep learning architecture and gave us many nice results in protein folding.
i agree that deep learning is one of many ways to approximate arbitrary functions or do information processing in general, but i think it's good to acknowledge that it has its own useful results in various domains of science (i wish more people tried neurosymbolic methods!).
it's interesting that similar methods in mechanistic interpretability to reverse engineer artificial neural networks are used in neuroscience to understand biological neural networks :D [Representation Engineering: A Top-Down Approach to AI Transparency](https://www.ai-transparency.org/)
AlphaLLM https://arxiv.org/abs/2404.12253
I'm ready to get a macrodose of all the knowledge that is known and still not known
[[Summary] Progress Update #1 from the GDM Mech Interp Team — AI Alignment Forum](https://www.alignmentforum.org/posts/HpAr8k74mW4ivCvCu/progress-update-from-the-gdm-mech-interp-team-summary)
[x.com](https://twitter.com/QuintinPope5/status/1780907564601106546)
"So where did alpa zero get the awesome training data?"
"Self play using the rules of the game to reliably identify which trajectories to imitate and which to avoid.
The issue is that nontrivial domains don't have access to such a convenient source of ground truth feedback about which actions are better or worse. E.g., if scientists could propose a theory and get instant cheap feedback about how correct that theory is, then science would be much, much easier. Same with other nontrivial domains such as engineering, military strategy, persuasion, etc. Instead, we need to rely on much slower, more expensive and less reliable methods such as real world experiments, simulations, etc., which are getting *more* expensive as the capabilities they're supposed to be judging move further beyond the human frontier.
In contrast, toy settings like Go have cheap, easy ground truth feedback, even for trajectories that demonstrate superhuman Go capabilities. This makes it vastly easier to reach superhuman capabilities in those domains, and also makes accomplishments in those domains act as misleading points of comparison for forecasting capabilities progress in domains with less readily available ground truth feedback.
Does that address your question about how AlphaZero Go got the training data required to reach superhuman Go capabilities, and why doing the same in more important domains will be so much harder?"
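A minimal sketch of the loop being described, for a toy two-player game where the rules themselves label trajectories. The `game` interface (`initial_state`, `legal_moves`, `apply`, `is_terminal`, `winner`) is hypothetical, and this shows only the ground-truth-feedback idea, not AlphaZero's actual MCTS + value-network pipeline:

```python
import random

def self_play_episode(policy, game):
    """Play one game against itself; record (state, move) pairs per player."""
    trajectories = {1: [], -1: []}
    state, player = game.initial_state(), 1
    while not game.is_terminal(state):
        move = policy(state, game.legal_moves(state))
        trajectories[player].append((state, move))
        state = game.apply(state, move, player)
        player = -player
    return trajectories, game.winner(state)  # the rules give ground-truth feedback

def collect_training_data(policy, game, n_games=1000):
    """Self-play: imitate moves from winning trajectories, avoid losing ones."""
    imitate, avoid = [], []
    for _ in range(n_games):
        trajectories, winner = self_play_episode(policy, game)
        for player, traj in trajectories.items():
            if winner == player:
                imitate.extend(traj)   # cheap, reliable positive labels
            elif winner == -player:
                avoid.extend(traj)     # ...and negative ones (draws skipped)
    return imitate, avoid

# A policy is anything mapping (state, legal_moves) -> move,
# e.g. uniformly random for bootstrapping:
random_policy = lambda state, moves: random.choice(moves)
```

The point of the quote is that `game.winner` is cheap and reliable here; in science or engineering the analogous feedback signal is slow, expensive, and noisy.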
[#5: Quintin Pope - AI alignment, machine learning, failure modes, and reasons for optimism - YouTube](https://www.youtube.com/watch?v=f9Msoqvlla4)
My Objections to “We’re All Gonna Die with Eliezer Yudkowsky”: https://www.lesswrong.com/posts/wAczufCpMdaamF9fy/my-objections-to-we-re-all-gonna-die-with-eliezer-yudkowsky
The Shard Theory Sequence:
https://www.lesswrong.com/s/nyEFg3AuJpdAozmoX
Quintin’s Alignment Papers Roundup: https://www.lesswrong.com/s/5omSW4wNKbEvYsyje
Evolution provides no evidence for the sharp left turn: https://www.lesswrong.com/posts/hvz9qjWyv8cLX9JJR/evolution-provides-no-evidence-for-the-sharp-left-turn
Deep Differentiable Logic Gate Networks: https://arxiv.org/abs/2210.08277
The Hydra Effect: Emergent Self-repair in Language Model Computations: https://arxiv.org/abs/2307.15771
Deep learning generalizes because the parameter-function map is biased towards simple functions: https://arxiv.org/abs/1805.08522
Bridging RL Theory and Practice with the Effective Horizon: https://arxiv.org/abs/2304.09853
here's one of the papers trying to explain why neural networks don't severely overfit as predicted by classical learning theory: it argues that the parameter-function map of many deep neural nets should be exponentially biased towards simple functions, shown by applying a very general probability-complexity bound recently derived from algorithmic information theory. https://arxiv.org/abs/1805.08522
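The bound itself (paraphrasing the paper's notation, so treat the exact constants as schematic) has the form

$$P(f) \lesssim 2^{-a\,\tilde{K}(f) + b}$$

where $P(f)$ is the probability that randomly sampled parameters produce function $f$, $\tilde{K}(f)$ is a computable approximation of its Kolmogorov complexity, and $a, b > 0$ do not depend on $f$; simple functions therefore occupy exponentially more of parameter space.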
"We have these potential current and future AI scenarios:
- centralization in the hands of corporations maximizing profit
- centralization in the hands of the government
- decentralization in the hands of the people (including bad actors)
I sense the last one has the smallest amount of disadvantages and biggest amount of advantages.
Centralization in the hands of benevolent nerds doesn't seem realistic, plus I want all sorts of good actors to have access to this powerful dual-use technology, for science or defense for example, and to democratically represent all sorts of values.
I feel like this rests on the assumption of trusting corporations and governments not to blow everyone up more than trusting people not to blow everyone up. Assuming the tech will get this far, after all the other risks from centralization of power, this world-ending risk feels basically equal in both hands.
Trying to imagine a non-dystopian outcome in the case of centralization, and I can't. I think there's almost zero probability that such a small group of people won't get corrupted by incentives and that we won't end up in a dystopia.
I also think that the probability of the AI tech reaching such an x-risk level in the hands of a single person or group is almost zero, which is why I weigh it so little.
I think it is and will be extremely powerful, but not on the level of a small group of people killing everyone, Eliezer-style. (Almost zero probability IMO.)
But I think I get your assumption; I used to hold it a lot. Now I would need to see very concrete, probable scenarios with implementation details for how you get from a single person in a basement, or a group of people, having intelligence tech to ending the world with it.
Listening to/reading many hours of Eliezer, Connor, Bostrom etc. didn't seem to work for me. Instrumental convergence feels like it's not happening in practice. FOOM doesn't seem realistic.
Bioweapons: not sure how much AI could realistically accelerate that risk? A Terminator with values aligned against ours? Destroying fundamental infrastructure?
I would need to see comparisons of these scenarios to existing ways to do the same without AI, and why AI extremely accelerates all these risks (or rogue AI risk on its own, etc.), and why the benefits, such as defense against other actors or science, aren't worth it relative to the risks. Or a plausible step-by-step route to an AGI god paperclipping the earth or the universe, realistically taking into account our systems' constraints and physics constraints.
Do you have an opinion on these people's arguments that deconstruct Bostrom's arguments? [AI Optimism – For a Free and Fair Future](https://optimists.ai/)
Maybe when we start getting more autonomous OOD agents in practice, some of the inherent AI risks will start making more sense to me, as current AI systems have very weak generalization capabilities and are pretty easily shaped by training data, even though in quite alchemist ways [x.com](https://twitter.com/__HMYS__/status/1781408253756236249) "
[Gödel, Escher, Bach author Doug Hofstadter on the state of AI today - YouTube](https://www.youtube.com/watch?v=lfXxzAVtdpU)
https://www.lesswrong.com/posts/kAmgdEjq2eYQkB5PP/douglas-hofstadter-changes-his-mind-on-deep-learning-and-ai
[x.com](https://twitter.com/liron/status/1675724309745246208)
https://www.sciencedirect.com/science/article/pii/S2665945X24000068?via%3Dihub
https://arxiv.org/abs/2210.05492
given a reasonable amount of data, a "simple" self-play objective yields superhuman Diplomacy play
Healthy competition with collaboration to maximize the probability of humanity's mutual growth into an intergalactic species
[AI Panel Discussion W/ Emad Mostaque, Ray Kurzweil, Mo Gawdat & Tristan Harris | EP #96 - YouTube](https://www.youtube.com/watch?v=yVv3mg8zWIU)
[AI Panel Discussion Pt. 2 | EP #97 - YouTube](https://www.youtube.com/watch?v=HIidMryjSJQ)
[Zvi Mowshowitz on sleeping on sleeper agents, and the biggest AI updates since ChatGPT - 80,000 Hours](https://80000hours.org/podcast/episodes/zvi-mowshowitz-sleeper-agents-ai-updates/)
[Ilya Sutskever: OpenAI Meta-Learning and Self-Play | MIT Artificial General Intelligence (AGI) - YouTube](https://youtu.be/9EN_HoEk3KY?si=6mcpOnsp5QTKTzIG)
Noam Brown [Noam Brown, FAIR: On achieving human-level performance in poker and Diplomacy, and the power of spending compute at inference time - imbue](https://imbue.com/podcast/2023-02-09-podcast-episode-27-noam-brown/)
"In this age of digital psyop WMDs, freedom of speech is sadly becoming freedom of brainwash.
How to preserve one, without further enabling the other?"
[x.com](https://twitter.com/Liv_Boeree/status/1781485570835058853?t=kmFOLCTWYsAy8J6r7B8C6Q&s=19)
https://arxiv.org/abs/1806.07366
[Insects and Other Animals Have Consciousness, Experts Declare | Quanta Magazine](https://www.quantamagazine.org/insects-and-other-animals-have-consciousness-experts-declare-20240419/)
Map of the Milky Way [x.com](https://twitter.com/burny_tech/status/1781675087143301617?t=sh2IVKA-Klb0AUv7NDQLqQ&s=19)
Anatomy of humans
Periodic table of elements
Which mathematical theories of ML would you add to this list?
statistical learning theory, singular learning theory, spline theory of deep learning, principles of deep learning (effective theory of DL), geometric deep learning, categorical deep learning, shard theory, Brunton's work on symmetries in ML
Infinite limits (infinite width NTK stuff, infinite depth ODE stuff, and infinite width-depth SDE stuff), mathematics of adversarial attacks, learning theory for NNs (why SGD works and finds generalising solutions), physics-informed ML, Hansen and Colbrook's work on the limits of deep learning, Neural Operators like the FNO.
Then there is plenty of stuff on ML for mathematics, e.g. solving inverse problems and PDEs.
https://www.sciencedirect.com/science/article/pii/S0167278924001106
Designing stable neural networks using convex analysis and ODEs
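To make the "infinite depth ODE stuff" concrete: a residual network can be read as a forward-Euler discretization of an ODE, and the Neural ODEs paper linked above (arXiv 1806.07366) takes the depth limit. A toy sketch (the layer and numbers are made up, not from any of the cited papers):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.1, size=(4, 4))  # one shared toy layer

def f(x):
    """Vector field defined by a single tanh layer."""
    return np.tanh(W @ x)

def resnet_forward(x, depth):
    """Residual net x_{t+1} = x_t + h*f(x_t): forward Euler on dx/dt = f(x)
    with step size h = 1/depth, integrating from t = 0 to t = 1."""
    h = 1.0 / depth
    for _ in range(depth):
        x = x + h * f(x)
    return x

x0 = rng.normal(size=4)
# As depth grows (h shrinks), the output converges to the ODE solution at
# t = 1; the stability of this flow is what the convex-analysis/ODE line
# of work studies.
for depth in (10, 100, 1000):
    print(depth, np.round(resnet_forward(x0, depth), 4))
```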
Robot rights
[x.com](https://twitter.com/anthrupad/status/1781579406420721955?t=9dtRv3djmKIqZPujkwVVaw&s=19)
https://www.noemamag.com/ai-could-be-a-bridge-toward-diverse-intelligence/
[x.com](https://twitter.com/jam3scampbell/status/1781518399174062214?t=tqjQot79FTZ1MA14z3Oafg&s=19)
Decentralize!
"in general, i am extremely pro-capitalism, but it's hard to see how it doesn't lead to winner-take-all dynamics in a post-human, superintelligent future
whoever has the most capital will have the most compute and hence the biggest brain
right now, capital is already long-tail distributed, but what happens if this results in intelligence becoming long-tailed as well (rn it's normally distributed)
if the richest person in the world is running their mind on 10e15 GPUs and the average person only has 100, then the richest-person-in-the-world will be a god among bugs. this is not fun if you're a bug (gpu-poor)"
hyperactive feral attention ungovernability neurodivergence
"I really really really dislike how currently the word "AI" is for most people associated with people not really in the AI field using ChatGPT clumsily or other generative AI products or just neural networks existing and that being mostly what is there is to it. That's like 0.0000001% of what there is to the AI field and not fully representing it!
The AI field in its broadest definition in cognitive science is such a gigantic interdisciplinary field with such a long history in theory and practice, from statistical methods to expert systems to symbolics to neural nets on many diverse modalities and problem domains (language, vision, science, reasoning,...) to neurosymbolics to cognitive architectures to neuromorohics to robotics to neuroscience x AI (NeuroAI) and so on with all sorts of maths from dynamical systems, complex systems, systems theory, cybernetics, optimization theory, algebra, geometry, analysis, probability, statistics, bayesian methods, logic, differential equations, group theory, category theory, physics, algebraic geometry, topology, information theory, algorithmic information theory, graph theory and so on.
For me trying to understand intelligence in general is so extremely central to everything, as thanks to it we do everything we can do that requires it, and my perspective the most fascinating thing one can understand and engineer is trying to replicate fascinating nature's systems' intelligent behavior. It's such a lovely mystery that we haven't fully cracked yet!"
https://arxiv.org/abs/2404.11794