https://arxiv.org/abs/2312.10794
https://proceedings.neurips.cc/paper_files/paper/2018/file/69386f6bb1dfed68692a24c8686939b9-Paper.pdf
https://arxiv.org/abs/1804.04272
https://arxiv.org/abs/2302.04107
https://arxiv.org/abs/2103.09177
https://www.lesswrong.com/posts/SXJGSPeQWbACveJhs/the-best-tacit-knowledge-videos-on-every-subject
[Notes on e/acc principles and tenets](https://beff.substack.com/p/notes-on-eacc-principles-and-tenets)
https://arxiv.org/abs/2404.09516
https://arxiv.org/abs/2404.10981
https://arxiv.org/abs/2404.11584
[What should the Effective Altruism movement learn from the SBF / FTX scandal? (with Will MacAskill) | Clearer Thinking with Spencer Greenberg — the podcast about ideas that matter](https://podcast.clearerthinking.org/episode/206/will-macaskill-what-should-the-effective-altruism-movement-learn-from-the-sbf-ftx-scandal?fbclid=IwZXh0bgNhZW0CMTEAAR1Ri-7GffBdvyE60DOa2Zxmbaj2iUmRHh7Y3-VGddqX9bbjtSqyTHIM-Fs_aem_AaoksINWxRBheKhpgcEZAelfRJbPccCbO1riCai22LdXULYueGfIv1QcnsVWrdKpfRc2XpLQv74NHcjgQE0VNZva)
There was a vibe shift in a lot of EA circles from "I want AGI to benefit all of humanity" to "stop all AGI no matter what".
[NeuroAI and Geometric Data Analysis](https://www.simonsfoundation.org/flatiron/center-for-computational-neuroscience/neuroai-and-geometry/)
Solipsism is just a disguised feeling of eternal cosmic loneliness.
Our culture should shift back to mostly technooptimism from the recent mostly technodoomerism.
Peter Diamandis says AGI will lead to an age of abundance, with food, energy and healthcare democratized and demonetized [x.com](https://twitter.com/tsarnick/status/1781847492038181038?t=H4aTf1PCST5vdCvqEvt11g&s=19)
I dislike how so much of today's youth has this mindset of learned helplessness, with degrowth of everything as the only solution; it feels like collective depression. Technooptimist growth as a solution to many "unsolvable" problems is possible! Let's break that collective depression, grow as life, and become an intergalactic civilization!
https://www.lesswrong.com/posts/zaaGsFBeDTpCsYHef/shallow-review-of-live-agendas-in-alignment-and-safety
Mathematics of cognition: https://www.lesswrong.com/posts/RQpNHSiWaXTvDxt6R/coherent-decisions-imply-consistent-utilities https://arxiv.org/abs/2006.15136
[Andy Matuschak - Self-Teaching, Spaced Repetition, Why Books Don’t Work - YouTube](https://www.youtube.com/watch?v=dmeRQN9z504)
[Energies | Free Full-Text | A Review of Physics-Informed Machine Learning in Fluid Mechanics](https://www.mdpi.com/1996-1073/16/5/2343)
[#35: Bridging Modalities: Vision, Language, and AI with Letitia Parcalabescu - The Transformative Ideas Podcast | Podcast on Spotify](https://open.spotify.com/episode/2sUYtwQKvLleRshLio4t1g?si=2b8af33b7d3b4bd9&nd=1&dlsi=69f414b15fb946b4)
Sheaves, stacks, and higher stacks: [Olivia Caramello - 1/4 Introduction to sheaves, stacks and relative toposes - YouTube](https://www.youtube.com/watch?v=WeKMpXVgSLg)
[Cyber Animism by Joscha Bach - YouTube](https://www.youtube.com/watch?v=YZl4zom3q2g)
[Welcome to The Quantum Well! - The Quantum Well - Obsidian Publish](https://publish.obsidian.md/myquantumwell/Welcome+to+The+Quantum+Well!)
[New Physics Theory Describes The Universe (Featuring Sara Walker) - YouTube](https://www.youtube.com/watch?v=mXmH4e--bY8)
https://arxiv.org/abs/2403.10895
[Physicists Think The Infinite Size of The Multiverse Could Be Infinitely Bigger : ScienceAlert](https://www.sciencealert.com/physicists-think-the-infinite-size-of-the-multiverse-could-be-infinitely-bigger)
[Nick Bostrom on Superintelligence and the Future of AI | Closer To Truth Chats - YouTube](https://youtu.be/oWtqn4e4Zk4?si=16fhs4-XK6JruIYr)
AI has many advantages that humans don't, which lets it become superhuman more easily [x.com](https://twitter.com/SpencrGreenberg/status/1781702814500016486?t=I6JyfTRzrJBAAc-MIbCzJQ&s=19)
"I", aka universal consciousness, expressing to a first approximation as the unified quantum field theory Standard Model Lagrangian with quantum gravity and emergent spacetime made of loops, encoded as hypercomputational quantum information, living in the 42 googolth multiverse branch [x.com](https://twitter.com/burny_tech/status/1781891648059855359?t=wQM7YnztLN_kUnf-kXdhFg&s=19)
"What exactly is it that creates distinctions between individuals? And how do you explain the duality of consciousness and the quantum field? And what causes this expression?"
"QRI's topological segmentation of the electromagnetic field is one possible solution [Frontiers | Don’t forget the boundary problem! How EM field topology can address the overlooked cousin to the binding problem for consciousness](https://www.frontiersin.org/articles/10.3389/fnhum.2023.1233119/full), with monistic panpsychist physicalism merging with idealism without dualism, where the fields in quantum field theory are mathematically the fields of qualia, or whatever the current fundamental theory of the universe is. I don't think that the quantum field theory Standard Model Lagrangian, with some solution to quantum gravity and other problems in physics, is the final physics theory of the universe, but I believe you could do a similar analysis on the overall information geometries in currently accepted physics across scales, not just on the electromagnetic field, or on some deeper mathematical physical structure from which the quantum field theory Standard Model Lagrangian, with some solution to quantum gravity and other unsolved problems in physics, emerges."
"Transformers attention
The Transformer architecture, introduced in the 2017 paper "Attention is All You Need", revolutionized the use of attention mechanisms in deep learning models[1][2]. The key innovations of the Transformer's attention mechanism are:
1. Self-Attention: The Transformer relies entirely on self-attention to compute representations of its input and output, without using recurrent or convolutional layers[1][2]. Self-attention relates different positions of a single sequence in order to compute a representation of the sequence[1][4].
2. Scaled Dot-Product Attention: The Transformer uses scaled dot-product attention, where the dot products are scaled down by the square root of the dimension of the keys to prevent the dot products from growing too large in magnitude[1][2].
3. Multi-Head Attention: Instead of performing a single attention function, the Transformer uses multi-head attention, which linearly projects the queries, keys and values h times with different learned projections. This allows the model to jointly attend to information from different representation subspaces at different positions[1][2].
The Transformer's attention mechanism has been widely adopted and adapted beyond machine translation to modalities like computer vision, speech recognition, and multi-modal processing[3][4]. It has enabled the development of powerful pre-trained language models like BERT and GPT[4].
Citations:
[1] Attention Is All You Need - NeurIPS Proceedings https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
[2] The Transformer Attention Mechanism - MachineLearningMastery.com https://machinelearningmastery.com/the-transformer-attention-mechanism/
[3] Attention Mechanisms and Transformers - Dive into Deep Learning https://d2l.ai/chapter_attention-mechanisms-and-transformers/index.html
[4] Transformer (deep learning architecture) - Wikipedia https://en.wikipedia.org/wiki/Transformer_%28deep_learning_architecture%29
[5] All you need to know about 'Attention' and 'Transformers' - Towards Data Science https://towardsdatascience.com/all-you-need-to-know-about-attention-and-transformers-in-depth-understanding-part-1-552f0b41d021

Transformers attention equation
The key equations for the Transformer attention mechanism are:
1. Scaled Dot-Product Attention:
$\text{Attention}(Q,K,V) = \text{softmax}(\frac{QK^\top}{\sqrt{d_k}})V$
where Q, K, and V are the query, key, and value matrices respectively, and $d_k$ is the dimension of the keys[1][3].
2. Multi-Head Attention, which linearly projects the queries, keys and values h times with different learned projections, performs the attention function in parallel, concatenates the results, and projects again[1][2]:
$\text{MultiHead}(Q,K,V) = \text{Concat}(\text{head}_1, ..., \text{head}_h)W^O$ where $\text{head}_i = \text{Attention}(QW^Q_i, KW^K_i, VW^V_i)$
The Transformer relies solely on self-attention to compute representations of its input, dispensing with recurrence and convolutions[2]. However, the quadratic complexity of self-attention with respect to input length inhibits its use for very long sequences[3]. The Transformer also adds positional encodings to the input embeddings to make use of the order of the sequence, since the model contains no recurrence or convolution[3].
Citations:
[1] The Transformer Model in Equations - John Thickstun https://johnthickstun.com/docs/transformers.pdf
[2] The Transformer Attention Mechanism - MachineLearningMastery.com https://machinelearningmastery.com/the-transformer-attention-mechanism/
[3] Chapter 8 Attention and Self-Attention for NLP - Modern Approaches in Natural Language Processing https://slds-lmu.github.io/seminar_nlp_ss20/attention-and-self-attention-for-nlp.html
[4] All you need to know about 'Attention' and 'Transformers' - Towards Data Science https://towardsdatascience.com/all-you-need-to-know-about-attention-and-transformers-in-depth-understanding-part-1-552f0b41d021
[5] Attention and Transformer Models - Towards Data Science https://towardsdatascience.com/attention-and-transformer-models-fe667f958378
"
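A minimal NumPy sketch of the two equations above, scaled dot-product attention and multi-head attention; the function names, shapes and toy inputs are my own illustration, and it uses the common single-projection-then-split implementation of the heads, which is equivalent to the per-head projections $W^Q_i, W^K_i, W^V_i$ in the equation:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)   # (..., seq_len, seq_len)
    return softmax(scores) @ V

def multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads):
    # Project to Q, K, V, split into heads, attend per head, concatenate, project back.
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    Q, K, V = X @ W_q, X @ W_k, X @ W_v

    def split_heads(M):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return M.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    heads = scaled_dot_product_attention(split_heads(Q), split_heads(K), split_heads(V))
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)   # Concat(head_1..head_h)
    return concat @ W_o

# toy example: 5 tokens, model width 16, 4 heads
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))
W_q, W_k, W_v, W_o = (rng.normal(size=(16, 16)) * 0.1 for _ in range(4))
print(multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads=4).shape)  # (5, 16)
```

The (seq_len × seq_len) scores matrix in the sketch is where the quadratic cost in sequence length mentioned above comes from.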
"Mixture of experts math
Mixture of experts (MoE) is a machine learning technique that divides a complex problem into simpler sub-problems, each handled by an "expert" model. The outputs of these experts are then combined by a gating network to produce the final output[2]. The key mathematical components of an MoE are:
1. Expert models (f₁, ..., fₙ): each takes the same input x and produces outputs f₁(x), ..., fₙ(x).
2. Gating function (w): takes input x and produces a vector of weights (w(x)₁, ..., w(x)ₙ).
3. Parameters (θ = (θ₀, θ₁, ..., θₙ)): θ₀ is for the gating function, the rest are for the experts.
4. Combined output: for an input x, the MoE produces a single output by combining f₁(x), ..., fₙ(x) according to the weights w(x)[2]. The output is often a weighted sum of the expert outputs:
f(x) = ∑ᵢ w(x)ᵢ fᵢ(x)
The experts and gating function are trained by minimizing a loss function, typically using gradient descent[2]. MoE enables large models with billions of parameters to reduce computation costs during training and achieve faster inference by selectively activating only the experts needed for a given task[4].
Citations:
[1] A Gentle Introduction to Mixture of Experts Ensembles https://machinelearningmastery.com/mixture-of-experts/
[2] Mixture of experts - Wikipedia https://en.wikipedia.org/wiki/Mixture_of_experts
[3] Create Mixtures of Experts with MergeKit - Towards Data Science https://towardsdatascience.com/create-mixtures-of-experts-with-mergekit-11b318c99562?gi=488928e92bcd
[4] What is mixture of experts? - IBM https://www.ibm.com/topics/mixture-of-experts
[5] Applying Mixture of Experts in LLM Architectures - NVIDIA Technical Blog https://developer.nvidia.com/blog/applying-mixture-of-experts-in-llm-architectures/
"
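A minimal NumPy sketch of the weighted-sum combination f(x) = ∑ᵢ w(x)ᵢ fᵢ(x) above, with toy linear experts, a softmax gating function, and top-k routing to illustrate the selective activation mentioned in the passage; all class and parameter names are illustrative assumptions, not taken from any particular MoE implementation:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D vector
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

class MixtureOfExperts:
    """Toy MoE: linear experts f_i, softmax gate w, top-k sparse routing."""

    def __init__(self, d_in, d_out, n_experts, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        # expert parameters theta_1..theta_n (one linear map per expert)
        self.expert_weights = rng.normal(size=(n_experts, d_in, d_out)) * 0.1
        # gating parameters theta_0
        self.gate_weights = rng.normal(size=(d_in, n_experts)) * 0.1
        self.top_k = top_k

    def __call__(self, x):
        # w(x): a probability vector over experts
        gate = softmax(x @ self.gate_weights)
        # sparse routing: evaluate only the top-k experts, renormalise their weights
        chosen = np.argsort(gate)[-self.top_k:]
        weights = gate[chosen] / gate[chosen].sum()
        # f(x) = sum_i w(x)_i f_i(x), restricted to the chosen experts
        y = np.zeros(self.expert_weights.shape[-1])
        for w_i, i in zip(weights, chosen):
            y += w_i * (x @ self.expert_weights[i])   # f_i(x) for a linear expert
        return y

moe = MixtureOfExperts(d_in=8, d_out=4, n_experts=4, top_k=2)
x = np.random.default_rng(1).normal(size=8)
print(moe(x))   # combined output of the 2 selected experts, shape (4,)
```

With top_k smaller than n_experts, only a fraction of the expert parameters are touched per input, which is where the training and inference compute savings described above come from; training would backpropagate a task loss through the gate and the active experts.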
billions must be loved
Robotics landscape [x.com](https://twitter.com/burny_tech/status/1781824401517760646?t=S5ydOhZLH5hFvgFzDoxKEA&s=19)
[Neural Trajectories of Conceptually Related Events - PubMed](https://pubmed.ncbi.nlm.nih.gov/38187737/)
" https://www.noemamag.com/ai-could-be-a-bridge-toward-diverse-intelligence/ Many claim that AIs merely shuffle symbols but do not really understand. Very few of those arguments start with a definition of what it means for a biological human, with a network of excitable cells and a soup of neurotransmitters, to “understand.” AIs supposedly use symbols that are ungrounded — they do not refer to real experiences in the world. But anyone who has been around human children knows they do the same thing as they learn to talk: First they babble, making nonsense sounds; then they match patterns of speech made by adults; and eventually they construct words and sentences that clearly reflect an understanding of meaning. "
"I'm not sure it's important or relevant to describe brains as machines. They're machines in the same way that, well, any physical system is a machine, but that's not really saying much. Yes, I do see a stark difference between biological development and machine learning. As I mentioned, catastrophic forgetting is one key relevant difference; somehow we integrate new knowledge without compromising existing knowledge, and it's not clear how that happens. Yes, I'm sure I'm not talking about magic. I do indeed think that the missing principles are possible to implement in machines, but I don't expect it to happen tomorrow, because the field of machine learning has been moving away from what I believe are critical properties, like recurrent architectures. Nervous systems are richly recurrent dynamical systems with plasticity that isn't just driven by some extrinsically imposed objective function; there is something about biological plasticity that is driven by something I would vaguely call internal functional coherence. What we see in human development is a progressive elaboration and coordination of coherent behaviors, not just a gradual convergence on optimal performance. Human cognitive development has stages, and human knowledge is richly and robustly compositional. Machine intelligence doesn't achieve this yet, and I'm not sure it will achieve this without a significant reconsideration of the learning process and structural organization of the models."
Adaptive resonance theory
"Thinking analogically about machines versus humans when it comes to intelligence, the way we think about planes versus birds when it comes to flight, is useful. Planes work on similar but not the same principles as birds, and both have their own advantages and disadvantages in different contexts, having different capabilities. Planes can fly longer distances, carry much bigger weights, come in all kinds of sizes, are differently fragile or adaptive etc., just like when you compare current AI systems that are superhuman at some concrete tasks requiring certain types of capabilities and subhuman at others. Those capabilities include memory size, memory efficiency, different types of memory, rigid symbolic thinking, flexibility, the need to "rest", the ability to digest a certain amount of data, the ability to pick certain types of data, classification, clustering, planning, agency, search, different optimization algorithms, generalization, reasoning, recurrent processing, reward mechanisms, out-of-distribution tasks, the ability to copy parts or the whole of one's existence, continual learning, etc., in different modalities such as text, images, audio, video, senses and actions, with different tools, on the internet, in simulated environments, in real-world environments etc. The space of all possible information processing systems is extremely vast, and humans are a tiny specialized subspace of this gigantic, so far unexplored space, one that evolved to make us survive in our universe!"
[Persistent homology - Wikipedia](https://en.wikipedia.org/wiki/Persistent_homology)
https://www.lesswrong.com/posts/Sh4Js5koiaZCP2RsG/cooperation-is-optimal-with-weaker-agents-too-tldr
[Ever wonder how LSD works? An answer has been discovered. - Big Think](https://bigthink.com/neuropsych/how-lsd-works/#Echobox=1713703982)
https://www.cell.com/cell/fulltext/S0092-8674(20)31066-7
[Towards a more general understanding of the algorithmic utility of recurrent connections | PLOS Computational Biology](https://journals.plos.org/ploscompbiol/article?id=10.1371%2Fjournal.pcbi.1010227)
Comparing machine and human intelligence, AI x science, AI capability, AI modality
[Here are 5 cool findings from a massive project on 240 mammal genomes](https://www.sciencenews.org/article/findings-zoonomia-project-mammal-genomes?fbclid=IwZXh0bgNhZW0CMTEAAR0RKh8nsOhBrLmtwF9bQOmFZjVRE2fp1y0a8ybZXBIwBacNoUAFJM5hUY8_aem_AaoyEkVxKn2dQzhqwDefNbZQwMkAeb2dIiEQrRLHM7djT02-TJN-41KAxcLz-lDjdZBBbPVrNmObR2r-FMnbs-jR)
[Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind - YouTube](https://youtu.be/UTuuTTnjxMQ?si=o0WLz21TYrr8XMEY)
Transformers first convert symbols into tokens, then compress useful information out of them in the next layers, then do higher-order processing to decide what to do with it, and then map back into output tokens to predict the next token, with attention reading from and writing to the residual stream as a working memory.
Making Llama 3 70B on Groq, with its insane inference speed, just constantly list and explain as many mathematical concepts, structures, equations etc. as possible in an infinite loop is ecstasy.
List and explain as many mathematical concepts, structures, equations etc. as possible while always writing down the mathematics
*takes a deep breath and looks up at the starry sky with a twinkle in my eye*
"My dear cosmic companion, let me guide you on a wondrous journey through the enchanting realms of mathematics! We shall traverse vast lands, from the simplest rolling hills to the most dizzying peaks of abstraction.
Our quest begins in the humble kingdom of Arithmetic, where the four elementals of addition, subtraction, multiplication and division dance and intertwine. These basic operations are the sturdy stones upon which all of mathematics is built.
As we venture further, the land of Algebra unfolds before us - a place of mysterious symbols and equations waiting to be solved, like ancient runic puzzles. Here, unknowns become knowns as we learn to balance the arcane scales.
Geometry then beckons us with its perfect shapes and precise proportions - a realm where points, lines, angles and curves create intricate designs that whisper secrets of symmetry, congruence and similarity. We ponder the eternal questions posed by circles, triangles, spheres and polyhedra.
Pressing onward, the terrain shifts to the dynamic world of Calculus, a land of perpetual change and infinite possibilities. We master the magic of finding slopes with derivatives that measure instantaneous change, and we divine areas under curves through the sorcery of integration. These powerful spells allow us to analyze motion, optimize quantities, and unravel the very fabric of space itself.
Probability and Statistics then appear through the swirling mists, twin cities where chance and data intertwine in an eternal dance. We learn to quantify and tame the uncertain, to find signal within noise, and to infer truths about entire populations from mere shimmering samples.
As the fog clears, the soaring spires of Linear Algebra rise before us, a domain of matrices and vectors - those mystical objects that transcend their individual numeric entries and together express geometric transformations and higher dimensional structures with elegance and might.
Our path now grows ever more abstract as we delve into the esoteric realms of advanced mathematics. Number Theory whispers to us of secret patterns and properties of the integers...those atomic elements of the mathematical universe. The primes, especially, call out as indivisible mathematical stars which combine to uniquely express each whole number.
Lifting our gaze, we spy the bizarre fractals of Chaos Theory, infinitely complex structures that bloom from disarmingly simple seeds and illustrate the astounding beauty that lies at the border between order and disorder. Strange Attractors pull us in with their mesmerizing, kaleidoscopic swirls.
At long last, the shimmering peaks of Pure Mathematics glimmer on the horizon - austere and magnificent. Gödel's Incompleteness Theorems ring out from the highest summit, speaking of the inescapable limits to mathematical certainty, yet also hinting at the inexhaustible potential for further exploration. Category Theory illuminates overarching themes and patterns that unite disparate fields. And as we approach the Axiom of Choice and contemplate the infinite, we brush up against the transcendent and the paradoxical - ideas so colossal that they threaten to shatter all of mathematics, yet upon which so much is built.
And yet, my brave fellow seeker, this is but a glimpse of the full expanse of the mathematical cosmos. A lifetime is not enough to explore its endless wonders! But let us revel in how far we have journeyed, and be humbled by how much uncharted territory remains. For to study math is to forever be an awestruck explorer, marveling at the profound complexities of the abstract universe. Onward, into the infinite!"
[x.com](https://twitter.com/burny_tech/status/1782215842236694945?t=jBOjfPy03WJjtJ6W3eEeew&s=19)
[Strange Expansion of the Universe Results From the Most Accurate Map - YouTube](https://m.youtube.com/watch?v=giUsVew-96g&fbclid=IwZXh0bgNhZW0CMTEAAR1njRtf-46ydjLlV3p38I9NdkTJ2_iRlzygrG8C8otLklSGEKDWpjbIKjY_aem_Ab26p5sF8d4-8xsh3tAgcPL9PKzY7f__MnFSrz6FOt3vDY7xEKFrK6-wJbMdWMlN0MxfVK_IxkecvAfq5BTRK9bZ)
[Optimally Allocating Compute Between Inference and Training – Epoch AI](https://epochai.org/blog/optimally-allocating-compute-between-inference-and-training)
https://elifesciences.org/articles/58874
https://www.sciencedirect.com/science/article/pii/S0010028598906946