Book 136 - Burny

that model of his, with decentralized cooperating AI agents that each run on different algorithms but are connected through abstract mathematical metarepresentations forming a collective intelligence, is starting to sound really interesting 🤯 [General Theory of General Intelligence: A Pragmatic Patternist Perspective. Introduction (1/10) - YouTube](https://www.youtube.com/watch?v=d8nzFqoEOvE)

hmm, interestingly he puts abstract algebra, functional programming, category theory (Galois connections implemented as chronomorphisms forming a fold followed by an unfold), various machine learning algorithms (probabilistic/logical/evolutionary methods, clustering/program learning/pattern mining/attention allocation etc.), things from cognitive science (philosophy included) and so on into one 😄 In his general theory of general intelligence he seems to put together various concepts from machine learning, abstract algebra, functional programming, category theory, graph theory, algorithmic information theory, optimization theory, formal logic, things from cognitive sciences, etc. into one :thinking: can't wait to get into the math and implementation more

i think this general approach of synthesizing a lot of functioning approaches from different domains into one collective system might be what we need to create a hybrid neurosymbolic system that generalizes across as many tasks as possible

hmmm, in a way this reminds me of mixture of experts, but on a metalearning, algorithmic level possibly? designing an analogue to a router sounds like a challenge? and not falling into combinatorial explosion from too many degrees of freedom also sounds like a challenge, which most current metalearning approaches seem to suffer from, iirc?
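To make the router question above a bit more concrete, here is a minimal sketch of a softmax gate over heterogeneous predictors. It only illustrates the mixture-of-experts analogy and is not Goertzel's architecture: the three toy `experts`, the hand-written `gate`, and the `route` function are all hypothetical names invented for this example.

```python
import math
from typing import Callable, List, Sequence

Expert = Callable[[float], float]
Gate = Callable[[float], Sequence[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw gate scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(x: float, experts: Sequence[Expert], gate: Gate) -> float:
    """Blend the experts' outputs using input-dependent gate weights."""
    weights = softmax(gate(x))
    return sum(w * expert(x) for w, expert in zip(weights, experts))


# Hypothetical experts standing in for agents built on different algorithms.
experts = [lambda x: 2.0 * x, lambda x: x * x, lambda x: 1.0]


def gate(x: float) -> List[float]:
    """Hypothetical hand-written gate: favour the quadratic expert for large |x|."""
    return [0.0, abs(x), 0.0]


blend_at_zero = route(0.0, experts, gate)     # all experts weighted equally
blend_far_out = route(100.0, experts, gate)   # dominated by the quadratic expert
```

In a real system the gate would itself be learned, and that is where the combinatorial-explosion worry comes in: every extra expert and every extra degree of freedom in the gate enlarges the space it has to search.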
an algorithm that learns different ML algorithms on its own, like program synthesis for ML algorithms, but i guess Ben combines a diverse set of existing ML algos into a collective system? not sure, digging deeper now

i need to refresh my memory on what metalearning literature is out there [Meta-learning (computer science) - Wikipedia](https://en.wikipedia.org/wiki/Meta-learning_(computer_science)) :eyes: hmm yea
[GitHub - oneHuster/Meta-Learning-Papers: A classified list of meta learning papers based on realm.](https://github.com/oneHuster/Meta-Learning-Papers)
https://arxiv.org/abs/2004.05439
https://arxiv.org/abs/2301.08028
[A survey of deep meta-learning | Artificial Intelligence Review](https://link.springer.com/article/10.1007/s10462-021-10004-4)
https://www.sciencedirect.com/science/article/pii/S003132032200067X
https://www.sciencedirect.com/science/article/pii/S0950705122004737
Learning to Learn without Gradient Descent by Gradient Descent: https://arxiv.org/abs/1611.03824
Learning to Optimize: https://arxiv.org/abs/1606.01885
Evolved Policy Gradients: https://arxiv.org/abs/1802.04821
[RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning | DeepAI](https://deepai.org/publication/rl-2-fast-reinforcement-learning-via-slow-reinforcement-learning)
Evolving Deep Neural Networks https://arxiv.org/abs/1703.00548
https://arxiv.org/abs/1812.09584
https://arxiv.org/abs/1901.11117 cool "Our goal is to apply neural architecture search to search for a better alternative to the Transformer.
The architecture found in our experiments -- the Evolved Transformer -- demonstrates consistent improvement over the Transformer on four well-established language tasks"

https://arxiv.org/abs/2006.08084 "First, we observe that transformer-based sequence-to-sequence models can learn subroutines like sorting a list of numbers, but their performance rapidly degrades as the length of lists grows beyond those found in the training set. We demonstrate that this is due to attention weights that lose fidelity with longer sequences, particularly when the input numbers are numerically similar. To address the issue, we propose a learned conditional masking mechanism, which enables the model to strongly generalize far outside of its training range with near-perfect accuracy on a variety of algorithms. Second, to generalize to unseen data, we show that encoding numbers with a binary representation leads to embeddings with rich structure once trained on downstream tasks like addition or multiplication. This allows the embedding to handle missing data by faithfully interpolating numbers not seen during training."

https://arxiv.org/abs/2010.12621 + generalization

Creating a gigantic map of metalearning in machine learning is a complex task that would require a significant amount of research, analysis, and visualization. However, I can provide you with a high-level overview of the key components and concepts involved in metalearning within the context of machine learning. Please note that this is not an exhaustive list, but rather a starting point for understanding the field.

Metalearning in Machine Learning:

1. Definition:
   - Metalearning, also known as "learning to learn," is a subfield of machine learning that focuses on developing algorithms and techniques that enable machines to improve their learning capabilities over time.
2. Types of Metalearning:
   a. Few-Shot Learning:
      - Learning from a small number of examples
      - Techniques: Siamese Networks, Prototypical Networks, Matching Networks
   b. Transfer Learning:
      - Leveraging knowledge from one domain to improve learning in another domain
      - Techniques: Fine-tuning, Domain Adaptation, Multitask Learning
   c. Learning to Optimize:
      - Automating the process of hyperparameter optimization and model selection
      - Techniques: Bayesian Optimization, Reinforcement Learning, Evolutionary Algorithms
   d. Meta-Reinforcement Learning:
      - Learning to adapt and improve reinforcement learning algorithms
      - Techniques: MAML (Model-Agnostic Meta-Learning), RL^2 (Fast Reinforcement Learning via Slow Reinforcement Learning)
3. Key Concepts:
   a. Meta-Models:
      - Models that learn to generate or adapt other models
      - Examples: Neural Architecture Search, Hypernetworks
   b. Meta-Datasets:
      - Datasets designed for metalearning tasks
      - Examples: Omniglot, Mini-ImageNet, Meta-Dataset
   c. Meta-Algorithms:
      - Algorithms that learn to optimize or adapt other algorithms
      - Examples: MAML, Reptile, LEO (Latent Embedding Optimization)
4. Applications:
   a. Few-Shot Classification
   b. Few-Shot Regression
   c. Few-Shot Reinforcement Learning
   d. Neural Architecture Search
   e. Hyperparameter Optimization
   f. Continual Learning
   g. Domain Adaptation
5. Challenges and Future Directions:
   a. Scalability and Computational Efficiency
   b. Generalization and Robustness
   c. Interpretability and Explainability
   d. Integration with Other Machine Learning Paradigms
   e. Real-World Applications and Deployment

This overview provides a high-level map of metalearning in machine learning. Each component mentioned above can be further expanded and explored in greater detail. Metalearning is an active area of research, and new techniques, algorithms, and applications are continually being developed to advance the field.
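As a concrete instance of the few-shot techniques listed in the overview, the classification rule behind Prototypical Networks can be sketched in a few lines: average the support examples of each class into a prototype, then assign a query to the nearest prototype. This is a simplified sketch, assuming the embedding is the identity (the real method learns an embedding network); the names `prototypes`, `classify`, and the toy `support` set are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def prototypes(support: List[Tuple[Vector, str]]) -> Dict[str, List[float]]:
    """Average the (already embedded) support vectors of each class."""
    grouped: Dict[str, List[Vector]] = defaultdict(list)
    for vec, label in support:
        grouped[label].append(vec)
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def classify(query: Vector, protos: Dict[str, List[float]]) -> str:
    """Return the label whose prototype is closest in squared Euclidean distance."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(protos, key=lambda label: sq_dist(query, protos[label]))


# Toy 2-way, 2-shot episode (assumption: vectors are used as their own embeddings).
support = [
    ((0.0, 0.0), "a"), ((0.0, 2.0), "a"),
    ((4.0, 4.0), "b"), ((6.0, 4.0), "b"),
]
protos = prototypes(support)
prediction = classify((1.0, 1.5), protos)
```

The "meta" part, which the sketch leaves out, is that the embedding is trained across many such small episodes so that nearest-prototype classification works on classes never seen during training.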
Yes, there have been several research efforts focused on developing meta-learning algorithms that can learn and discover different machine learning algorithms on their own. This area of research is often referred to as "Learning to Learn," "AutoML" (Automated Machine Learning), or "Program Synthesis for Machine Learning." Here are a few notable examples:

1. Neural Architecture Search (NAS):
   - NAS is a technique that aims to automatically design neural network architectures for a given task.
   - It uses search algorithms, such as reinforcement learning, evolutionary algorithms, or gradient-based methods, to explore the space of possible architectures and find the most suitable one.
   - Examples: NASNet, AmoebaNet, ENAS (Efficient Neural Architecture Search)
2. AutoML frameworks:
   - AutoML frameworks aim to automate the entire machine learning pipeline, including data preprocessing, feature engineering, model selection, and hyperparameter tuning.
   - They often employ meta-learning techniques to learn and optimize the various components of the pipeline.
   - Examples: Google Cloud AutoML, H2O AutoML, Auto-sklearn
3. Program Synthesis for Machine Learning:
   - Program synthesis involves automatically generating code or programs based on a given specification or examples.
   - In the context of machine learning, program synthesis can be used to generate machine learning algorithms or components of algorithms.
   - Examples:
     - "Learning to Learn without Gradient Descent by Gradient Descent" by Chen et al. (2017) - Trains recurrent optimizers by gradient descent that then perform derivative-free (black-box) optimization.
     - "Neural Programmer-Interpreters" by Reed and de Freitas (2016) - Learns to represent and execute programs from example execution traces.
4. Learning to Optimize:
   - Learning to Optimize focuses on developing algorithms that can learn to optimize other algorithms or learn optimization strategies.
   - Examples:
     - "Learning to Optimize" by Li and Malik (2017) - Learns optimization algorithms using reinforcement learning.
     - "Learning to Learn by Gradient Descent by Gradient Descent" by Andrychowicz et al. (2016) - Learns to optimize neural networks using an LSTM-based optimizer.
5. Meta-Reinforcement Learning:
   - Meta-reinforcement learning aims to learn reinforcement learning algorithms that can adapt quickly to new tasks or environments.
   - Examples:
     - "RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning" by Duan et al. (2016) - Learns a reinforcement learning algorithm using another reinforcement learning algorithm.
     - "Evolved Policy Gradients" by Houthooft et al. (2018) - Uses evolution strategies to evolve a loss function that policy-gradient learners are then trained on.

These are just a few examples of the ongoing research efforts in the field of meta-learning and automated discovery of machine learning algorithms. As the field advances, we can expect to see more sophisticated and efficient techniques for learning and synthesizing machine learning algorithms.

Here are a few more examples and research directions related to meta-learning algorithms that learn different ML algorithms on their own:

6. Learning to Learn by Gradient Descent by Gradient Descent:
   - This approach, proposed by Andrychowicz et al. (2016), learns the optimization algorithm itself using gradient descent.
   - The idea is to train an LSTM-based optimizer to update the weights of a neural network based on the gradients, effectively learning an optimization algorithm.
   - This allows the optimizer to adapt to the specific problem and potentially discover more efficient optimization strategies.
7. Learning to Optimize with Reinforcement Learning:
   - Instead of using gradient descent, reinforcement learning can be used to learn optimization algorithms.
   - For example, "Learning to Optimize" by Li and Malik (2017) trains a reinforcement learning agent to make decisions on how to update the parameters of a model based on the current state and reward signal.
   - The agent learns to optimize the model by exploring different optimization strategies and receiving rewards based on the model's performance.
8. Evolutionary Algorithms for Learning Algorithms:
   - Evolutionary algorithms, such as genetic algorithms or evolution strategies, can be used to evolve and discover new machine learning algorithms.
   - These algorithms start with a population of candidate algorithms and iteratively evolve them based on their performance on a given task.
   - Examples include "Evolving Deep Neural Networks" by Miikkulainen et al. (2019) and "Evolved Transformer" by So et al. (2019), which use evolutionary algorithms to discover novel neural network architectures.
9. Learning to Learn with Memory-Augmented Neural Networks:
   - Memory-augmented neural networks, such as the Neural Turing Machine (NTM) or Differentiable Neural Computer (DNC), can be used to learn algorithms by storing and retrieving relevant information from an external memory.
   - These models can learn to read from and write to memory, allowing them to discover and execute algorithmic patterns.
   - Examples include "Meta-Learning with Memory-Augmented Neural Networks" by Santoro et al. (2016) and "Learning to Remember Rare Events" by Kaiser et al. (2017).
10. Learning Algorithms with Graph Neural Networks:
    - Graph Neural Networks (GNNs) can be used to learn and represent algorithms as computation graphs.
    - By encoding algorithms as graphs and using GNNs to operate on these graphs, it becomes possible to learn and discover new algorithmic structures.
    - An example is "Learning to Execute Programs with Instruction Pointer Attention Graph Neural Networks" by Bieber et al. (2020). The related "Neural Execution Engines" by Yan et al. (2020) also learns to execute subroutines, but with a transformer-based model and learned conditional masking rather than a GNN.

These examples showcase the diverse approaches and techniques being explored in the field of meta-learning for discovering and learning machine learning algorithms.
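The common pattern in the approaches above is an outer loop that improves the learning procedure used by an inner loop. A toy version of the evolutionary variant, not taken from any of the cited papers: the outer loop is a (1+λ) evolution strategy over a single number, the step size of inner gradient descent on a fixed quadratic. The function names, the quadratic objective, and all constants are assumptions made for the sketch.

```python
import random


def inner_loss(lr: float, steps: int = 20) -> float:
    """Inner loop: gradient descent on f(w) = (w - 3)^2 from w = 0; return final loss."""
    w = 0.0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)
        w -= lr * grad
    return (w - 3.0) ** 2


def evolve_lr(generations: int = 30, children: int = 8,
              sigma: float = 0.05, seed: int = 0) -> float:
    """Outer loop: (1+lambda) evolution strategy over the learning rate."""
    rng = random.Random(seed)
    best_lr = 0.01
    best_loss = inner_loss(best_lr)
    for _ in range(generations):
        # Mutate the current parent; abs() keeps the step size non-negative.
        candidates = [abs(best_lr + rng.gauss(0.0, sigma)) for _ in range(children)]
        challenger = min(candidates, key=inner_loss)
        challenger_loss = inner_loss(challenger)
        # Elitist selection: the parent survives unless a child beats it.
        if challenger_loss < best_loss:
            best_lr, best_loss = challenger, challenger_loss
    return best_lr


start_loss = inner_loss(0.01)
evolved_lr = evolve_lr()
evolved_loss = inner_loss(evolved_lr)
```

The cited methods search over far richer objects (loss functions, update rules, whole architectures), but the two-level structure is the same: the fitness of an outer candidate is the result of running the inner learner with it.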
As research in this area continues to progress, we can expect to see more innovative and powerful methods for automating the process of algorithm discovery and optimization. https://arxiv.org/abs/1902.00741 [Gödel machine - Wikipedia](https://en.wikipedia.org/wiki/Gödel_machine) [GOEDEL MACHINE HOME PAGE](https://people.idsia.ch/~juergen/goedelmachine.html) "Gödel machine is a hypothetical self-improving computer program that solves problems in an optimal way. It uses a recursive self-improvement protocol in which it rewrites its own code when it can prove the new code provides a better strategy. The machine was invented by Jürgen Schmidhuber (first proposed in 2003), but is named after Kurt Gödel who inspired the mathematical theories. Mathematically rigorous, general, fully self-referential, self-improving, optimally efficient problem solvers. The Gödel machine is often discussed when dealing with issues of meta-learning, also known as "learning to learn." Applications include automating human design decisions and transfer of knowledge between multiple related tasks, and may lead to design of more robust and general learning architectures. Though theoretically possible, no full implementation has been created. The Gödel machine is often compared with Marcus Hutter's AIXI, another formal specification for an artificial general intelligence. Schmidhuber points out that the Gödel machine could start out by implementing AIXItl as its initial sub-program, and self-modify after it finds proof that another algorithm for its search code will be better. Inspired by Kurt Gödel's celebrated self-referential formulas (1931), a Gödel machine (or `Goedel machine' but not `Godel machine') rewrites any part of its own code as soon as it has found a proof that the rewrite is useful, where the problem-dependent utility function and the hardware and the entire initial code are described by axioms encoded in an initial proof searcher which is also part of the initial code. 
The searcher systematically and efficiently tests computable proof techniques (programs whose outputs are proofs) until it finds a provably useful, computable self-rewrite. We show that such a self-rewrite is globally optimal - no local maxima! - since the code first had to prove that it is not useful to continue the proof search for alternative self-rewrites. Unlike previous non-self-referential methods based on hardwired proof searchers, ours not only boasts an optimal order of complexity but can optimally reduce any slowdowns hidden by the O()-notation, provided the utility of such speed-ups is provable at all." https://arxiv.org/abs/1901.08162 [Using natural language and program abstractions to instill human inductive biases in machines | OpenReview](https://openreview.net/forum?id=buXZ7nIqiwE) [#49 - Meta-Gradients in RL - Dr. Tom Zahavy (DeepMind) - YouTube](https://www.youtube.com/watch?v=hfaZwgk_iS0) https://arxiv.org/abs/2303.17768 [Meta-Learning Is All You Need — James Le](https://jameskle.com/writes/meta-learning-is-all-you-need) [Bayesian Meta-Learning Is All You Need — James Le](https://jameskle.com/writes/bayesian-meta-learning-is-all-you-need) [The Most Important Algorithm in Machine Learning - YouTube](https://www.youtube.com/watch?v=SmZmBKc7Lrs) https://www.sciencedirect.com/science/article/pii/S0004370223002084 https://arxiv.org/abs/2005.04589 [Bridging AGI Theory and Practice with Galois Connections | Artificial General Intelligence](https://dl.acm.org/doi/abs/10.1007/978-3-031-33469-6_12) https://arxiv.org/abs/2102.10581 [The Mathematics of Consciousness (Integrated Information Theory) - YouTube](https://www.youtube.com/watch?v=Sp9B7raW_O4) iit math [Hierarchical clustering - Wikipedia](https://en.wikipedia.org/wiki/Hierarchical_clustering) Noether Networks: Meta-Learning Useful Conserved Quantities (w/ the authors) [Noether Networks: Meta-Learning Useful Conserved Quantities (w/ the authors) - 
YouTube](https://www.youtube.com/watch?v=Xp3jR-ttMfo)

the more I dig into the diversity of possible machine learning approaches, the more surprised I am at how little diversity the current LLM train seems to have

symmetries is all you need https://arxiv.org/abs/2112.03321 geometric deep learning turned into a loss function

https://arxiv.org/abs/2403.19887 Hybridize all ML architectures, create a neurosymbolic shapeshifting übermensch adapting to everything, generalizing to everything

[Reinforcement Learning, by the Book - YouTube](https://www.youtube.com/watch?v=NFo9v_yKQXA)
[Bellman Equations, Dynamic Programming, Generalized Policy Iteration | Reinforcement Learning Part 2 - YouTube](https://www.youtube.com/watch?v=_j6pvGEchWU)

everything is probability theory if you squint hard enough [Master equations and the theory of stochastic path integrals](https://iopscience.iop.org/article/10.1088/1361-6633/aa5ae2#ropaa5ae2s8) [x.com](https://twitter.com/burny_tech/status/1774848819642937513)

"Roadmap and summary of the methods considered in this review. The arrows represent possible routes for derivations. Labelled arrows represent derivations that are explicitly treated in the respective sections. For example, the forward and backward master equations are derived from the Chapman–Kolmogorov equation in section 1.3. In this section, we also discuss the path summation representation (19) of the conditional probability distribution p(τ, n | t_0, n_0). This representation can be derived by examining the stochastic simulation algorithm (SSA) of Gillespie [75–78] or by performing a Laplace transformation of the forward master equation (10) (see appendix B). In sections 2 and 3, the forward and the backward master equations are cast into four linear PDEs, also called ‘flow equations’.
These equations are obeyed by a probability generating function, a probability generating functional, a marginalized distribution, and a further series expansion. The flow equations can be solved in terms of a forward and a backward path integral as shown in sections 4 and 5. Upon performing inverse transformations, the path integrals provide two distinct representations of the conditional probability distribution solving the master equations. Moreover, they can be used to represent averaged observables as explained in section 6. Besides the methods illustrated in the figure, we discuss path integral representations of processes with continuous state spaces whose master equations admit Kramers–Moyal expansions (sections 4.4 and 5.3). A truncation of the backward Kramers–Moyal expansion at the level of a diffusion approximation results in a path integral representation of the (backward) Fokker–Planck equation whose original development goes back to works of Martin, Siggia, and Rose [16], de Dominicis [17], Janssen [18, 19], and Bausch, Janssen, and Wagner [19]. The representation can be rewritten in terms of an Onsager–Machlup function [79], and it simplifies to Wiener’s path integral [80, 81] for purely diffusive Brownian motion [82]. Renormalization group techniques are not considered in this review. Information on these techniques can be found in [69, 70]." [WE MUST ADD STRUCTURE TO DEEP LEARNING BECAUSE... 
- YouTube](https://www.youtube.com/watch?v=rie-9AEhYdY&feature=youtu.be) Golden marrying of category theory and deep learning categorical deep learning https://arxiv.org/abs/1703.10987 [OSF](https://osf.io/preprints/osf/4b2wj) "Self-improvising Memories: a perspective on memories as agential, dynamically-reinterpreting cognitive glue" [GitHub - thestephencasper/everything-you-need: we got you bro](https://github.com/thestephencasper/everything-you-need) x is all you need https://arxiv.org/abs/2402.16714 https://arxiv.org/abs/2403.19186v1 transformers visualized [But what is a GPT? Visual intro to transformers | Chapter 5, Deep Learning - YouTube](https://www.youtube.com/watch?v=wjZofJX0v4M) [What does it mean for computers to understand language? | LM1 - YouTube](https://www.youtube.com/watch?v=1il-s4mgNdI)
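
Coming back to the master-equation roadmap quoted earlier: the stochastic simulation algorithm (SSA) of Gillespie that it mentions samples exact trajectories of the jump process whose probability distribution the master equation describes. Below is a minimal sketch for a birth-death process, assuming a constant birth rate and a death rate proportional to the population. The process, the function name `gillespie_birth_death`, and the rate values are illustrative choices, not taken from the review.

```python
import random
from typing import List, Tuple


def gillespie_birth_death(n0: int, birth: float, death: float,
                          t_end: float, seed: int = 0) -> List[Tuple[float, int]]:
    """Sample one trajectory with jumps n -> n+1 at rate `birth`
    and n -> n-1 at rate `death * n`, up to time `t_end`."""
    rng = random.Random(seed)
    t, n = 0.0, n0
    path = [(t, n)]
    while True:
        birth_rate = birth
        death_rate = death * n
        total = birth_rate + death_rate
        if total == 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next jump is exponential with the total rate.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Pick which jump fires, with probability proportional to its rate.
        if rng.random() * total < birth_rate:
            n += 1
        else:
            n -= 1
        path.append((t, n))
    return path


trajectory = gillespie_birth_death(n0=5, birth=2.0, death=0.5, t_end=10.0)
```

Averaging many such trajectories gives a Monte Carlo estimate of the conditional distribution p(τ, n | t_0, n_0) that the forward master equation evolves deterministically.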