A major part of my meaning of life currently is to try to understand:

- The most complete fundamental equation(s) of intelligence: human intelligence, diverse machine intelligences (all sorts of current and future subfields of AI), other biological intelligences, collective intelligence, theoretical perfect AGI (AIXI variants, Chollet's intelligence, Legg's intelligence, etc.), hybrids, etc.
- The most complete fundamental equation(s) of the universe and the world in general: How do the Standard Model and general relativity work? How does everything else in our world emerge on other scales, with other fields such as chemistry, biology, and sociology? What is beyond the Standard Model of particle physics and general relativity, and how do we solve quantum gravity?

People say that we're heading towards artificial general intelligence (AGI), but by that they usually mean machine human-level intelligence (MHI): a machine that performs human digital and/or physical tasks as well as humans do. And by artificial superintelligence (ASI), people mean machine superhuman intelligence (MSHI), which is even better than humans at human tasks. I think a lot of research goes towards very specialized machine narrow intelligences (MNI), which are often superhuman at very specific tasks, such as playing games (MuZero), protein folding (AlphaFold), and maybe soon reasoning (the o3 model direction). A lot of research also goes towards machine general intelligence (MGI), which will be much more general than human intelligence (HI), because humans are IMO very specialized biological systems, tuned to our evolutionary niche, our everyday tasks, and our mathematical abilities, and other organisms are specialized differently, even though we still share a lot.

Generality of an intelligent system is a spectrum, and each system has differently general capabilities over different families of tasks than other systems, which we can see with all the current machine and biological intelligences: they're differently general over different families of tasks. That's why "AGI" feels much more continuous than discrete to me. Soon we might create some machine-biology hybrids as well. Then we should maybe start talking about carbon-based intelligence (CI), silicon-based intelligence (SI), and carbon-and-silicon-based intelligence (CSI).

By some older definitions from cognitive science, existing AI systems have been AGI for a long time, because they can, for example, hold a general discussion about essentially almost anything (modulo very narrow field-specific knowledge and skills, insufficient agency, and so on). But there are attempts to test for some degree of generality, like the ARC-AGI benchmark, which was just destroyed by OpenAI's o3 at the end of the year. In the ARC benchmark, models are trained only on a training dataset that exposes core knowledge priors, while the eval dataset is private (unseen) and considerably harder: the priors have to be recombined and abstracted over in various ways. The benchmark is designed to be resistant to memorization and naive pattern matching. More details in text form: [ARC Prize](https://arcprize.org/arc), <https://arcprize.org/blog/oai-o3-pub-breakthrough>, and in video form: [Pattern Recognition vs True Intelligence - Francois Chollet](https://www.youtube.com/watch?v=JTU8Ha4Jyfc). But I still don't think this is enough: in mathematical theory, a fully general AGI is impossible in practice.
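For concreteness, ARC tasks are tiny grid-transformation puzzles distributed as JSON: a few demonstration input/output pairs plus test inputs, where the solver must infer the transformation from the demonstrations alone. Here is a minimal sketch of the format with an invented task (this specific puzzle and the `candidate_rule` hypothesis are made up for illustration, not taken from the real dataset):

```python
# A toy ARC-style task in the dataset's JSON structure: grids are 2D lists
# of integers 0-9 (colors). The hidden rule in this invented task:
# replace every 1 in the input grid with a 2.
toy_task = {
    "train": [
        {"input": [[0, 1], [1, 0]], "output": [[0, 2], [2, 0]]},
        {"input": [[1, 1], [0, 1]], "output": [[2, 2], [0, 2]]},
    ],
    "test": [
        {"input": [[1, 0], [0, 1]], "output": [[2, 0], [0, 2]]},
    ],
}

def candidate_rule(grid):
    """One hypothesis a solver might search over: map color 1 to color 2."""
    return [[2 if cell == 1 else cell for cell in row] for row in grid]

# A solver is scored on the unseen test pair(s) only.
assert all(candidate_rule(p["input"]) == p["output"] for p in toy_task["train"])
assert candidate_rule(toy_task["test"][0]["input"]) == toy_task["test"][0]["output"]
```

The private eval set uses the same format but demands new recombinations of the core priors, which is what makes naive memorization useless.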
Something closer to a fully general AGI would be, for example, AIXI. AIXI considers all possible explanations (programs) for its observations and past actions and chooses actions that maximize expected future rewards across all these explanations, weighted by their simplicity (shortness), i.e. Occam's razor (sketched formally below). You can have this for the space of all possible environments, giving the agent the space of all possible observations from all possible problems. It's not computable, and many people have tried to formulate computable approximations. [AIXI - Wikipedia](https://en.wikipedia.org/wiki/AIXI) Humans and AI systems try to approximate this in their more narrow domains and take all sorts of cognitive shortcuts to be actually practical and not take infinite time and resources to decide.

The December o3 model announcement be like: OpenAI and the ARC team: "It's not AGI!" A bunch of influencers: "OpenAI says they released AGI and the ARC team says it's AGI!" According to Macrohard, it will only be AGI once it makes $100 billion in profits. Conclusion: no one has human-level intelligence if they haven't made $100 billion in profits. And before the OpenAI x Macrohard deal, OpenAI defined AGI as automated systems that can outperform people at "most economically valuable work" (AI labour). AGI keeps getting redefined over and over; the original meaning from cognitive science has pretty much gone down the drain 😄 To me, AGI is about generality, because it literally means "artificial general intelligence", while these companies are often really trying to create "artificial humanlike intelligence".

Intelligence is the ability to make models, in the form of representations and algorithms that abstract over previously seen data, for out-of-distribution reasoning in concrete and abstract spaces, to predict and control the future as accurately as possible. The brain implements a world model that algorithmically runs on something between overly flexible statistical deep learning and an overly rigid symbolic physics engine, on a chaotic, complex, stochastic, out-of-equilibrium thermodynamic electrobiochemical hardware dynamical system with many more self-correcting mechanisms, constantly tuned by sensory data.

"The invention of general relativity from Newtonian physics is just interpolation at some sufficiently grandiose level of abstraction." - Adam Brown [https://youtu.be/LjY0i2B-Avc?si=3CZRupgk8cHQqy6k](https://youtu.be/LjY0i2B-Avc?si=3CZRupgk8cHQqy6k)

I still can't fathom why all the models work so well; they shouldn't. I think we're in many ways creating alien intelligence, because in many things the various systems are already superior, while in others they're worse than babies. So as long as the architectures are not an exact copy of the architecture of humans, there will probably still be some difference, and you cannot get 1:1 human capabilities, but there is a possibility that some alien architecture will be better than any human capability across the board (data, hardware, etc. matter too).
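To make the AIXI rule above concrete, here is the standard way it's written (a sketch following Hutter's formulation; $U$ is a universal Turing machine, $\ell(q)$ the length in bits of program $q$, $m$ the planning horizon, $o_i$ and $r_i$ the observations and rewards):

$$a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \left[ r_k + \cdots + r_m \right] \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}$$

Every program $q$ that reproduces the interaction history counts as an explanation, shorter programs get exponentially larger weight (the $2^{-\ell(q)}$ is the Occam's razor term), and the alternating max/sum is expectimax planning over all possible futures.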
I want to find the general equation of as general and adaptable an intelligence as possible; humans are very specialized, to be honest. I often dream of infinite intelligence augmentation (though in the limit you would completely replace the whole biological machine with something more efficient), but my motivation isn't really to feel superior; I wish to use intelligence to help all beings instead. What's worth keeping? A unified, individual, continuous subjective flow of experience (qualia), but I think that can run on many architectures; brains aren't special. I think many other species have different forms of intelligence, but that's probably a matter of defining intelligence. I have collected like 100,000 definitions of intelligence so far, but my favorite definition of intelligence is the ability to generalize: the ability to mine previous experience to make sense of future novel situations, in the fully general space of environments. Francois Chollet formalized that using algorithmic information theory: [[1911.01547] On the Measure of Intelligence](https://arxiv.org/abs/1911.01547)

Hah, today I skimmed a paper trying to mathematically formalize wisdom to implement it into AI: [[2411.02478] Imagining and building wise machines: The centrality of AI metacognition](https://arxiv.org/abs/2411.02478) "we define wisdom as the ability to navigate intractable problems - those that are ambiguous, radically uncertain, novel, chaotic, or computationally explosive - through effective task-level and metacognitive strategies"

AIXI, discussed above, is also an interesting model of fully general intelligence: it considers all possible explanations (programs) for its observations and past actions and chooses actions that maximize expected future rewards across all these explanations, weighted by their simplicity (shortness) (Occam's razor), but it's not computable, and many people have tried to formulate computable approximations. [AIXI - Wikipedia](https://en.wikipedia.org/wiki/AIXI)

I also often think about how one could implement this in image models, to enforce as much novelty and creativity as possible. This in general feels like the issue of engineering more robust, stronger generalization, which seems to be one of the core topics in the whole AI field. Maybe somehow transferring some hacks from the winning methods used in ARC-AGI, which tries (important word) to test for task-agnostic generalization, from language models to image models might work (inference-time compute, test-time training, neurosymbolic attempts, ...). [[2412.04604] ARC Prize 2024: Technical Report](https://arxiv.org/abs/2412.04604)

Recently I was thinking that a more general test for AGI could be built by approximating Chollet's information conversion ratio definition of AGI, where the generality of a system would be evaluated by how much it can transfer insights from one dataset to various other datasets of varying dissimilarity that it has never seen before, the way cross-validation tries to measure it. But you need as many datasets as possible that the system hasn't seen yet, including all the ones that test some specific idea of generalization. The dissimilarity of the datasets could be measured with various dataset similarity measures. The AI system could be trained on those base datasets, or do e.g. self-play on its own, which leads to superhuman results in games. This is called strong out-of-distribution generalization, one of the holy grails of AI. So far I've seen extremely specialized attempts at similar things, but nothing general (a minimal sketch of the protocol follows below).
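Since Chollet's conversion-ratio idea comes up repeatedly here, a loose schematic of his measure may help. This is only the shape of the idea, not his exact formula (the paper defines every term precisely via algorithmic information theory, with additional weights for skill thresholds and curricula):

$$\text{Intelligence} \;\sim\; \underset{\text{tasks in scope}}{\mathbb{E}}\left[\frac{\text{skill attained} \times \text{generalization difficulty}}{\text{priors} + \text{experience}}\right]$$

Skill bought with huge built-in priors or huge amounts of training experience counts for little; skill on hard-to-generalize-to tasks, acquired from little information, counts for a lot. The "conversion ratio" is exactly this efficiency of turning past information into future skill.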
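And here is the promised minimal sketch of the evaluation protocol, under stated assumptions: scikit-learn toy data stands in for real datasets, a crude mean-distance stands in for a proper dataset-similarity measure, and accuracy stands in for skill. Everything here (the shift mechanism, the similarity proxy, the summary statistic) is illustrative, not a fixed design:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def shifted_copy(X, y, shift):
    """Stand-in for an unseen evaluation dataset: the same task family,
    with its input distribution shifted away from the training data."""
    return X + shift * rng.normal(size=X.shape), y

def dataset_distance(X_a, X_b):
    """Crude similarity proxy (distance between feature means); a real
    protocol would pick a measure from the similarity survey cited below."""
    return float(np.linalg.norm(X_a.mean(axis=0) - X_b.mean(axis=0)))

# 1. Train on a fully known, explicitly accessible training dataset.
X_train, y_train = make_classification(n_samples=500, n_features=10, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 2. Evaluate on many unseen datasets of increasing dissimilarity.
results = []
for shift in [0.0, 0.5, 1.0, 2.0, 4.0]:
    X_eval, y_eval = shifted_copy(X_train, y_train, shift)
    results.append((dataset_distance(X_train, X_eval), model.score(X_eval, y_eval)))

# 3. Summarize skill as a function of dissimilarity: area under the
# performance-vs-distance curve, normalized by the distance range
# (the same move as the molecular-generalizability paper cited below).
results.sort()
auc = sum((s0 + s1) / 2 * (d1 - d0)
          for (d0, s0), (d1, s1) in zip(results, results[1:]))
generalization_index = auc / (results[-1][0] - results[0][0])
print(f"performance by dissimilarity: {results}")
print(f"crude generalization index: {generalization_index:.3f}")
```

The real thing would swap in actual held-out datasets (ARC and other OOD suites) and a principled similarity measure; the point is just that "generalization power" becomes a curve over dissimilarity rather than a single score. The next passage spells the proposal out in more detail.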
Francois Chollet's definition of general intelligence: [[1911.01547] On the Measure of Intelligence](https://arxiv.org/abs/1911.01547) Francois Chollet defines general intelligence as the ability to generalize: the efficiency with which you operationalize past information in order to deal with the future, which can be interpreted as a conversion ratio, expressed formally using algorithmic information theory.

I was thinking about creating a benchmark that tests this generality potentially more thoroughly than ARC, based on this conversion ratio. Maybe one could design a better benchmark that would:

- First, make sure to have explicit access to the training dataset that was used to train the model.
- Then, evaluate the model on many different unseen datasets (cross-validation on steroids).
- Quantify the generalization power by how well the model performs across as many diverse datasets as possible, where dataset similarity with the training dataset could be measured using some dataset similarity metric. This metric could maybe approximate that conversion ratio to some degree?

The diverse datasets could include the ARC dataset among many others that exist for OOD testing. This approach sounds much more resistant to memorization. But since you have to monitor the training data, the most popular closed-source mainstream LLMs would be disqualified if they keep their training data secret.

Survey of dataset similarity: "The performance of a predictive model on novel datasets, referred to as generalizability, depends on how similar the training and evaluation datasets are. Exploiting or transferring insights between similar datasets is a key aspect of meta-learning and transfer-learning. We examine more than 100 methods and provide a taxonomy, classifying them into ten classes." [[2312.04078] Methods for Quantifying Dataset Similarity: a Review, Taxonomy and Comparison](https://arxiv.org/abs/2312.04078)

Some papers that are a bit related, but not related enough, are:

- Evaluating generalizability of artificial intelligence models for molecular datasets: "we show that previous approaches mischaracterize model generalizability by failing to consider the full spectrum of cross-split overlap, i.e., similarity between train and test splits for a given model and input data. We plot model performance as a function of decreasing cross-split overlap and report the area under this curve as a measure of generalizability" [Evaluating generalizability of artificial intelligence models for molecular datasets - PMC](https://pmc.ncbi.nlm.nih.gov/articles/PMC10925170/)
- Towards A Measure Of General Machine Intelligence: "we first propose a common language of instruction, a programming language that allows the expression of programs in the form of directed acyclic graphs across a wide variety of real-world domains and computing platforms. Using programs generated in this language, we demonstrate a match-based method to both score performance and calculate the generalization difficulty of any given set of tasks. We use these to define a numeric benchmark called the generalization index, or the g-index, to measure and compare the skill-acquisition efficiency of any intelligence system on a set of real-world tasks."
[[2109.12075] Towards A Measure Of General Machine Intelligence](https://arxiv.org/abs/2109.12075)
- A practical generalization metric for deep networks benchmarking: "deep network's generalization capacity in classification tasks is contingent upon both classification accuracy and the diversity of unseen data" [[2409.01498v1] A practical generalization metric for deep networks benchmarking](https://arxiv.org/abs/2409.01498v1)
- Rethinking LLM Memorization through the Lens of Adversarial Compression: "In this work, we propose the Adversarial Compression Ratio (ACR) as a metric for assessing memorization in LLMs. A given string from the training data is considered memorized if it can be elicited by a prompt (much) shorter than the string itself -- in other words, if these strings can be "compressed" with the model by computing adversarial prompts of fewer tokens. The ACR overcomes the limitations of existing notions of memorization by (i) offering an adversarial view of measuring memorization, especially for monitoring unlearning and compliance; and (ii) allowing for the flexibility to measure memorization for arbitrary strings at a reasonably low compute." [[2404.15146] Rethinking LLM Memorization through the Lens of Adversarial Compression](https://arxiv.org/abs/2404.15146) (a toy illustration of this ratio closes this section)
- The ConceptARC Benchmark: Evaluating Understanding and Generalization in the ARC Domain: "We describe ConceptARC, a new, publicly available benchmark in the ARC domain that systematically assesses abstraction and generalization abilities on a number of basic spatial and semantic concepts. ConceptARC differs from the original ARC dataset in that it is specifically organized around "concept groups" -- sets of problems that focus on specific concepts and that vary in complexity and level of abstraction." [[2305.07141] The ConceptARC Benchmark: Evaluating Understanding and Generalization in the ARC Domain](https://arxiv.org/abs/2305.07141)

What mathematical laws must a system obey in order to have the greatest ability to generate the most unique and useful creations? For example, when Von Neumann invented the Von Neumann architecture that most of today's computers run on 😄 I'm extremely curious what physical/chemical/biological/systemic/computational/etc. laws enabled him, of all people, to invent it. Though better examples would be, say, Maxwell's equations 😄 or quantum mechanics, or relativity, or the Turing machine.
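As flagged in the ACR entry above, here is a toy illustration of that ratio with made-up numbers. Everything in it is illustrative: whitespace splitting stands in for a real tokenizer, and the "shortest" prompt is simply asserted rather than found by the paper's adversarial optimization:

```python
def adversarial_compression_ratio(target: str, shortest_eliciting_prompt: str) -> float:
    """ACR = (tokens in the target string) / (tokens in the shortest adversarial
    prompt found that makes the model emit the target verbatim).
    Whitespace splitting stands in for a real tokenizer here."""
    return len(target.split()) / len(shortest_eliciting_prompt.split())

# Made-up example: a 12-token training string elicited by a 3-token prompt.
target = "the quick brown fox jumps over the lazy dog near the riverbank"
prompt = "quick fox riverbank"  # pretend this was found by adversarial search

print(f"ACR = {adversarial_compression_ratio(target, prompt):.1f}")
```

In the real metric the prompt is optimized against the model itself; a ratio above 1 (prompt shorter than the string) is what the quoted definition treats as memorization.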