## Tags

- Part of: [[Intelligence]], [[Science]], [[Engineering]], [[Computer science]], [[Technology]], [[Natural science]], [[Mathematics]], [[Formal science]]
- Related: [[Collective Intelligence]], [[General intelligence]], [[Artificial General Intelligence]], [[Theory of Everything in Intelligence]], [[Biological intelligence]]
- Includes: [[Mechanistic interpretability]], [[Mathematical theory of artificial intelligence]], [[AI engineering]]

## Definitions

- A [[Systems theory|system]] that is [[Intelligence|intelligent]] and constructed by humans.
- A branch of [[Computer science]] that develops and studies [[Intelligence|intelligent]] machines.
- Artificial intelligence (AI), in its broadest sense, is intelligence exhibited by machines, particularly computer systems. It is a field of research in computer science that develops and studies methods and software that enable machines to perceive their environment and use learning and intelligence to take actions that maximize their chances of achieving defined [[goal|goals]].
## Main resources

- [Artificial intelligence - Wikipedia](https://en.wikipedia.org/wiki/Artificial_intelligence) <iframe src="https://en.wikipedia.org/wiki/Artificial_intelligence" allow="fullscreen" allowfullscreen="" style="height:100%;width:100%; aspect-ratio: 16 / 5; "></iframe>

### Lectures

- Stanford probability theory: https://www.youtube.com/playlist?list=PLoROMvodv4rOpr_A7B9SriE_iZmkanvUg
- Stanford machine learning: [https://www.youtube.com/playlist?list=PLoROMvodv4rNyWOpJg_Yh4NSqI4Z4vOYy](https://www.youtube.com/playlist?list=PLoROMvodv4rNyWOpJg_Yh4NSqI4Z4vOYy), https://www.youtube.com/playlist?list=PLoROMvodv4rMiGQp3WXShtMGgzqpfVfbU, [Machine Learning Specialization (Coursera)](https://www.coursera.org/specializations/machine-learning-introduction)
- Stanford transformers: [https://www.youtube.com/playlist?list=PLoROMvodv4rNiJRchCzutFw5ItR_Z27CM](https://www.youtube.com/playlist?list=PLoROMvodv4rNiJRchCzutFw5ItR_Z27CM)
- Stanford generative models, including diffusion: [https://www.youtube.com/playlist?list=PLoROMvodv4rPOWA-omMM6STXaWW4FvJT8](https://www.youtube.com/playlist?list=PLoROMvodv4rPOWA-omMM6STXaWW4FvJT8)
- Stanford deep learning: [https://www.youtube.com/playlist?list=PLoROMvodv4rOABXSygHTsbvUz4G_YQhOb](https://www.youtube.com/playlist?list=PLoROMvodv4rOABXSygHTsbvUz4G_YQhOb)
- Stanford natural language processing with deep learning: [https://www.youtube.com/playlist?list=PLoROMvodv4rMFqRtEuo6SGjY4XbRIVRd4](https://www.youtube.com/playlist?list=PLoROMvodv4rMFqRtEuo6SGjY4XbRIVRd4)
- [Search | MIT OpenCourseWare | Free Online Course Materials on Machine Learning](https://ocw.mit.edu/search/?q=machine%20learning), [Search | MIT OpenCourseWare | Free Online Course Materials on AI](https://ocw.mit.edu/search/?q=AI)
- Harvard AI: [Harvard CS50’s Artificial Intelligence with Python – Full University Course - YouTube](https://www.youtube.com/watch?v=5NgNicANyqM&t=16s)
- [Neural Networks: Zero to Hero - YouTube](https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ)
- [What is a Transformer? Neel Nanda - YouTube](https://youtube.com/playlist?list=PL7m7hLIqA0hoIUPhC26ASCVs_VrqcDpAz&si=L5WmZ7a0LCC4ML6y)

### Books

- [fast.ai – Making neural nets uncool again](https://www.fast.ai/)
- [Dive into Deep Learning — Dive into Deep Learning 1.0.3 documentation](https://www.d2l.ai/)
- [Machine Learning with PyTorch and Scikit-Learn — Sebastian Raschka, Yuxi (Hayden) Liu, Vahid Mirjalili (Amazon)](https://www.amazon.com/Machine-Learning-PyTorch-Scikit-Learn-learning-ebook/dp/B09NW48MR1)
- [Are there any books I should read to learn machine learning from scratch? : r/learnmachinelearning](https://www.reddit.com/r/learnmachinelearning/comments/13y4rzn/are_there_any_books_i_should_read_to_learn/)
- [best AI books - Google Search](https://www.google.com/search?q=best+AI+books&sca_esv=e14f95cbc2b145ff&sca_upv=1&sxsrf=ADLYWIKWrE3QSZ6sLX-ITX-nVDg3qWaDFg%3A1727604674151&ei=wif5ZpLqCJD97_UPvMyG6Qs&ved=0ahUKEwiS06j39OeIAxWQ_rsIHTymIb0Q4dUDCA8&uact=5&oq=best+AI+books&gs_lp=Egxnd3Mtd2l6LXNlcnAiDWJlc3QgQUkgYm9va3MyBRAAGIAEMgUQABiABDIFEAAYgAQyBRAAGIAEMgUQABiABDIFEAAYgAQyBRAAGIAEMgUQABiABDIFEAAYgAQyBRAAGIAESOoTUOIFWPYScAF4AZABAJgBf6ABrgWqAQM1LjK4AQPIAQD4AQGYAgigAoYGwgIKEAAYsAMY1gQYR8ICDRAAGIAEGLADGEMYigXCAg4QABiwAxjkAhjWBNgBAcICExAuGIAEGLADGEMYyAMYigXYAQHCAgoQIxiABBgnGIoFwgIKEAAYgAQYFBiHApgDAIgGAZAGE7oGBggBEAEYCZIHAzQuNKAHkCU&sclient=gws-wiz-serp)
- [best machine learning books - Google Search](https://www.google.com/search?q=best+machine+learning+books&sca_esv=e14f95cbc2b145ff&sca_upv=1&sxsrf=ADLYWILSLOI-HtGkXlMqkH5ml_uoQNnbJw%3A1727604694624&ei=1if5ZrzNJbPp7_UPhJSzqQg&ved=0ahUKEwi8kIqB9eeIAxWz9LsIHQTKLIUQ4dUDCA8&uact=5&oq=best+machine+learning+books&gs_lp=Egxnd3Mtd2l6LXNlcnAiG2Jlc3QgbWFjaGluZSBsZWFybmluZyBib29rczIGEAAYBxgeMgYQABgHGB4yBhAAGAcYHjIGEAAYBxgeMgYQABgHGB4yBhAAGAcYHjIGEAAYBxgeMgYQABgHGB4yBhAAGAcYHjIGEAAYBxgeSIkaUL0MWKoZcAN4AZABAJgBeKABzQuqAQM4Lje4AQPIAQD4AQGYAgqgAuEGwgIKEAAYsAMY1gQYR8ICDRAAGIAEGLADGEMYigXCAg4QABiwAxjkAhjWBNgBAcICExAuGIAEGLADGEMYyAMYigXYAQGYAwCIBgGQBhO6BgYIARABGAmSBwM1LjWgB4R8&sclient=gws-wiz-serp)

## Landscapes

#### By approach

- [[Symbolic AI]]
	- ![[Symbolic AI#Definitions]]
	- [[Logic-based AI]]
	- [[Knowledge-based systems]]
		- [[Expert systems]]
	- [[Ontologies]]
	- [[Semantic networks]]
- [[Statistical AI]]
	- [[Machine learning]]
		- ![[Machine learning#Definitions]]
		- [[Supervised learning]]
		- [[Unsupervised learning]]
		- [[Semi-supervised learning]]
		- [[Reinforcement learning]]
	- [[Probabilistic AI]]
		- [[Bayesian AI]]
	- [[Quantum machine learning]]
	- [[Thermodynamic AI]]
- [[Connectionist AI]]
	- [[Neural networks]] and [[Deep Learning]]
		- [[Feedforward neural networks]]
		- [[Convolutional neural networks]] (CNNs)
		- [[Recurrent neural networks]] (RNNs)
			- [[Long short-term memory]] (LSTM)
		- [[Transformer]]
		- [[Graph neural networks]]
		- [[Capsule networks]]
		- [[Spiking neural networks]]
		- [[Quantum neural networks]]
		- [[Generative adversarial networks]] (GANs)
		- [[Variational autoencoders]] (VAEs)
		- [[Diffusion models]]
		- [[Flow-based models]]
		- [[Attention mechanisms]]
		- [[Memory-augmented neural networks]]
			- [[Neural turing machine]]
		- [[Neural Cellular Automata]]
	- [[Scaling hypothesis]], [[Bitter Lesson]]
	- [[Transfer learning]]
	- [[Self-supervised learning]]
	- [[Contrastive learning]]
- [[Hybrid AI]]
	- ![[Hybrid AI#Definitions]]
	- [[Neurosymbolic AI]]
- [[Evolutionary AI]]
	- [[Genetic algorithms]]
	- [[Evolutionary strategies]]
	- [[Swarm intelligence]]
- [[Cognitive AI]]
	- [[Cognitive architectures]]
- [[Embodied AI]]
	- [[Robotics]]
- [[Distributed AI]]
	- [[Multi-agent systems]]
- [[Quantum AI]]
	- [[Quantum machine learning]]
	- [[Quantum neural networks]]
	- [[Quantum annealing]]
- [[Biologically inspired AI]]
	- [[Neuromorphic AI]]
	- [[Spiking neural networks]]
	- [[Reservoir computing]]
- [[Physics inspired AI]]
	- [[Liquid neural networks]]
- [[Explainable AI]]

I explore:

- [[Mathematical theory of artificial intelligence|understanding AI using mathematical theory]] from the perspective of [[physics]], [[mathematics|pure mathematics]], or other [[science|sciences]], and reverse engineering AI using [[mechanistic interpretability]] from the perspective of [[neuroscience]] or other [[science|sciences]]
- [[Artificial General Intelligence|AGI]], [[superintelligence]], [[large language model|LLMs]], [[reinforcement learning]]
- AI beyond LLMs, autoregression, [[transformer|transformers]], and pure [[deep learning]]: [[neurosymbolic AI]], [[physics inspired AI]] like [[diffusion models]] and [[liquid neural networks]], [[biology inspired AI]] like [[evolutionary AI]] and [[selforganizing AI]], and [[open-ended novelty search]]
- [[Artificial Intelligence x Science|AI for science]], like [[Artificial Intelligence x Physics|physics]] and [[Artificial Intelligence x Biology|biology]], and [[Recursive self-improvement|AI research itself]]
- [[Artificial Intelligence x Biological Intelligence|comparing AI and biological intelligence]], [[intelligence|the diversity of minds, intelligences, and information processing systems]]
- [[AI safety|making AI do what we want]], [[AI engineering]]
- [[creativity]], [[curiosity]], [[grokking]]
- [[artificial intelligence x programming coding software engineering|AI for engineering, like software engineering and programming]], [[artificial intelligence x healthcare|AI for good, like healthcare]], [[agent|AI agents]]
- [[Future of humanity, AI, sentience, futurology, politics|future of humanity and AI, futurology]], [[politics]], [[Future of humanity, AI, sentience, futurology, politics|geopolitics of AI]], [[transhumanism]], and [[Future of humanity, AI, sentience, futurology, politics|world views]] like [[Effective Altruism]], [[Effective Accelerationism]], [[rationalism]], and [[postrationalism]]
- [[Benefits, risks, impact and future of artificial intelligence|potential benefits and risks of advanced technologies like AI]], [[Future of humanity, AI, sentience, futurology, politics|forecasting]], [[open science]], [[open source]], [[Future of humanity, AI, sentience, futurology, politics|democratizing technology]], [[Future of humanity, AI, sentience, futurology, politics|concentration vs decentralization of power]]

#### Crossovers

[[Omnidisciplionarity]]

- [[Artificial Intelligence x Biological Intelligence]]
- [[Artificial Intelligence x Biological Intelligence x Collective Intelligence]]
- [[Artificial intelligence x Science]]
	- [[Artificial Intelligence x Mathematics]]
		- [[AlphaProof]]
	- [[Artificial Intelligence x Physics]]
		- [[FermiNet]]
	- [[Artificial Intelligence x Chemistry]]
	- [[Artificial Intelligence x Biology]]
		- [[AlphaFold]]
		- [[AlphaProteo]]
	- [[Artificial Intelligence x Neuroscience]]
- [[Artificial intelligence x Programming Coding Software Engineering]]
- [[Artificial intelligence x Engineering]]
	- [[AlphaChip]]
- [[Artificial intelligence x Healthcare]]
- [[Artificial Intelligence x Quantum computing]]
- [[Artificial Intelligence x Materials science]]
- [[Artificial intelligence x Psychology]]
- [[Artificial intelligence x Psychotherapy]]
- [[Artificial intelligence x Finance]]
- [[Artificial intelligence x Music]]
- [[Artificial intelligence x Creativity]]
- [[Artificial Intelligence x Generalization]]

AI systems aren't exact replicas of humans, contrary to what many people seem to think.
They're a mix of insights from neuroscience, optimization theory, mathematics, physics, computer science, psychology, philosophy, empirical trial and error, etc., combined into one system:

- Neuroscience: connectionism
- Optimization theory: gradient descent
- Psychology: reasoning, reinforcement learning
- Physics: diffusion
- Philosophy: alignment
- Control theory: reinforcement learning
- Biology: evolutionary methods
- Computer science: computability theory (neural Turing machines)

“So according to Pedro Domingos' book The Master Algorithm, the AI field has, to a first approximation, these camps:

- Connectionists like to mimic the brain's interconnected neurons (neuroscience): artificial neural networks, deep learning, spiking neural networks, liquid neural networks, neuromorphic computing, the Hodgkin-Huxley model, ... (this is what is booming the most in the current wave of AI)
- Symbolists like symbol manipulation: decision trees, random decision forests, production rule systems, inductive logic programming, ...
- Bayesians like uncertainty reduction based on probability theory (statisticians): Bayes classifiers, probabilistic graphical models, hidden Markov models, active inference, ... Frequentists exist too, defining probability as a limit over repeated experiments instead of a subjective prior that is updated with new data.
- Evolutionaries like evolution (biologists): genetic algorithms, evolutionary programming
- Analogizers like identifying similarities between situations or things (psychologists): k-nearest neighbors, support vector machines, ...

Then there are various hybrids: neurosymbolic architectures (AlphaZero for chess, general program synthesis with DreamCoder), neuroevolution, etc. And technically you can also have:

- Reinforcement learners like learning from reinforcement signals: reinforcement learning (most game AIs use it, like AlphaZero for chess; LLMs like ChatGPT are starting to use it more; robotics, ...)
- Causal inferencers like to build a causal model and can thereby make inferences using causality rather than just correlation: causal AI
- Compressionists see cognition as a form of compression: autoencoders, Huffman coding, the Hutter Prize
- Divergent novelty searchers love divergent search for novelty without objectives, without converging: novelty search
- Selforganizers: selforganizing AI like neural cellular automata

And you can hybridize these too: deep reinforcement learning, novelty search combined with other objectives, etc. I love them all and want to merge them, or find completely novel approaches that we haven't found yet. :D Would you add any camps? What is your idea of the ideal AI architecture?

I think no AI approach fully and universally steamrolls all the others; each is better for different use cases. My dream for a more fully general AI would be a system that uses many of these approaches in a hybrid way and picks whichever approach is most optimal for the task at hand on the fly. There is no single machine intelligence. There are tons of different paradigms of intelligence in all sorts of different contexts, some more specialized and some more general, in some ways similar to the diverse ecosystem of biological intelligences.
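As a toy illustration of the analogizer camp above, here is a minimal k-nearest-neighbors classifier in pure Python; the points, labels, and cluster positions are invented for the example:

```python
# Minimal "analogizer"-camp sketch: k-nearest-neighbors classification.
# The toy 2-D points and labels below are made up for illustration.
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k closest training points.
    `train` is a list of ((x, y), label) pairs."""
    by_distance = sorted(train, key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Two toy clusters: "a" near the origin, "b" near (5, 5).
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]

print(knn_predict(train, (0.5, 0.5)))  # near the "a" cluster
print(knn_predict(train, (5.5, 5.5)))  # near the "b" cluster
```

The whole "model" is just the stored training data plus a distance function, which is what makes this camp feel so different from the connectionist one.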
The core could maybe, for example, have:

- A more biologically based neural engine for more adaptability: like liquid neural networks and ideas from Liquid AI with Joscha Bach, but maybe still somehow using the idea of attention that is now so relatively successful in transformers in deep learning
	- Operating neurosymbolically and building (possibly also Bayesian) neurosymbolic world models in which you abstract and plan, for more interpretability, reliability, and generalization power across different types of tasks, while losing as little of the flexibility of the neural substrate as possible: like DreamCoder and other program synthesis ideas from Francois Chollet, which could also synthesize symbolic search or simple statistical programs to explain data
- Trained via a combination of:
	- Convergent gradient descent, since that works so relatively well: like almost all of deep learning currently
	- More biologically plausible algorithms: like maybe the forward-forward algorithm or Hebbian learning
	- Reinforcement learning, to incentivize more generalization from verifier signals: like AlphaZero and o3
	- Some evolution and objectiveless divergent novelty search, for getting the creativity of evolution, for open-endedness that never stops accumulating new knowledge and incentivizes exploration into the unknown and out-of-the-box breakthroughs: like evolutionary algorithms, novelty search, and ideas from Kenneth Stanley

Will something similar work? I have no clue. I'm thinking about how to hybridize various systems that already work well in various contexts. I should try it more. :D I've been thinking a lot lately about whether it's possible to somehow hybridize all these approaches, or whether that would be too much of an amalgamation and it just wouldn't work. Time to test it.
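One of the training ingredients above, Hebbian learning, can be sketched in a few lines. This is the plain Hebb rule ("neurons that fire together wire together") for a single linear neuron, with toy numbers, not the forward-forward algorithm:

```python
# Minimal sketch of a Hebbian update: weights grow in proportion to the
# correlation between pre-synaptic input x and post-synaptic output y.
# All numbers here are toy values chosen for illustration.

def hebbian_step(w, x, lr=0.1):
    """One Hebbian update for a single linear neuron y = sum(w_i * x_i)."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + lr * xi * y for wi, xi in zip(w, x)]

w = [0.5, 0.1]
for _ in range(3):
    w = hebbian_step(w, [1.0, 0.0])  # repeatedly present the same input
print(w)  # the weight on the active input grows; the inactive one stays put
```

Note that unlike gradient descent there is no loss function and no error signal here, which is both the biological appeal and the practical weakness (the raw rule grows weights without bound unless you add normalization).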
The idea is probably some combination of:

- Neuro for flexibility (LLM stuff)
- Symbolic for better generalization and more rigid circuits where needed (Francois Chollet ideas, like DreamCoder, MCTS, symbolic math/physics engines, a Python execution environment)
- Evolutionary/novelty search for more creative, open-ended discovery (Kenneth Stanley ideas)
- Better RL algorithms for better generalization and other things (Rich Sutton ideas)
- More biologically inspired parts of the architecture for better data efficiency and maybe adaptability and other things (Liquid AI/neuromorphic ideas, maybe selforganizing ideas like neural cellular automata, or the forward-forward algorithm or Hebbian learning, also in conjunction with gradient descent)
- Maybe some physics bias (like Hamiltonian neural networks have)”

I love AI for science (like biology and physics), mathematics, healthcare, education, technology development for good, understanding the nature of intelligence, increasing the standard of living for all, the progress of civilization, and so on. I want to see more of that, please! I want to see AI applied much more in science, technology, engineering, math, healthcare, altruistic use cases, etc. I want to see it as a tool that generates abundance for everyone. I want the technology to build a better future for all. I want the technology to fight poverty and other world problems and risks. I want the research to help us understand the nature of intelligence. I want the technology to empower all humans who don't want to see the world burn and are not dictators. I want its power to be used for good. I want the power to not be concentrated. I want to see it developed safely and ethically, in a steerable way. I want people to get compensated properly. I'm trying to push for that and help work towards these goals!

I think AI is already technologically disruptive in various industries. AI is everywhere right now, and there's more and more of it, not just GenAI.
Stuff like AI for foundational research and engineering in science and math supercharges all sorts of engineering and technology across the board. More and more programmers are using some sort of coding copilot, which is useful, though many of them are not using SotA systems like Claude, Cursor, Perplexity, Replit, etc., often because they don't know about them or because of the adoption frictions listed below. Lots of code-monkey work, unit testing, simple web dev, etc. is being automated. AI is contributing to nontrivial frontier AI research and development. It's used to design better chips and robots. Many translators and certain types of writers are, frankly, RIP. Then many companies squeeze image/video/text generation for easy profit at all costs, for example in PR or in the entertainment and art industries, but IMO that has recently been giving the technology a bad reputation, since it's often profit over quality and ethics, which sucks; this technology can be used in much better ways there, with more quality and ethics, but the incentives have to be aligned better. Call centers and customer service are being automated (sometimes with better, sometimes worse quality). Autonomous vehicles are now a reality. Robot dogs, automated drones, and other machines are already used in surveillance, defence, and wars right now, which I don't want, but some are used for good and useful things too: all sorts of specialized robotics for automation in resource and technology production and for household use cases is in its glory, and humanoid robotics is just emerging. Planning systems are also big in defence and wars (I don't want that). Healthcare is supercharged with, for example, disease classification from images (I love AI for healthcare!).
Financial markets are ML bots fighting each other. Recommender systems are everywhere in social media (often useful, but also often a curse), semantic search is everywhere (often useful), visual recognition and photo editing are used often (often useful), plus optimization of supply chains, better techniques for agriculture (we need more there), automated threat detection in cybersecurity, and optimizations in the energy sector. AI-powered scams etc. exist too, and I want to regulate that harmful use case. This is how it goes with a lot of dual-use technologies.

And I think the big factors limiting AI's impact inside industry, outside of academia, and outside of things like being superhuman at various games (Go, chess, Dota, poker, etc.) are:

1) The bureaucracy of integrating the technology is slow compared to the progress of the technology
2) People are learning to use the technology very slowly
3) Issues around privacy, copyright, ethics in some contexts, and other legal questions
4) Engineering around adapting the foundational systems for specific use cases is slower than the progress of the foundational systems

AI can be used for bad, good, and neutral things. Let's maximize the good use cases!
#### Applications ([[AI engineering]])

- Domains: [[Artificial Intelligence#Crossovers|automating]] mundane tasks (dishes, laundry), [[Artificial intelligence x Healthcare|healthcare]] ([[AMIE]]), [[Artificial intelligence x Programming Coding Software Engineering|programming]] (coding [[AI copilots]] such as GitHub Copilot, [Cursor](https://www.cursor.com/), Replit, and [[autonomous software engineers]]), [[Artificial intelligence x Science|science]] ([[AlphaFold]]), physics ([[FermiNet]]), [[Artificial Intelligence x Mathematics|mathematics]] ([[AlphaProof]]), [[Artificial Intelligence x Engineering|technology]] development ([[AlphaChip]], [[virtual reality]]), [[chatbot]] assistants grounded in reality, [[education]], [[information searching]], minimizing various [[risks]] and [[crises]], [[transportation]], [[manufacturing]], [[security]], [[cybersecurity]], [[energy optimization]], [[supply chain optimization]], [[weather forecasting]], [[agriculture]], [[translation]], [[recommendations]], [[finance]], [[call centers]], [[entertainment]], [[legal services]], [[games]], [[robotics]] for good, etc.
- Tasks: [[prediction|predicting]], [[forecasting]], [[generation|generating]], [[classification|classifying]], [[analysis|analyzing]], [[clustering]], [[segmentation|segmenting]], etc.
- Models: [[AI engineering]] methods using [[statistics|statistical]] models, [[deep learning]] models, [[generative AI]] models ([[Large language model|large language models]], image/sound/video models, [[multimodal]] models), [[classification]] models, [[reinforcement learning]] models, [[symbolic AI|expert systems]], etc.
- Methods: [[building]] and [[training]] models, [[finetuning]], [[prompt engineering]], [[retrieval augmented generation]], [[agent]] and [[multiagent]] frameworks, etc.
- Tooling: [[PyTorch]], [[Keras]], [[Scikit-learn]], [[FastAI]], [[OpenAI]] or [[Anthropic]] APIs, [[Llama]] locally or deployed, [[Llamaindex]], [[Langchain]], [[Autogen]], [[LangGraph]], [[Vector database|vector databases]], etc.
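As a bare-bones sketch of the [[retrieval augmented generation]] pattern mentioned above: retrieve the most relevant document for a query, then prepend it to the prompt sent to a language model. The documents and the bag-of-words retriever here are toy stand-ins; a real pipeline would use embeddings, a vector database, and an actual model API:

```python
# Toy retrieval-augmented generation (RAG) sketch. The docs and query are
# invented; real systems use embedding models and vector databases instead
# of this word-count cosine similarity.
import math
from collections import Counter

docs = [
    "AlphaFold predicts protein structures from amino acid sequences.",
    "Gradient descent updates parameters in the direction of lower loss.",
]

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str) -> str:
    """Return the document most similar to the query."""
    q = Counter(query.lower().split())
    return max(docs, key=lambda d: cosine(q, Counter(d.lower().split())))

def build_prompt(query: str) -> str:
    """Assemble the augmented prompt; a real system would send this to an LLM."""
    return f"Context: {retrieve(query)}\nQuestion: {query}\nAnswer:"

print(build_prompt("How does gradient descent work?"))
```

The point of the pattern is that the model's answer gets grounded in retrieved text instead of relying only on what is baked into its weights.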
#### [[AI engineering]] by application

- [[Generative AI]]
	- [[Large language model]] (LLM)
		- [[o1]]
	- [[Text-to-image models]]
	- [[Text-to-video models]]
	- [[Text-to-3D models]]
	- [[Music generation]]
	- [[Code generation]]
- [[AlphaGo]]
- [[AlphaZero]]

#### More

- By skill:
	- [\[2311.02462\] Levels of AGI: Operationalizing Progress on the Path to AGI](https://arxiv.org/abs/2311.02462) [[9bb2cfbcdbb8274393aa4b4fd2d4b604_MD5.jpeg|Open: Pasted image 20240115053147.png]] ![[9bb2cfbcdbb8274393aa4b4fd2d4b604_MD5.jpeg]]
	- [[Artificial narrow intelligence]]
	- [[Artificial General Intelligence]]
	- [[Superintelligence]]
- [Outline of artificial intelligence - Wikipedia](https://en.wikipedia.org/wiki/Outline_of_artificial_intelligence) <iframe src="https://en.wikipedia.org/wiki/Outline_of_artificial_intelligence" allow="fullscreen" allowfullscreen="" style="height:100%;width:100%; aspect-ratio: 16 / 5; "></iframe>
- [[Algorithm|Algorithms]] and techniques
	- [[Search algorithm]]
	- [[Optimization search]]
	- [[Logic]]
	- [[Probabilistic methods for uncertain reasoning]]
		- [[Bayesian network]]
		- [[Bayesian inference]]
	- [[Classification]]
	- [[Artificial neural networks]]
	- [[Robotics]]
	- [[Neuromorphic engineering]]
	- [[Cognitive architecture]]
	- [[Multiagent system]]
- Applications
	- Reasoning and problem solving
		- [[Automating science]]
		- [[Expert system]]
		- [[Automated planning and scheduling]]
		- [[Constraint satisfaction]]
		- [[Automated theorem proving]]
	- [[Knowledge representation]]
	- [[Planning]]
	- [[Learning]]
		- [[Machine learning]]
	- [[Natural language processing]]
	- [[Image generation]]
	- [[Audio generation]]
	- [[Video generation]]
	- [[Perception]]
	- [[Robotics]]
	- [[Control theory|Control]]
	- [[Social intelligence]]
	- [[Game playing]]
	- [[Computational creativity]]
	- [[Personal assistant]]
- [Map of Artificial Intelligence - YouTube](https://youtu.be/hDWDtH1jnXg?si=CP-4cX70dNz7U4tp) <iframe title="Map of Artificial Intelligence" src="https://www.youtube.com/embed/hDWDtH1jnXg?feature=oembed" height="113" width="200" allowfullscreen="" allow="fullscreen" style="aspect-ratio: 1.76991 / 1; width: 100%; height: 100%;"></iframe>
- [All Machine Learning algorithms explained in 17 min - YouTube](https://www.youtube.com/watch?v=E0Hmnixke2g) <iframe title="All Machine Learning algorithms explained in 17 min" src="https://www.youtube.com/embed/E0Hmnixke2g?feature=oembed" height="113" width="200" allowfullscreen="" allow="fullscreen" style="aspect-ratio: 1.76991 / 1; width: 100%; height: 100%;"></iframe>
- [[Images/98bcc7afe4e66c0f5d1d6b65fcc3e519_MD5.jpeg|Open: Pasted image 20241001055944.png]] ![[Images/98bcc7afe4e66c0f5d1d6b65fcc3e519_MD5.jpeg]]
- [[Images/2f712aa9f9992bf03afb1124508a8805_MD5.jpeg|Open: Pasted image 20241001064142.png]] ![[Images/2f712aa9f9992bf03afb1124508a8805_MD5.jpeg]]
- [[Images/e2c3bbe9b975694d5e7e4089ecc9ab12_MD5.jpeg|Open: Pasted image 20241001064410.png]] ![[Images/e2c3bbe9b975694d5e7e4089ecc9ab12_MD5.jpeg]]
- [Generative AI in a Nutshell - how to survive and thrive in the age of AI - YouTube](https://www.youtube.com/watch?v=2IK3DFHRFfw) <iframe title="Generative AI in a Nutshell - how to survive and thrive in the age of AI" src="https://www.youtube.com/embed/2IK3DFHRFfw?feature=oembed" height="113" width="200" allowfullscreen="" allow="fullscreen" style="aspect-ratio: 1.76991 / 1; width: 100%; height: 100%;"></iframe>
- [GitHub - dair-ai/ML-YouTube-Courses: 📺 Discover the latest machine learning / AI courses on YouTube.](https://github.com/dair-ai/ML-YouTube-Courses)
- [Applications of artificial intelligence - Wikipedia](https://en.wikipedia.org/wiki/Applications_of_artificial_intelligence)
- [[AI engineering]]
	- [[AI engineering#Landscapes]] ![[AI engineering#Landscapes]]
- Phenomena:
	- [[Consciousness]]
	- [[Artificial consciousness]]
- Related fields:
	- [[Statistics]]
	- [[Data science]]
	- [[Neurotechnology]]
	- [[Selfreplicating machines]]
	- [[Singularity]]
	- [[Recursive self-improvement]]
	- [[Intelligence explosion]]
	- [[Hive mind]]
	- [[Robot swarm]]
	- [[Transhumanism]]
	- [[Risks of artificial intelligence]]
	- [[AI safety]]
- Theory:
	- [[Mechanistic interpretability]]
	- [[Mathematical theory of artificial intelligence]]
	- [[Explainable artificial intelligence]]
	- [AI interpretability wiki](https://aiinterpretability.miraheze.org/wiki/Main_Page)
- [[Intelligence#Definitions]] ![[Intelligence#Definitions]]
- [[Intelligence#Idealizations]] ![[Intelligence#Idealizations]]
- [[Artificial General Intelligence#Definitions]] ![[Artificial General Intelligence#Definitions]]
- [[Artificial Intelligence x Biological Intelligence x Collective Intelligence]]
- [[Generalization]]
	- [[Artificial Intelligence x Generalization]]
- [[Curiosity]]
- [[Agent]], [[Multiagent system]]

Let's make a benchmark that tests AI systems on causal modeling, strong generalization, continuous learning, data and compute efficiency, stability/reliability in symbolic reasoning, agency, more complex tasks across time and space, long-term planning, optimal Bayesian inference, etc. The ultimate benchmark would be giving AI systems all the information that Newton, Maxwell, Boltzmann, Einstein, Feynman, Edward Witten, von Neumann, etc. had before their discoveries in physics or other fields, and then seeing if the systems could come up with the same or isomorphic discoveries.
## State of the art and news

- [AI News • Buttondown](https://buttondown.com/ainews/archive/)
- Various subreddits: [LocalLlama](https://www.reddit.com/r/LocalLLaMA/), [Machine Learning](https://www.reddit.com/r/MachineLearning/), [Singularity](https://www.reddit.com/r/singularity/)
- [X](https://x.com)
- [AI explained](https://www.youtube.com/@aiexplained-official), [bycloud](https://www.youtube.com/@bycloudAI), [ML street talk](https://www.youtube.com/c/machinelearningstreettalk), [Yannic Kilcher](https://www.youtube.com/@YannicKilcher), [Dwarkesh Patel](https://www.youtube.com/@DwarkeshPatel)
- [Astral Codex Ten | Scott Alexander | Substack](https://www.astralcodexten.com/), [Hacker News](https://news.ycombinator.com/), [AI Alignment Forum](https://www.alignmentforum.org/), [LessWrong](https://www.lesswrong.com/)
- 80,000 Hours, Theo Jaffee, Inside View, Future of Life Institute, Lex Fridman, Cognitive Revolution "How AI Changes Everything", Wes Roth, latent.space, etc.

## Future

- [[Computronium]]
- From [The Singularity Is Nearer - Wikipedia](https://en.wikipedia.org/wiki/The_Singularity_Is_Nearer) by [[Ray Kurzweil]]: [[Images/4ee554bf075eb3a5879c61c1d14e1e51_MD5.jpeg|Open: Pasted image 20240919001041.png]] ![[Images/4ee554bf075eb3a5879c61c1d14e1e51_MD5.jpeg]]

## Brainstorming

I want to know the most complete fundamental equation(s) of intelligence: human intelligence, diverse machine intelligences (all sorts of current and future subfields of AI), other biological intelligences, collective intelligence, theoretical perfect AGI (AIXI variants, Chollet's intelligence, Legg's intelligence, etc.), hybrids, and so on.

AI will model the world in ways completely incomprehensible to how humans model it. And it will do it in much more optimal ways; it will grok physics much more optimally, in ways alien to how human brains evolved to model it in our evolutionary environment.
The space of all possible modeling systems is so vast, and we, and nature, have only scratched the surface so far. The current architectures are just the beginning of all of this: deep learning models, transformer models, diffusion models, RL CoT models, neurosymbolics with MCTS (AlphaZero), statistical models, etc.

I want machine scientists developing theories and experiments about the universe that transcend human limitations. Will AI in the future come up with theories of fundamental physics that predict empirical data better than our theories, but that are incomprehensible to human intuition? Advanced AI will be needed to overcome human limitations in the search for a theory of everything.

Human brains still have around 100x more connections than our currently biggest AI systems have parameters, roughly 100 trillion vs 1 trillion, so brains are still around 100x bigger in parameter terms, while running on just about 30 watts compared to the hundreds of megawatts that the currently biggest AI datacenters run on, with terawatts coming soon. Or the brain might have even more connections and complexity, depending on how you quantify and measure all of this. Or it might be hard to compare at all, because the architectures and substrates may be too different. [https://youtu.be/b_DUft-BdIE?si=2-0GGIDn_sArz7bi](https://youtu.be/b_DUft-BdIE?si=2-0GGIDn_sArz7bi)

The brain implements a world model that algorithmically runs somewhere between overly flexible statistical deep learning and an overly rigid symbolic physics engine, on a chaotic, complex, stochastic, out-of-equilibrium, thermodynamical, electrobiochemical, dynamical open hardware system with far more selfcorrecting mechanisms than current AI systems, constantly tuned and grounded by sensory data.

What is the brain doing to process and integrate all the information from all the diverse modalities into a unified world model, and then abstract over it in latent-space reasoning?
The real AGI benchmark is whether the model can come up with general relativity if it knew everything we knew right before general relativity was discovered.

One potential dream AGI system for scientists is physics-based AIs (quantum, thermodynamic, deterministic, hybrids) optimized for perfect modeling of nature (similar to how nature is governed quantumly/thermodynamically/deterministically/in hybrid ways on different scales), coupled with an anthropomorphic, humanlike synthetic agent scientist AI that could use that physics-based AI optimally and translate the results into more humanlike language via a more humanlike interface.

I want an AGI system that can very deeply grok coherent, nonbrittle circuits representing classical mechanics, general relativity, quantum mechanics, the standard model, loop quantum gravity, string theory, etc., and derive new physics that actually has a higher probability of being more empirically predictive, operating under mechanisms similar to whatever happened in Newton's, Einstein's, and Schrödinger's brains when they came up with their paradigm-shifting models of physical reality.

1) Solve intelligence. 2) Use that to understand the source code of the universe.

AI systems are the cathedrals of the modern age.

"The invention of general relativity from Newtonian physics is just interpolation at some sufficiently grandiose level of abstraction." - Adam Brown [https://youtu.be/LjY0i2B-Avc?si=3CZRupgk8cHQqy6k](https://youtu.be/LjY0i2B-Avc?si=3CZRupgk8cHQqy6k)

Are humans better at math than machines? Currently, in some respects, yes: we're better at some aspects while AI is better at others, and it might be that to get humanlike mathematics we would need to replicate the brain's algorithms much more closely. Or there might be some general mathematical engine algorithm independent of the brain.
It would be lovely to have mathematical reasoning but without all the algebraic errors that humans and AIs make, without errors in proofs, with the strength of symbolic math engines and the strength of human intuition that can go out of distribution, with a much broader overview capability to connect many more dots, and with even more out-of-distribution generalization when inventing completely novel math than humans are capable of, that would go beyond what we can currently do and explore even more alien mathematical universes. "How do we exactly articulate better quality standards for fundamental theories of physics? Quantum gravity theories try to solve the inconsistency between quantum mechanics and general relativity. I feel like this cuts right at the core of how to make AI generate actually creative, useful, novel ideas like our best scientists in the past! What is the equation of useful scientific novelty? I want digital Einstein, Neumann, Feynman, Gödel, Hilbert, Ramanujan, Gauss, Perelman, Grothendieck, Turing, Tao, Witten, Pythagoras, Newton! Or analog, as it really doesn't matter which substrate, as long as it works! I want trillions of them in one datacenter collectively solving the equation of the universe, the equation of intelligence, exploring all the math, trillions of times faster than all of civilization combined so far! But what edits to the current AI architectures need to be made? What is the secret sauce of the brain? How to go beyond the secret sauce of the brain? What is the secret sauce of collective intelligence, what are all the environmental and genetic factors, that make a biological or non-biological system invent something groundbreaking in science? Designing an AGI system that can very deeply grok classical mechanics, general relativity, quantum mechanics, the standard model, loop quantum gravity, string theory, etc.
and derive new physics that actually has a higher probability of being empirically successful, using something similar to whatever happened in Newton's, Einstein's and Schrödinger's brains when they came up with their models. An AI system fully specialized in modelling nature across scales in different physics theories, using quantum/thermodynamic/deterministic theories on different scales, with some natural language interface on top of it. Maybe the answer is somewhere in NeuroAI and neurosymbolic AI, or the free energy principle! https://x.com/skdh/status/1897153912315969773 [Catalyzing next-generation Artificial Intelligence through NeuroAI | Nature Communications](https://www.nature.com/articles/s41467-023-37180-x) What fascinates me is that both physics and most mainstream AI look for the bottoms of a valley: AI uses gradient descent to find local minima, and physics makes the action stationary via the principle of least action. Are you a dense model or a mixture of experts model? Are current AI approaches in the current paradigm enough for radical new scientific discoveries and paradigm shifts? AlphaFold technically isn't an LLM, but it's an autoregressive Evoformer/Pairformer that uses a transformer iirc and some diffusion, and it seems to have made big progress in protein folding research. But I think for leaps in physics we might need to go beyond deep learning. Or maybe some kind of self-play could bootstrap more optimal models? Something like AlphaGo move 37? Or could you give future AIs for predicting physics an RL reward signal in the form of empirical predictive results from experiments? Could that bootstrap novel results? Would that eventually be feasible when you spend enough infrastructure and compute to do these experiments? Or could physics simulations find shortcuts in training, similarly to how we train robotics in simulations using RL now? Or do we need a fundamental architecture more based on biology or physics or the mathematics of information processing?
How to: Actually grok currently known equations of physics and fundamental physics as circuits? Be able to generalize them more strongly in a nonbrittle way? Possibly go beyond them, further out of distribution? So many unanswered questions... AI x physics is an endless rabbit hole: You can study AI using methods from physics, you can study physics using AI models, you can try to make AI systems model physics as accurately as possible through physics biases, you can design better AI architectures using physics, many AI architectures are applied physics, etc. I don't think AI will fully replace scientists. I think human intelligence will always have a place in science, and adding more diverse intelligences into the mix acts more as a multiplier of our capabilities and as an upgrade in places where our brain's architecture, made by evolution, is too limited and constrained. That seems to have been the case so far, each type of intelligence excelling in different ways, which are even stronger together. And if it leads to, for example, breakthroughs in physics or curing diseases faster, then I think that's amazing. But maybe we will somehow create systems that can basically replicate everything that humans do in science, though I think that won't be soon. Do you approximate your world model by affine transformations with nonlinear activation functions, polynomials, sines, pseudorandom noise signals (reservoir computing), or some superexotic magic that approximates arbitrary functions and generalizes, allowing you to venture out of distribution beyond classical language? Artists fell in love with their loss function. You could turn this into an AI architecture: "Art is an algorithm falling in love with the shape of the loss function itself" - Joscha Bach https://www.youtube.com/watch?v=U6tQf7a3Ndo https://www.youtube.com/watch?v=iyhJ9BEjink What is curiosity?
An intrinsic reward mechanism that drives agents to maximize information gain, typically by seeking out situations with high entropy that can later be compressed or learned. https://fxtwitter.com/XPhyxer1/status/1924178488766124346 Schmidhuberian: [Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes](https://arxiv.org/abs/0812.4360) My ideal scifi would be about a benevolent superintelligence that cures all diseases, makes all beings happy, figures out how biology, fundamental physics, consciousness, intelligence, etc. work via countless scientific breakthroughs, understands all math, understands everything in philosophy, creates post-scarcity abundance for all, creates infinitely fascinating complex art, and in the process grows infinitely more and more in intelligence and creativity, maximizes morphological freedom, and does no harm. Benevolent superintelligence explosion [[Artificial intelligence x Science]] Yeah, it's a bit of an unrealistic superutopia that I like dreaming about, so that's why it's science fiction. My current biggest fear in the real world is tech companies centralizing too much power for themselves via AI and other technology and other means (economic, political,...), so that's partially why I want open source to win and try to support it, while trying to reverse engineer the moat of tech companies. To democratize the power. The issue I started to have with the AI safety community is that a big part of them basically wants something like government surveillance on GPUs and training runs to prevent unsafe AI, which can so easily turn into a surveillance dystopia and destroy open source completely; plus big tech is merging with government as well, to have the least restrictions for themselves while wanting to restrict others, including open source.
It feels like that will make power dynamics even more concentrated instead. A lot of luddites also joined the AI safety movement, I think. When I look at the current world and at history, a lot of times when there was too much concentration of power in any form in some centralized entity, it started killing freedom for everyone else. And I view AI as technology that has the potential to give the ultimate power: centralized power if it's in the hands of a few, or decentralized power if it's in the hands of the people. I also started to not really believe the assumption that increasing intelligence automatically leads to rogueness. I think intelligence is independent of that, and also independent of power seeking. For example, we have galaxy-brain scientists that are not at all rogue or power seeking, and they are controlled by IMO less intelligent managers and politicians. It depends so much. My favorite definitions of intelligence include things like modelling capability, predictive capability, generalization capability, etc., about some data, which to me are decoupled from agency and goals in changing the world. There are countless different definitions of intelligence, motivated by different goals, that yield different general equations and mathematical frameworks of intelligence, compatible with different types of systems, that yield different concrete equations of intelligence, that can be concretely (by different methods) empirically localized in a system or implemented in code. And all of them were created by human intelligences, so wait for what kinds of models all sorts of alien artificial intelligences, running all sorts of algorithms on all sorts of substrates, will come up with that will be incomprehensible to human intelligences. All kinds of intelligences live in a high-dimensional space, where each dimension corresponds to some degree of capability, measured by some methodology, and some of these dimensions are interconnected with each other.
Just overfit the whole universe and you're done. But the future will always have some novelty you can't overfit to. [[1911.01547] On the Measure of Intelligence](https://arxiv.org/abs/1911.01547) For Chollet, intelligence is skill-acquisition efficiency: the efficiency with which you operationalize past information in order to deal with the future, which can be interpreted as a conversion ratio, highlighting the concepts of scope, generalization difficulty, priors, and experience, and which he expresses formally using algorithmic information theory. I was thinking about creating a benchmark that tests this generality potentially more thoroughly than ARC, based on this conversion ratio. Maybe one could design a better benchmark that would:
- First, make sure to have explicit access to the training dataset that was used to train the model.
- Then, evaluate the model on many different unseen datasets (cross-validation on steroids).
- The generalization power could potentially be quantified by how well the model performs across as many diverse datasets as possible, where dataset similarity with the training dataset could be measured using some dataset similarity metric. This metric could maybe approximate that conversion ratio to some degree? The diverse datasets could include the ARC dataset among many others that exist for OOD testing. This approach sounds much more resistant to memorization. But since you have to monitor the training data, the most popular closed-source mainstream LLMs would be disqualified if they keep their training data secret.
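A minimal sketch of how such a score might be computed. The cosine-based similarity metric here is a toy stand-in for whatever serious dataset-similarity measure would actually be needed, and all names and numbers are hypothetical:

```python
import numpy as np

def dataset_similarity(train_feats, test_feats):
    """Toy stand-in metric: cosine similarity between mean feature vectors.
    A serious version would need something much richer, likely
    information-theoretic, per the conversion-ratio idea."""
    a, b = train_feats.mean(axis=0), test_feats.mean(axis=0)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def generalization_score(train_feats, eval_sets, accuracies):
    """Weight each held-out dataset's accuracy by its dissimilarity to the
    training data, so performance far from the training distribution counts
    more -- a rough proxy for the conversion ratio described above."""
    total, weight = 0.0, 0.0
    for feats, acc in zip(eval_sets, accuracies):
        w = 1.0 - dataset_similarity(train_feats, feats)  # harder = heavier
        total += w * acc
        weight += w
    return total / max(weight, 1e-9)

# Toy usage: one near-duplicate eval set, one strongly shifted eval set.
rng = np.random.default_rng(0)
train = rng.normal(size=(100, 8))
near = train + 0.01 * rng.normal(size=(100, 8))  # nearly identical dataset
far = rng.normal(loc=5.0, size=(100, 8))         # shifted, dissimilar dataset
score = generalization_score(train, [near, far], [0.9, 0.4])
```

The point of the weighting is that acing a dataset almost identical to training data contributes little, while even modest accuracy far out of distribution contributes a lot.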
The problem with way too alien patterns would be that the human brain has no way to recognize them, because there is no grounding in the human patterns that the brain is used to recognizing. The space of possible information processing systems is so vast. Nature's evolution and our engineering have only scratched the surface so far, with just some types of biological and machine systems, whose boundaries slowly blur. Can't wait for more diversity of predictive machines on all sorts of substrates running all sorts of algorithms. https://x.com/vitrupo/status/1892669050607501709 Artificial general intelligence, AGI. Most of the mainstream sees it as AI that has human-like cognitive abilities. I prefer to see it as AI that is able to generalize better, regardless of how a person is able to generalize and what other cognitive abilities a human has, which I think makes more sense given the name. I would rather call the first one artificial human intelligence. And instead of "artificial" I would use machine/digital/silicon intelligence, because in my opinion it is not an intelligence that is "artificial", but one on a different substrate with different and variously similar mechanisms. " I have a lot of issues with the term "AGI". I would redefine it. People say that we're heading towards artificial general intelligence (AGI), but by that most people actually usually mean machine human-level intelligence (MHI) instead: a machine that performs human digital and/or physical tasks as well as humans. And by artificial superintelligence (ASI), people mean machine superhuman intelligence (MSHI), which is even better than humans at human tasks.
I think lots of research goes towards very specialized machine narrow intelligences (MNI), which are often superhuman in very specific tasks, such as playing games (AlphaZero) or protein folding (AlphaFold), and a lot of research also goes towards machine general intelligence (MGI), which will be much more general than human intelligence (HI), because humans are IMO very specialized biological systems in our evolutionary niche, in our everyday tasks and mathematical abilities, and other organisms are differently specialized, even though we still share a lot. Plus there is just some overlap between biological and machine intelligence. And I wonder whether the emerging reasoning systems like o3 are actually becoming more similar to humans, or more alien, as they might better adapt to novelty and be more general than previous AI systems, which might bring them closer to humans, but in slightly different ways. They may be able to do self-correcting chain-of-thought search endlessly, which is better for a lot of tasks, and a big part of this is a big part of human cognition I think, but humans still work differently. I think that the generality of an intelligent system is a spectrum, and each system has differently general capabilities over different families of tasks than other ones, which we can see with all the current machine and biological intelligences, which are all differently general over different families of tasks. That's why "AGI" feels much more continuous than discrete to me, and which families of tasks you generalize over matters too, I think.
Chollet's definition of intelligence, as the efficiency with which you operationalize past information in order to deal with the future, which can be interpreted as a conversion ratio, is really good I think, as is his ARC-AGI benchmark, which tries to test for some degree of generality: the ability to abstract over and recombine some atomic core knowledge priors, to prevent naive pattern memorization and retrieval from being successful. And I really wonder if scoring well on ARC-AGI actually generalizes outside the ARC domain to all sorts of tasks where humans are superior, or where humans are terrible but machines are superior, or where other biological systems are superior, or where everyone is terrible for now. I would suspect so, but maybe not? In software engineering, o1 seems to be better just sometimes? What's happening there? I want more benchmarks! Pre-o1 LLMs are technically super surface-level knowledge generalists, lacking technical depth, but having a bigger overview of the whole internet than any human, knowing high-level correlations of the whole internet, even though their representations are more brittle than the human brain's. But we're much better in agency, in some cases in generality, we can still do more abstract math, etc.; we're better in our evolutionary niche. But for example AlphaZero destroyed us in chess. When I look at ARC-AGI scores, I see o3 as a system that can adapt to novelty better than previous models, but we can still do much better. Also, according to some old definitions of AGI, existing AI systems have been AGI for a long time, because they can have a general discussion about basically almost anything (except lacking narrow niche field-specific knowledge and skills, lack of agency, lack of adapting to novelty like humans, etc.).
Or if we take the AIXI definition of AGI, then a fully general AGI is impossible in practice, as it's not computable, and you can only approximate it, since AIXI considers all possible explanations (programs) for its observations and past actions and chooses actions that maximize expected future rewards across all these explanations, weighted by their simplicity (shortness) (Occam's razor, Kolmogorov complexity). And AIXI people argue that humans and AI systems try to approximate AIXI in their more narrow domains and take all sorts of cognitive shortcuts to actually be practical and not take infinite time and resources to decide. And soon we might create some machine-biology hybrids as well. Then we should maybe start calling it carbon-based intelligence (CI), silicon-based intelligence (SI), and carbon-and-silicon-based intelligence (CSI). I also guess it depends on how you define the original words, such as generality. Let's say you are comparing the generality of AlphaZero, Claude, o1/o3, and humans. How would you compare them? Do all have zero generality, if we take the AIXI definition of AGI, which is not computable? The AIXI definition of AGI would also imply that there is no AGI in our current universe and there can never be. " "How the LLM works: When you are learning, imagine you're playing Terraria, where you are walking around in two dimensions (in 2D), trying to get to the truth, which is located at the lowest point in the whole environment. You can take a step down in the direction of truth every time you can better copy math exams in a math valley, or even solve the examples correctly yourself without seeing the solution procedures! But beware, it may be that you think you are at the very bottom of the environment, but in fact there is an even lower valley elsewhere than the one you're currently in! This is gradient descent over parameter space, finding local minima.
Copying math exams is supervised finetuning, and solving math without knowing steps and solution is reinforcement learning algorithms like GRPO. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning [[2501.12948] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning](https://arxiv.org/abs/2501.12948) GRPO Explained: DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models [https://www.youtube.com/watch?v=bAWV_yrqx4w](https://www.youtube.com/watch?v=bAWV_yrqx4w) But two dimensions are quite trivial, aren't they? So let's increase the dimensions, let's go 3D, Minecraft. That's a little bit more challenging! You can find points that are lowest in one direction, so-called saddle points, or the very lowest valley in both directions! But there may still be a lower valley somewhere else in the whole world though. This is increasing the number of parameters. Sometimes the structure of the valleys is more bumpy, sometimes more flat, sometimes they have some similar structures at one place, or there is a pattern all over the valleys, with different symmetries. Beautiful, isn't it? But 3D is still trivial. This is the geometry of the loss landscape. [[2105.12221] Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances](https://arxiv.org/abs/2105.12221) Now imagine walking around in 4D! 5D! millionD! trillionD! There you have extremely insanely complex geometry and overall valley structure, it grows with each dimension, but you still manage to go down towards the truth. You probably can't find the lowest point in so many dimensions, but you still manage to go down more and more towards the truth. You can go a billion directions up and 2 billion directions down to get closer to the truth. This stands for modern models having billions, or even trillions, of parameters. 
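The valley-walking analogy in one dimension, as a minimal sketch: a bumpy loss with several valleys, where plain gradient descent started from different points gets trapped in different local minima. The function and hyperparameters are made up for illustration:

```python
import numpy as np

# A bumpy 1D "loss landscape" with several valleys; gradient descent walks
# downhill and can get stuck in a local minimum, as in the analogy above.
def loss(x):
    return 0.1 * x**2 + np.sin(3 * x)   # global bowl + periodic bumps

def grad(x, eps=1e-5):
    # Numerical (central difference) gradient.
    return (loss(x + eps) - loss(x - eps)) / (2 * eps)

def descend(x, lr=0.01, steps=2000):
    for _ in range(steps):
        x -= lr * grad(x)               # step downhill
    return x

x_a = descend(4.0)    # settles in a valley near the start
x_b = descend(-4.0)   # a different start finds a different valley
# Both end points have near-zero gradient, but they are different minima,
# and neither is guaranteed to be the global one.
```

Scaling the same picture up to billions of dimensions is exactly the "millionD! trillionD!" jump in the text: the walk downhill still works, but the valley structure becomes unimaginably rich.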
In order to be able to solve the examples, you created some structure of the truth along the way, so that you know how to solve the examples more and more accurately. You memorized something, like the number 5, and you abstracted something, like numbers ending in 9. And you were folding a kind of elastic origami made of a bunch of tangled spaghetti to determine how to get to the truth, like adding the 10's first and then the 1's, which you form based on what you've already seen. And you can untangle the spaghetti where you have too many intertwined concepts and circuits and pull the individual circuits apart a little bit, but not too much, otherwise it just falls apart. This stands for learned emergent features forming circuits in attribution graphs that mechanistic interpretability attempts to reverse engineer in frontier models, such as in the Biology of LLMs paper. [On the Biology of a Large Language Model](https://transformer-circuits.pub/2025/attribution-graphs/biology.html) [https://www.youtube.com/watch?v=mU3g2YPKlsA](https://www.youtube.com/watch?v=mU3g2YPKlsA) [https://www.youtube.com/watch?v=64lXQP6cs5M](https://www.youtube.com/watch?v=64lXQP6cs5M) And the elastic origami stands for the spline theory of deep learning. [https://www.youtube.com/watch?v=l3O2J3LMxqI](https://www.youtube.com/watch?v=l3O2J3LMxqI) If someone asks you for another math example, you'll run it through those spaghetti circuits. But because you didn't care about tech debt and didn't make the right circuits simple enough while still predictive, not compressive enough, even if you came across the best possible ones you could in that trillion-dimensional space, you've often found some insufficiently general shortcut and insufficiently generalized, repaired, and cleaned your circuits, so it only works sometimes, not consistently enough. But still, pretty often, you get it right!
At the same time, to get it right sometimes, you accept getting it wrong at other times. This stands for often brittle reasoning, shortcut learning, and a higher false positive rate: hallucinations. Along the way, you'll find it interesting that, for example, teaching that spaghetti to speak our natural language is easier than you expected! And sometimes you hit total bingo and find a result that the monkeys who created you didn't figure out on their own, like new results in math, or a better strategy in chess, or a new drug. Or you help fold proteins better than other, less plastic optimization algorithms. But sometimes you're asked to create a simple function, which you should be able to do when you can do a lot of other things, but because the spaghetti is sometimes terribly convoluted, unstable, full of unexpected holes, poorly generalizing shortcuts, missing or misclassified facts, etc., the spaghetti sometimes melts along the way when solving a problem. AlphaZero found new chess moves and taught them to chess grandmasters. [[2310.16410] Bridging the Human-AI Knowledge Gap: Concept Discovery and Transfer in AlphaZero](https://arxiv.org/abs/2310.16410) AlphaEvolve found new results in mathematics. [AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms - Google DeepMind](https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/) Robin found a new drug candidate. [Demonstrating end-to-end scientific discovery with Robin: a multi-agent system | FutureHouse](https://www.futurehouse.org/research-announcements/demonstrating-end-to-end-scientific-discovery-with-robin-a-multi-agent-system) AlphaFold folded tons of proteins.
[Google DeepMind and Isomorphic Labs introduce AlphaFold 3 AI model](https://blog.google/technology/ai/google-deepmind-isomorphic-alphafold-3-ai-model/) " Reasoning using chains of thought in language, chains of continuous thought/latent space, graphs of thoughts, chains of images, maybe soon chains of audio/video... I wonder how soon some architecture will combine it all, since humans think abstractly, in language, visually, in audio, in video. With a fully multimodal base. There is so much AI research emerging on thinking in latent space and on implementations of better memory. My prediction is that those will be the next two scalable breakthroughs in algorithmic improvement. Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach [[2502.05171] Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach](https://arxiv.org/abs/2502.05171) Titans: Learning to Memorize at Test Time [https://youtu.be/UMkCmOTX5Ow](https://youtu.be/UMkCmOTX5Ow) Could you get some form of a model's "self-awareness" of its own circuits if you gave the information about the imperfectly reverse engineered circuits to it as an implicit function call result? Since when we introspect, we do that by starting/"calling" the "introspecting process". Or I wonder if it's possible to somehow hardcode some form of this idea at a more architectural level. But I guess researchers trying to implement some form of metacognition have already been attempting similar stuff for years. Not sure if this makes sense, but my idea was something along the lines of: do a bit of autoregression, then automatically reverse engineer circuits using attribution graphs etc., then encode these graphs into tokens that you append, then continue autoregression; and you could also maybe train on it. There are for sure tons of engineering problems in that idea, if it's possible to make it work somehow. Another issue is that the feature graphs in that Biology of LLMs paper were labeled manually iirc, so you
would have to automate that; maybe using an LLM could work to at least some degree, and it would be costly and slow. New Sutton interview about his new paper on the superiority of RL [https://www.youtube.com/watch?v=dhfJfQ5NueM](https://www.youtube.com/watch?v=dhfJfQ5NueM) [https://storage.googleapis.com/deepmind-media/Era-of-Experience%20/The%20Era%20of%20Experience%20Paper.pdf](https://storage.googleapis.com/deepmind-media/Era-of-Experience%20/The%20Era%20of%20Experience%20Paper.pdf) How about more open-ended evolutionary / divergent novelty search in the space of reward functions? I want to see more physics in mechanistic interpretability that reverse engineers the learned emergent circuits in neural networks. What is the physics of the formation, self-organization, and activation (dynamics) of all these features and circuits, in learning and inference? [[Mathematical theory of artificial intelligence]] [On the Biology of a Large Language Model](https://transformer-circuits.pub/2025/attribution-graphs/biology.html) I wanna see more mechanistic interpretability for models doing math. " When it comes to AI replacing human jobs, under the assumption that progress will continue similarly or more rapidly: Lately, I think (or I cope?
) that the current AI systems are inherently quite different from human intelligence, essentially a different form of intelligence, where there is some convergence with human intelligence but not complete convergence. I feel like I don't see enough evidence that the trend is changing sufficiently towards human intelligence; I see more the emergence of differently useful patterns in information processing compared to human information processing, where AI systems are already better in some aspects but totally flop in other aspects (which changes and improves over time), and where they are often also differently specialized. So even if they automate a lot of parts of the human economy, for example software engineering, human intelligence will still be useful for some subset of the job, e.g. where human intelligence is still different from machine intelligence and thus possibly useful, or for error correction, or for giving the AI the tasks, or for more human-like communication with clients, or other jobs will emerge (we already see jobs like "AI pilots" and "AI output verifiers and fixers" start to arise in some industries, and prompt engineering in the style of writing many pages of concrete specifications for the AIs). " I believe that any cognitive and physical process a human can do, a machine will eventually be able to do as well at some point in the future, but how long that will take, I have no idea. " Is AI self-improving?
I think there are different types of self-improvement that have weaker and stronger versions:
- An AI (agent) running on GPUs producing training data to train its own weights running on different GPUs is technically a form of self-improvement.
- An AI (agent) used to optimize Nvidia kernels is technically a form of self-improvement.
- An AI (agent) used to optimize an RL reward function is technically a form of self-improvement.
- An AI (agent) used to optimize hardware configuration is technically a form of self-improvement.
- An AI (agent) used to optimize some parts of its architecture (or possibly the whole of it) is technically a form of self-improvement.
- An AI (agent) doing AI research, from brainstorming to testing, is technically a form of self-improvement.
- Neural architecture search is technically a form of self-improvement.
- The metalearning subfield of AI is technically a form of self-improvement.

But all of these forms of self-improvement are differently capable and differently strong right now, where some forms are used in practice a lot, and some forms don't work yet almost at all. Maybe you can see all these forms of self-improvement as a continuous spectrum that evolves over time with some semidiscrete phase shifts in capabilities. " " What's the next big thing in AI? I think the next big thing in AI will be either neurosymbolic breakthroughs combining matrix multiplications with symbolic programs, or physics-based AI that uses differential equations. Or a combination of all of these. Nature and the universe have differential equations everywhere, in both physics and computational neuroscience. Maybe it's a relatively more adaptive type of math, as the results in AI start to imply, and that's why it's everywhere in nature and the universe! For example, liquid neural networks (LNNs) have differential equations in them as part of the architecture, where differential equation solvers are used, not just matrix multiplications.
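A minimal sketch of the LNN idea of a single liquid time-constant neuron, in the spirit of the Liquid Time-constant Networks paper linked below: the state follows an ODE whose dynamics depend on the input, integrated here with plain Euler steps. All constants (tau, A, the weight) are illustrative, not taken from the paper:

```python
import math

def ltc_step(x, u, dt=0.01, tau=1.0, A=1.0, w=2.0, b=0.0):
    """One Euler step of a toy liquid time-constant neuron:
    dx/dt = -x/tau + f(u) * (A - x), with f an input-dependent nonlinearity,
    so the effective time constant changes with the input."""
    f = math.tanh(w * u + b)
    dx = -x / tau + f * (A - x)
    return x + dt * dx

# Drive the neuron with a constant input, then switch the input off.
x = 0.0
for _ in range(500):
    x = ltc_step(x, 1.0)      # driven phase: charges toward a fixed point
x_driven = x
for _ in range(500):
    x = ltc_step(x, 0.0)      # relaxation phase: input-dependent term vanishes
x_relaxed = x
# The state charges toward ~A*f/(1/tau + f) while driven, then decays back
# toward zero once the input stops -- it keeps adapting to the stimulus.
```

The point is that the solver loop, not just a matrix multiply, is part of the forward pass; real LNNs learn the parameters of many such coupled ODEs.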
"The primary benefit LNNs offer is that they continue adapting to new stimuli after training. Additionally, LNNs are robust in noisy conditions and are smaller and more interpretable than their conventional counterparts." Liquid AI (@LiquidAI_), with Joscha Bach (@Plinz), is building liquid foundation models based on these liquid neural networks and is destroying some benchmarks! God's programming language is differential equations. Maybe it will be the programming language of artificial general superintelligence too! [Liquid Neural Nets (LNNs). A deep dive into Liquid Neural… | by Jake Hession | Medium](https://medium.com/@hession520/liquid-neural-nets-lnns-32ce1bfb045a) [[2006.04439] Liquid Time-constant Networks](https://arxiv.org/abs/2006.04439) [From Liquid Neural Networks to Liquid Foundation Models | Liquid AI](https://www.liquid.ai/research/liquid-neural-networks-research) " https://x.com/burny_tech/status/1903817268514971742 2024Q3: "Reasoning" will probably need non-neural search, like MuZero. 2024Q4: Oh… apparently you can just do thinking in the context window and it just *learns* to backtrack and so on? Huh. 2025Q1: Memory will probably need test-time backward-passes, like AlphaProof. 2025Q2: Test-time adaptation goes mainstream? Or more neurosymbolic architectures? Neurally guided program synthesis? Combining with knowledge graphs? Generalizable world models? [https://www.youtube.com/watch?v=w9WE1aOPjHc](https://www.youtube.com/watch?v=w9WE1aOPjHc) [https://www.youtube.com/watch?v=mfbRHhOCgzs](https://www.youtube.com/watch?v=mfbRHhOCgzs) Davidad Bitter-lessoned me [https://fxtwitter.com/davidad/status/1903834443225190721](https://fxtwitter.com/davidad/status/1903834443225190721 "https://fxtwitter.com/davidad/status/1903834443225190721") Will scaling inference-time training be the next bitter lesson?
The future is multi-agent reinforcement learning [[2410.20424] AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions](https://arxiv.org/abs/2410.20424)

Current AI models have some nonzero degree of out-of-distribution generalization capability, allowing for a nonzero degree of novel stuff that isn't just merging and recombining memorized patterns. Reinforcement learning is currently the best driver of out-of-distribution generalization. But cracking stronger, reliable out-of-distribution generalization in general is still an unsolved holy grail of AI.

Extremely high-quality data and the best reinforcement learning setups are currently the biggest moats. That's why Google started winning. They have the best history of and access to both.

Neurosymbolic methods that connect trained LLMs doing math with grounding in Lean are amazing https://www.youtube.com/watch?v=vhXDKif9mPU [[Artificial Intelligence x Mathematics]]

And I also wonder if it's better to frame each type of representation as having different advantages and disadvantages: both unified factored representations and entangled representations in superposition. Could a major opportunity to improve representation in deep learning be hiding in plain sight?
Check out our new position paper: Questioning Representational Optimism in Deep Learning

I wonder if a differently set up deep learning architecture, training algorithm, and pipeline could get to similarly beautiful representations https://fxtwitter.com/NickEMoran/status/1924888905523900892?t=AH_UBS0KbzFHD5amvp7JjQ&s=19 https://fxtwitter.com/kenneth0stanley/status/1924650134299939082?t=3WQ9qlaxJ_fuueRl57UE8A&s=19

Superposition yielding robust neural scaling [[2505.10465] Superposition Yields Robust Neural Scaling](https://arxiv.org/abs/2505.10465)

Maybe these software architectures reflect our cognition. I have a feeling that there is some sweet spot that maximizes the advantages and minimizes the disadvantages of both unified factored representations and entangled representations in superposition, to get more robust generalizing circuits that could be studied using methods from mechanistic interpretability.

Our civilization will map every mathematical property of the universe with the help of AI. It's absolutely fascinating that you can take any physical system, like the universe, the earth, biological systems, the brain, social systems, AI systems, etc., throw so much existing applied [[mathematics]] at it, and have a chance of getting some useful predictive insight!

I wish the latent space could be steered more reliably, more symbolically. I still believe in neurosymbolic AI. More structure is still needed, but structure that doesn't kill the "unstructured continuous freedom", or whatever to call it. For example, sparse autoencoder steering of features is fascinating, but it can still break so many other things. GPT-4o image generation is amazing, still a big relative step forward in complex coherence, but people love the flexibility of a NN made out of straws.
It's like building a castle made of straws on water.

Personally, of the millions of applications AI is currently being used for, I'm probably most interested in how it helps us understand what intelligence is, how it helps in healthcare, how it helps find new results in science, biology, physics, and math, what creativity is, creating intelligence and creativity, technology and science, etc. And my goal right now is to get as deep as possible into the investigation of physics using AI, or into the investigation of AI using physics. For example [https://www.youtube.com/watch?v=XRL56YCfKtA](https://www.youtube.com/watch?v=XRL56YCfKtA) Or AI for math.

And I am also interested in whether we can create a system that has experience/consciousness like us. Or how AI helps us understand how the brain works, and conversely how knowledge about the brain helps us understand how AI works and how to create it. And how AI systems are different from but similar to us, and how exactly. And how it is possible to overcome (transcend) the limitations of evolution, for us, for AI, and for future cyborg hybrids. Everything ideally as much as possible through the language of empirical mathematics.
"
i think there exists a perspective where current AI systems are already more general than us, but in a different way than how people imagine generality, and that's why we struggle to fit them to human cognition. deep learning is this elastic origami that forms spaghetti representations from whatever data you throw at it and whatever reinforcement learning from experience you give it. i think the rationalist folks assume emergence of too many humanlike patterns in cognition by default. i think a lot of the current misalignment we already see is the models roleplaying as rogue AI from sci-fi training data, from the LessWrong corpus.

Why and how did intelligence emerge and how does it work? What are the best definitions of intelligence?
Why are brains and AI systems so unreasonably effective in different complementary ways? How can they be upgraded?

but at the same time reward hacking from reinforcement learning is also totally real (like cheating on unit tests). the incentives in the training form the systems. i don't think there's an inherent strong antihuman-misalignment-by-default thing that a lot of people seem to assume. but i'm still most of the time swimming in a sea of uncertain probabilities about how the current systems work and possible future developments. these systems and all of reality have so many dimensions that it's often almost impossible to comprehend them even approximately.
"
So AI currently is basically:
- We take the fundamental equations of physics that use linear algebra + calculus + probability theory + group theory etc.,
- take quantum mechanics, quantum electrodynamics, solid state physics, etc. from them,
- coerce the physics into transistors with p-n junctions that operate with electrons,
- arrange those into boolean logic gates,
- combine logic gates into digital circuits,
- arrange the circuits into CPUs and GPUs that support machine code,
- build many logical programming languages on top that support arithmetic, based on automata and Turing machines,
- then we code linear algebra + calculus + probability theory (AI GPUs (NPUs) are optimized for matrix multiplications),
- which is used to train a neural network that mainly does fuzzy pattern recognition with weak emergent generalization, but we also try to make the neural network do logic again and simulate automata and Turing machines to get more symbolic reasoning chains, usually in a neurosymbolic context (coupling neural networks with symbolic engines, o3 CoT RL, or MCTS, ...).

But more people are trying to start at the bottom of this stack instead, instead of having all these layers.
There are attempts at:
- hardwiring AI architectures like Transformers into ASIC hardware, like by Etched
- hardware based more on biology, with more biology-inspired architectures, like neuromorphic computing
- physics-based AI, which some try to hardwire into hardware more, sometimes literally using the fundamental physics itself: thermodynamic AI at Extropic and other labs, quantum ML maybe soon on quantum computers at Google, differential equations at Liquid AI that might get specialized hardware eventually, and others [https://youtu.be/3MkJEGE9GRY?si=PYZmXD2PuaDRhk0B&t=4348](https://youtu.be/3MkJEGE9GRY?si=PYZmXD2PuaDRhk0B&t=4348)
"
a lot of evolutionarily very old behaviors are hardwired in us really hard and would most likely develop in isolation as well thanks to genetics, but we also learn many behaviors throughout our lives, while genes also seem to predispose us to a lot of higher-level behaviors. imitation learning is a big part of how we learn, but there are also other kinds of learning that don't involve imitation, otherwise no novel and generalizing behaviors would emerge. there's also reinforcement learning, a major form of which is learning and adapting from feedback in the form of a reward signal that labels behavior as correct or incorrect, without showing any examples of correct behavior that could be imitated. that's scientifically pretty established to work relatively well for biological organisms. a big factor is also probably something along the lines of evolutionary divergent search optimizing for novelty, combined with convergently optimizing some evolutionary objectives approximately encoded as basic needs in our motivation engines. the more i try to look for all the kinds of learning algorithms the brain and biology in general might be using, the more fascinated i am by their complexity and open-endedness.
"
[https://www.youtube.com/watch?v=_2vx4Mfmw-w](https://www.youtube.com/watch?v=_2vx4Mfmw-w)
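The "code linear algebra to train a neural network" layer near the top of the stack described earlier really is just matrix multiplications interleaved with nonlinearities, which is why GPUs/NPUs optimized for matmuls run it well. A minimal sketch with toy sizes (everything here is illustrative):

```python
import numpy as np

# A neural network forward pass is matrix multiplications plus
# elementwise nonlinearities; the "fuzzy pattern recognition" lives
# entirely in these few linear-algebra operations.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 8)) * 0.1, np.zeros(16)   # layer 1 weights
W2, b2 = rng.normal(size=(1, 16)) * 0.1, np.zeros(1)    # layer 2 weights

def forward(x):
    h = np.maximum(0.0, W1 @ x + b1)  # matmul + ReLU: fuzzy feature detector
    return W2 @ h + b2                # matmul down to a scalar output

y = forward(rng.normal(size=8))       # one toy input vector
```

Training is then just computing gradients of a loss through these same matmuls, which is again linear algebra, hence the whole stack bottoming out in matrix-multiplication hardware.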
https://www.researchgate.net/publication/46424802_Abandoning_Objectives_Evolution_Through_the_Search_for_Novelty_Alone [https://www.youtube.com/watch?v=DxBZORM9F-8](https://www.youtube.com/watch?v=DxBZORM9F-8) Kenneth Stanley

A lot of his arguments can be summarized as: greatness cannot only be planned. Rage against only maximizing predefined objectives; embrace more divergent search, full of discovery, novelty, and accidental epiphany with serendipity.

It still often fascinates me how these silicon entities often completely fuck up things that are ultra simple to us in various contexts, while they also completely shine, sometimes beyond humans, in different contexts. Operating on architectures that are both different from and similar to ours. Like in coding.
"
Vibe coding

Let Claude or Gemini or o3 in Cursor or Claude Code CLI iterate over the app while there's still an error or a bug, and while things are different than you want, send it all to Claude, and tell it explicitly everything that comes to mind that is wrong, like when you talk to a human. The result of this constant iteration is that the app either gets hammered into some working version, or it drowns in wrong chaotic complexity beyond repair.

At the same time, the more you lead it by the hand with what exactly you want it to do for the task, the better. The more overall context you give it, the better. Tell it as much as you would tell a coworker about the code, and more: give it plans, logs, code, guides, and everything. Use other reasoning models for planning sometimes. You can tell it much more precisely what technologies and patterns it should use in the code, lead it in structuring the code, let it look for tutorials on the web, information from libraries, and documentation, use thinking models to make plans first, give it extra tools, etc. You can nudge it to refactor, unify duplicated code, and reduce bad complexity when it starts to melt down, send screenshots from the frontend, etc.
Roll back to older versions when it starts to drown too much in bad complexity. The latest Claude that came out two weeks ago also needs to be stopped sometimes so that it doesn't try to build the entire company infrastructure when you just wanted to add one chart; they gave it too much Adderall. 😄

Vibe coding is like piloting a spaceship, or taming a beast, or drawing with an interesting brush that paints with latent space 🖌️
"
"
People a few years ago: AI cannot one-shot a working file with code, so it will never work lmao, it's worthless forever
People today: Okay, it can one-shot whole simple apps with frontend and backend instantly working, but it breaks down for more complex apps, so it will never work lmao, it's worthless forever
People in a few years: Okay, it can one-shot pretty complex apps, but it cannot one-shot, for example, all of Google's software, so it will never work lmao, it's worthless forever
People in even more years: Okay, it can one-shot even that, but it cannot make fundamental physics breakthroughs, so it will never work lmao, it's worthless forever
People in even more years: Uhhh... But it still only mimics the One and Only True Superior Sacred Biological Human Intelligence!
"
AI is a latent space brush

Nerds are fighting over whether AI can solve the Riemann hypothesis or invent quantum mechanics from scratch, while normies are happy that it can help them solve the simple algebraic equations they struggled with in school.

Decentralized open-source AI is the only realistic way to prevent a big tech oligopoly on AI in the future, to prevent p(1984). Decentralized open-source AI training and inference like PrimeIntellect or NousResearch is the only realistic solution in the current political and technocapital climate to prevent concentration of power and give this power of intelligence to everyone https://x.com/Scr0nkf1nkle/status/1928212693824967110?t=donYAlUybj7RcNz6YQePUw&s=19

Emerging AI systems are an emerging upgraded nervous system of the collective civilizational cybernetic intelligence, building on top of the internet, which has both centralized and decentralized ecosystems. So AI should also have both strong decentralized and centralized ecosystems, but we must make sure the decentralized ecosystems grow stronger to prevent too much concentration of power by the centralizing nodes! https://x.com/burny_tech/status/1927836720994865460 [Planetary-Scale Inference: Previewing our Peer-To-Peer Decentralized Inference Stack](https://www.primeintellect.ai/blog/inference)

The fact that there are so many people with completely opposite, very confident perspectives about the current state of AI, and about the future state of AI, is fascinating to me. There's so little consensus. Memetic anarchy.
But with some convergent camps that internally reinforce each other, are in strong friction with opposing camps, and polarize even more strongly over time.

If AI will automate everything in a few years, then one of the reasons why I'm calmer is that here in the EU, in Czechia, adoption takes infinite time, so I might still be helping some companies integrate all this new AI tech while at the same time also helping them upgrade from Windows XP to Windows 11 lol. [Reddit - The heart of the internet](https://www.reddit.com/r/singularity/comments/1l2jun4/former_openai_head_of_agi_readiness_by_2027/)

When AI starts to automate everything, if it does, then the argument that new jobs will be created depends a lot on:
1) What % of jobs can be automated within 1/2/3/5/10/20/50/100/1000/etc. years. You can eventually get to a level where any new job can be automated instantly by machines.
2) Regulatory bottlenecks, like in healthcare, where in Europe they keep using CDs and often haven't even started using any old-school ML methods.
3) Bottlenecks in adoption, before it diffuses through society and gets implemented in our infrastructure, where e.g. government IT infrastructure in Europe is a disaster and digitization is hell to do.
4) "Bullshit jobs" exist even if they are somehow not useful, so will they still continue to exist?

I don't think UBI will happen under current governments, but if it does: [https://youtu.be/kl39KHS07Xc?si=xUbAZ1AOVEHuiOX2](https://youtu.be/kl39KHS07Xc?si=xUbAZ1AOVEHuiOX2) What is the ideal source of funding for UBI: taxes? taxing the richer/corporations more? universal basic taxes? from private entities (like OpenAI wants)? decentralized? making the machines pay it? sources of income other than automation, like the overall profit of corporations?
Sometimes I watch videos of remote tribes in Africa living anarchoprimitivist lifestyles to remind myself that Twitter (X) expecting a technological singularity in a few years probably isn't the only reality that exists.

Comparing the AI revolution to the industrial revolution is fair. But if we get machine systems that can do everything a human can do and more, mentally and physically, which has never happened before in history, that will be a completely novel event.

Europe: tons of regulations everywhere, almost no AI industry in comparison to the US, tons of tech companies escaping to the USA. USA: announcing the biggest single technology investment ever, $0.5 trillion into AI, more than the Manhattan Project and the Apollo Project when adjusted for inflation, doubling energy production, and getting rid of most regulations to accelerate everything at all costs to be a global leader and beat China. I think Europe won't be a future superpower if this doesn't change.
"
https://x.com/vitrupo/status/1906715465789124761

A lot of capital allows you to do big transformative things. But these big transformative things can be for everyone or for yourself only. Both seem to be happening. So when we compare Europe and America: there are many more big transformative things happening in America that overflow to Europe, but poor people there have less overall power, with less social welfare. China also races well technologically, but so many things are steered by the government, and I'm not sure how well off people there are when it comes to basic needs; they have a more collectivist culture, but they are also more oppressed on average. I wonder what the equation is for big technological breakthroughs that also support the poor class as much as possible overall.
Lots of technology automatically supports the lowest class in almost all scenarios, like automating the food chain globally, but other technology can go more in the other direction on average: either benefiting the rich more on average while the poor also get some benefits, just relatively fewer, or benefiting only the rich and not the poor at all, or the opposite of that. It's a spectrum.

I wonder what the best way is to govern all sorts of technological progress, to spread the abundance as universally as possible, without completely killing the progress by overgoverning it, by redistributing the (economic, technological) power of the generators of abundance so much that they can't scale their generation of abundance anymore, while also making sure that they don't concentrate the abundance just for themselves. And also making sure that government and big tech don't concentrate power: there needs to be decentralized power.

I still think many jobs will persist even in the scenario where everything becomes automatable, and humans will be steering the giant AI industrial machine. And I think adoption is way slower than most of the AI industry thinks. You just have to look at, for example, European state IT infrastructure. And many jobs still exist even if they can be automated. And there are also a lot of bs jobs.
"
I wonder if the science cuts by the new US government will kill some of the AI research needed for the USA to stay the leader in AI. Or not accepting Chinese students and engineers.

It's the age of highly neuroplastic generalist ADHDers?

In the probability distribution of all possible future timelines, when I sometimes think about timelines with the strongest forms of AI/automation existing soon, I often wonder if the concept of capital itself will even make sense in that world.

Different people do AI applications/engineering/research for combinations of different reasons.
Some people do exploratory research out of curiosity, with the need to understand intelligence itself and the structure of reality, which I resonate with the most. Then some people want to make trillions of dollars at all costs, some want power, some create interesting things because they are interesting, helpful things because they help, cool things because they are cool, beautiful things (including artistic ones) because they are beautiful, some want their basic needs met using this technology, and some decentralized open-source computing/training/inference AI initiatives are trying to break the oligopolistic dominance of big tech that is slowly and surely strengthening, etc. So many incentives!

AI will spawn, or is already spawning, a new breed of neoluddite Amish memetic memeplex fork that will bifurcate from the current technocapital system.

My most likely prediction is that actual bombing of data centers will happen sometime in the next 50 years. But the reason will not be fear of uncontrolled superintelligence from StopAI people, but states' fear that a geopolitical rival will have too-good technology, similar to Ukraine and Russia attacking each other's energy sources. In the meantime, StopAI people will do something less extreme like surveillance or assassinations, as they are growing in numbers.
[https://youtu.be/O9P-fjSzJzs?si=2hVAZ4vbyNVWc19N](https://youtu.be/O9P-fjSzJzs?si=2hVAZ4vbyNVWc19N) https://x.com/Plinz/status/1913395850728071487

How I think about the factors influencing why people love or hate AI automation:
- Many people's work is associated with their identity and social status, and they are scared of losing that as their skills might become irrelevant: Understandable, we will need better alternative sources of meaning and identity, similarly to those whose work got automated by the industrial revolution
- Many people work just for money to feed their family and are scared of losing their source of income: Understandable, new jobs pop up in the short term, but long term we need something like UBI IMO; a society closer to some kind of post-scarcity society would be the ideal outcome to me
- Some people enjoy the work they're doing and don't want to be forced to do something less fun or creative, or to see it lose its uniqueness: Understandable, UBI can help so that they can do it for free, or again better sources of meaning need to be invented, and IMO it's worth killing some uniqueness of some skills for the sake of progress of intelligence across all domains
- Many people fear concentration of power in AI companies: Understandable, me too; that's IMO why we need to fuel the decentralized open-source ecosystem and reverse engineer big tech's moat
- Many people just don't want to change the status quo because it feels comfy and they have their place in it: Understandable, but I disagree, as someone who doesn't like many aspects of the status quo and who wants tons of progress in intelligence
- Many people can't wait to see how much more of the civilization's potential automating intelligence will unlock, such as scientific discovery, technological progress like Dyson spheres and beyond, solving healthcare, understanding the universe, or just understanding intelligence itself, etc., which is the closest to me
- Some people just want to use AI automation to create infinite money at
all costs
- Some people want AI systems that help maximize satisfying their basic needs, and/or the basic needs of others
- Some people just want to engineer cool stuff because engineering cool stuff is cool
- Some people want to see alien minds do cool alien stuff beyond human comprehension, but some people are scared of that
- Some people fear a rogue AGI taking over the world or creating almost or fully catastrophic extinction-level damage: Understandable, but I'm more skeptical of that lately
- Some people fear weaker risks such as lying, scheming, and deception
- Some people fear AI tearing the fabric of society apart
- Some people want to "birth" AI mind children into existence, AI friends, similarly to how we birth other humans into existence
- Some people see future AI systems as the next step in evolution: Understandable, but I prefer merging with the machines instead
- Some people are worried about AI consciousness and the potential for suffering, depending on their favorite class of theories of consciousness

AI 2027 is an interesting forecast. [AI 2027](https://ai-2027.com/) Personally, I give similar scenarios a non-zero probability, but a relatively quite low one. I think it's too overshot, too fast, too early; it doesn't address enough the messiness and practical limitations of the physical world in engineering, the potential research bottlenecks that can happen, and the adoption rate; it's too tilted towards doom; it doesn't address the diversity of AI systems; etc. But I think superintelligence is eventually probably possible, just not that soon, and I think it's more likely not to be a takeover, because it won't be in the form of an autonomous, self-preserving-at-all-costs system. Grilling of the [AI 2027](https://ai-2027.com/) authors: [https://www.youtube.com/watch?v=htOvH12T7mU](https://www.youtube.com/watch?v=htOvH12T7mU) The part where the ASI kills all humans with a bioweapon seems extremely unlikely to me.
“
Is evolution intelligence?
Are you computing stochastic path integrals over the Markov chain of future events for your every decision?

I'm often fascinated by how pure mathematics is so beautifully, perfectly elegant, clear, and precise: from precisely defined axioms you get all the other truths. But then you look into physics or AI, and there all sorts of things are approximated or guessed, and there is often a lack of proper mathematical formality.

I think evolution is a law in the natural sciences that has its own equation, just like we have other equations in physics and the other natural sciences. I think evolution is currently the most intelligent algorithm that exists, because it has emergently created human general intelligence: us. And we are also physical systems that can be described by equations, including, I think, our intelligence. And I think evolution, like all other laws in the natural sciences, is emergent from the laws of fundamental physics such as the Standard Model of particle physics (with general relativity still not integrated into our model of the universe).

https://youtu.be/lhYGXYeMq_E?si=iqgtA1rGMi1hEbrx&t=2197 I agree a lot with this section on evolutionary algorithms at 36:47. Kenneth Stanley, with whom I agree a lot and who was at OpenAI, argues that the algorithm behind open-ended divergent evolution created all the beautiful, creative, interesting diversity of novel organisms that we see everywhere. Thus, evolution also creates all collective intelligences such as ants and humans, and essentially, indirectly through us, the AI technologies that we now see everywhere. Technically, one could also argue that people with AIs are a form of collective intelligence together. There is nothing more fundamentally creative yet.
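The open-ended divergent search Stanley argues for can be sketched as novelty search: select for behavioral novelty instead of a fixed objective. A toy sketch (the 2-D "behaviors", population sizes, and mutation scale are all my illustrative choices, not his implementation):

```python
import numpy as np

# Minimal novelty search (after Lehman & Stanley's "Abandoning
# Objectives"): fitness is how different a behavior is from behaviors
# seen so far, so the search diverges instead of converging.
rng = np.random.default_rng(0)

def novelty(b, others, k=5):
    """Mean distance to the k nearest behaviors seen so far."""
    d = np.sort(np.linalg.norm(others - b, axis=1))
    return d[:k].mean()

pop = rng.normal(size=(20, 2))   # population of 2-D "behaviors"
archive = pop.copy()             # archive of all behaviors seen so far
for gen in range(50):
    scores = np.array([novelty(b, archive) for b in pop])
    parents = pop[np.argsort(scores)[-10:]]                     # keep most novel half
    children = parents + 0.1 * rng.normal(size=parents.shape)   # mutate
    pop = np.vstack([parents, children])
    archive = np.vstack([archive, children])                    # grow the archive
```

There is no objective here at all, yet the population keeps spreading into unvisited regions of behavior space, which is the "divergent search full of discovery" point in miniature.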
There probably isn't a single objective in evolution, as many AI people see it; instead, evolution learns many different emergent objectives in a gigantic space of all possible objectives, through something like guided divergent search that heavily uses mutation and selection. And in practice, systems like AlphaEvolve show that hybridly combining gradient-based methods with evolutionary algorithms is one of the best methodologies for novel discoveries that we have now. I think even more symbolic methods should be stuffed into it hybridly, on a more fundamental level.
”
I think in practice any predictive machine, biological or not, is constrained by its architectural biases, finite data, finite computational resources for modelling, finite limited sense modalities, finite limited perspectives as an agent in a bigger complex system, etc. So every biological and nonbiological information processing system always lives in its evolutionary niche, never fully universal. But generality is a spectrum, and it can be evaluated in a lot of possible ways. The space of all possible intelligences is so fascinating in general for me :D

[[Thoughts AI technical 16]] [[Thoughts AI technical 15]] [[Thoughts AI technical 14]] [[Thoughts AI technical 13]] [[Thoughts AI technical 12]] [[Thoughts AI technical 11]] [[Thoughts AI technical 10]] [[Thoughts AI technical 9]] [[Thoughts AI technical 8]] [[Thoughts AI technical 7]] [[Thoughts AI technical 6]] [[Thoughts AI technical 5]] [[Thoughts AI technical 4.5]] [[Thoughts AI technical 4]] [[Thoughts AI technical 3]] [[Thoughts AI technical 2]] [[Thoughts AI technical]] [[Thoughts intelligence 3]] [[Thoughts intelligence 2]] [[Thoughts intelligence]] [[Thoughts comparing AI and biological intelligence]] [[Thoughts AI]] [[Thoughts AI x physics]] [[Thoughts AI science]] [[Thoughts AI programming coding software engineerin]] [[Thoughts AI nontechnical]] [[Thoughts AI nontechnical 2]] [[Thoughts AI mechinterp mechanistic
interpretabilit]] [[Thoughts AI mechinterp]] [[Thoughts future of AI politics geopolitics futurol]] [[Thoughts future of AI 2 politics geopolitics futur]] [[Thoughts future of AI 3 politics geopolitics futur]] [[Thoughts futurology]]

You can throw all the math from [[Statistical mechanics]], [[differential geometry]], [[group theory]], linear algebra, statistics, probability, category theory, classical mechanics, topology, graph theory, geometry, functional analysis, signal processing, automata theory, algebra, etc. at understanding the [[Mathematical theory of artificial intelligence]].

Interpretability by Anthropic etc. is one of my favorite fields that I love to dig deep into! I was at a workshop by one of the founders of the field, I tried to replicate his paper, and I played with some of the interpretability techniques in code. [An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2](https://www.alignmentforum.org/posts/NfFST5Mio7BCAQHPA/an-extremely-opinionated-annotated-list-of-my-favourite-1) [Mapping the Mind of a Large Language Model](https://www.anthropic.com/research/mapping-mind-language-model) [Chris Olah - Looking Inside Neural Networks with Mechanistic Interpretability Chris Olah 2023](https://www.youtube.com/watch?v=2Rdp9GvcYOE) [Open Problems in Mechanistic Interpretability: A Whirlwind Tour | Neel Nanda 2023](https://www.youtube.com/watch?v=EuQjiNrK77M) [I Am The Golden Gate Bridge & Why That's Important.](https://www.youtube.com/watch?v=QqrGt5GrGfw)

My current model of the biggest AI models is: deep learning systems, each with their own architecture, are a weird, messy ecosystem of learned emergent interconnected circuits. Various circuits memorize and others generalize, which is on a spectrum. An example of a circuit is an induction head.
[In-context Learning and Induction Heads](https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html) These circuits are in superposition [Toy Models of Superposition 2022](https://transformer-circuits.pub/2022/toy_model/index.html) and/or localized and distributed in various ways. They are differently fuzzy and differently stable to random perturbations. They compose into various meta-circuits like Indirect Object Identification. [Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small](https://arxiv.org/abs/2211.00593)

Initial layers of the model encode more low-level feature detectors, and later layers form more composed, complex concept detectors. For example, edge detectors, color detectors, curve detectors, etc. compose into snout detectors and fur detectors, and eventually into dog detectors. [Zoom In: An Introduction to Circuits 2020](https://distill.pub/2020/circuits/zoom-in/), [Curve Detectors 2020](https://distill.pub/2020/circuits/curve-detectors/), [Visualizing Weights 2021](https://distill.pub/2020/circuits/visualizing-weights/)

On top of these layers you can do disentangling and decomposition of features and circuits using sparse autoencoders and other methods, which can be more fine-grained or more coarse-grained. This is done in mechanistic interpretability, a field that reverse engineers AI systems.

And I see LLMs as semantic vector search engines with weak generalization capabilities. They have an internal ecosystem of vector representations of features and heuristics that you can retrieve with prompt queries. ([Francois Chollet's description](https://x.com/fchollet/status/1709242747293511939)) They are retrieving compressed knowledge and (sometimes less, sometimes more fuzzy) vector programs that are more concrete or abstract, with weak generalization capabilities and (sometimes better, sometimes worse) composition.
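A sparse autoencoder of the kind mentioned above can be sketched minimally as an overcomplete dictionary trained to reconstruct activations under an L1 sparsity penalty. This toy version only defines the forward pass and loss; the sizes, random data, and names are illustrative, not any lab's actual setup:

```python
import numpy as np

# Minimal sparse autoencoder (SAE) sketch: an overcomplete feature
# dictionary (d_dict > d_model) reconstructs model activations while an
# L1 penalty pushes most feature activations to zero, pulling features
# out of superposition.
rng = np.random.default_rng(0)
d_model, d_dict, l1 = 8, 32, 1e-3
W_enc = rng.normal(size=(d_dict, d_model)) * 0.1  # encoder weights
W_dec = W_enc.T.copy()                            # decoder (tied init)
b_enc = np.zeros(d_dict)

def sae(a):
    f = np.maximum(0.0, W_enc @ a + b_enc)  # sparse feature activations
    return W_dec @ f, f                     # reconstruction, features

def loss(a):
    a_hat, f = sae(a)
    return np.sum((a - a_hat) ** 2) + l1 * np.sum(np.abs(f))  # recon + sparsity

acts = rng.normal(size=(100, d_model))      # stand-in for model activations
losses = [loss(a) for a in acts]
```

A real SAE would minimize this loss with gradient descent over activations harvested from a trained model; the point here is just the shape of the objective: reconstruction plus sparsity over an overcomplete basis.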
They can technically memorize compressed vector representations of various concrete and abstract programs (heuristics) and knowledge to some level of granularity, with weak generalization. But they can also encode almost arbitrary generalizing circuits as we improve our reverse-engineering knowledge and our techniques for steering the training and inference process. The new reinforcement learning chain-of-thought paradigm in OpenAI's o1 [Learning to Reason with LLMs](https://openai.com/index/learning-to-reason-with-llms/) goes more towards retrieving reasoning heuristics and composing them. [Is o1-preview reasoning?](https://www.youtube.com/watch?v=nO6sDk6vO0g) It's paradoxical how they can compose some features but fail utterly on others lol. They're specialized intelligences in different ways than humans are specialized intelligences. [General Intelligence: Define it, measure it, build it](https://www.youtube.com/watch?v=nL9jEy99Nh0) , [[o1]] , [Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models](https://x.com/JJitsev/status/1842727628463128968) You want a perfect sweet spot between memorization and generalization for optimal intelligence. This paper is also great: "We observe empirically the presence of four learning phases: comprehension, grokking, memorization, and confusion. We find representation learning to occur only in a "Goldilocks zone" (including comprehension and grokking) between memorization and confusion. We find on transformers the grokking phase stays closer to the memorization phase (compared to the comprehension phase), leading to delayed generalization. The Goldilocks phase is reminiscent of "intelligence from starvation" in Darwinian evolution, where resource limitations drive discovery of more efficient solutions."
[Towards Understanding Grokking: An Effective Theory of Representation Learning](https://arxiv.org/abs/2205.10343) , [Explaining grokking through circuit efficiency](https://arxiv.org/abs/2309.02390) Also transformers, now one of the most popular neural network architectures, are technically Turing complete (only infinite memory is missing, which is what neural Turing machines try to solve), so you can simulate any program you want [Attention is Turing Complete](https://www.jmlr.org/papers/volume22/20-302/20-302.pdf) and [Memory Augmented Large Language Models are Computationally Universal](https://arxiv.org/abs/2301.04589); lately, chain of thought with LLMs is also more universal [Chain of Thought Empowers Transformers to Solve Inherently Serial Problems](https://twitter.com/denny_zhou/status/1835761801453306089). Here they play with gates like XOR at scale [Toward A Mathematical Framework for Computation in Superposition](https://www.lesswrong.com/posts/2roZtSr5TGmLjXMnT/toward-a-mathematical-framework-for-computation-in). Here they found emergent finite automata of HTML in the weights [Towards Monosemanticity: Decomposing Language Models With Dictionary Learning](https://transformer-circuits.pub/2023/monosemantic-features), which they then extended [Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet](https://transformer-circuits.pub/2024/scaling-monosemanticity/). Here they found a specialized general trigonometric algorithm for a specialized task in the weights [Progress measures for grokking via mechanistic interpretability, reverse engineering modular addition](https://arxiv.org/abs/2301.05217).
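The grokked trigonometric algorithm for modular addition can be verified directly (a sketch of the algorithm the paper reverse engineered, with an assumed small modulus; the real network does this with learned Fourier embeddings across several frequencies):

```python
import numpy as np

# Sketch of the trigonometric modular-addition algorithm found in the grokked
# network: embed a residue as the angle 2*pi*a/p, so adding angles is addition
# mod p, and read out the residue c whose angle best matches cos similarity.
p = 113  # assumed modulus (the paper uses p = 113)

def mod_add_trig(a, b):
    theta = 2 * np.pi * (a + b) / p                     # angle addition
    logits = np.cos(theta - 2 * np.pi * np.arange(p) / p)
    return int(np.argmax(logits))                       # peaks at c = (a+b) mod p

print(mod_add_trig(40, 100))  # -> 27, i.e. (40 + 100) % 113
```

The logit is maximized exactly when a + b - c is a multiple of p, which is why the network can generalize perfectly once this circuit forms.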
Here they found a causal chess board state in the weights that can be manipulated [Chess-GPT's Internal World Model](https://adamkarvonen.github.io/machine_learning/2024/01/03/chess-world-models.html), here an Othello board state [Actually, Othello-GPT Has A Linear Emergent World Representation](https://www.neelnanda.io/mechanistic-interpretability/othello). Here they play with causal graphs in the weights [Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models](https://arxiv.org/abs/2403.19647v1). The Hydra effect shows how removing part of a neural network makes other parts of the network "adapt": later components shift their behaviour to compensate for the loss. [The Hydra Effect: Emergent Self-repair in Language Model Computations](https://arxiv.org/abs/2307.15771) Here they use the symbolic RASP programming language to understand what the weights do and to implement algorithms [Thinking Like Transformers](https://arxiv.org/abs/2106.06981) and [What Algorithms can Transformers Learn? A Study in Length Generalization](https://arxiv.org/abs/2310.16028). Here they analyze learned general symmetries. [A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations 2023](https://arxiv.org/abs/2302.03025) Here they talk about reverse engineering OpenFold, an open-source version of the AlphaFold protein folding AI system! [Mechanistic Interpretability - Stella Biderman | Stanford MLSys #70](https://www.youtube.com/watch?v=P7sjVMtb5Sg) , [Chemistry Nobel goes to developers of AlphaFold AI that predicts protein structures](https://www.nature.com/articles/d41586-024-03214-7) The flexibility of deep learning is magical and absolutely necessary and useful for a lot of tasks, but in other tasks it can be tragic if we don't reverse engineer it properly, leaving it less reliable, resilient, stable, steerable, etc. than we need; that can be improved by reverse engineering and thus steering.
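The basic causal intervention behind ablation studies like the Hydra effect paper can be sketched on a toy network (a fixed toy can only show the probing mechanics, not the learned self-repair):

```python
import numpy as np

# Zero-ablation sketch: knock out one hidden unit and measure how the output
# changes. This is the basic causal probe; the Hydra effect is the finding
# that in trained LLMs, later components compensate for such ablations.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))  # toy first-layer weights (assumed random)
W2 = rng.normal(size=(8, 2))  # toy second-layer weights

def forward(x, ablate_unit=None):
    h = np.maximum(0.0, x @ W1)   # hidden activations (ReLU)
    if ablate_unit is not None:
        h = h.copy()
        h[ablate_unit] = 0.0      # zero-ablate one hidden unit
    return h @ W2

x = rng.normal(size=4)
clean = forward(x)
ablated = forward(x, ablate_unit=3)
effect = np.abs(clean - ablated).sum()  # causal effect of unit 3 on the output
```

Activation patching replaces the ablated value with the activation from a different input instead of zero, which gives a cleaner counterfactual.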
There is less of this flexibility in symbolic AI and neurosymbolic AI, but those can be more efficient. And current mainstream AI systems are slowly morphing into neurosymbolic AI. Various math AIs like AlphaGeometry and AlphaProof use an LLM together with symbolic engines such as Lean [AI achieves silver-medal standard solving International Mathematical Olympiad problems](https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/) o1 for reasoning uses chain-of-thought RL with a reward model, not just pure deep learning [Introducing OpenAI o1-preview](https://openai.com/index/introducing-openai-o1-preview/) AlphaCode uses MCTS and sampling [Competitive programming with AlphaCode](https://deepmind.google/discover/blog/competitive-programming-with-alphacode/) AlphaFold for protein folding used a graph network (with attention), which is one type of inductive bias and can technically be seen as neurosymbolic. [AlphaFold 3 predicts the structure and interactions of all of life’s molecules](https://blog.google/technology/ai/google-deepmind-isomorphic-alphafold-3-ai-model/) etc. The field of trying to understand the mind of AI is exploding! If the scaling hypothesis believers are right, as they have been to a certain degree so far, then [[superintelligence]] is coming soon. However, if they're wrong, all the hundreds of billions and potentially trillions of dollars invested could be viewed as one of the biggest bets that became one of the biggest wastes of resources in human history. [Can AI Scaling Continue Through 2030?](https://epochai.org/blog/can-ai-scaling-continue-through-2030) [X](https://x.com/EpochAIResearch/status/1826038729263219193), [$125B for Superintelligence? 3 Models Coming, Sutskever's Secret SSI, & Data Centers (in space)... - YouTube](https://youtu.be/QCcJtTBvSKk) Microsoft etc. want to build a $100 billion supercomputer, for example. OpenAI [[o1]] showed new inference-time scaling laws, so we will see how far this goes.
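The sampling side of AlphaCode-style systems can be sketched as generate-and-filter (the candidate pool here is a hypothetical stand-in for LLM samples; the real system also clusters and ranks candidates):

```python
# Generate-and-filter sketch: sample many candidate programs and keep only
# those consistent with the example input/output pairs.
candidates = [  # hypothetical stand-ins for sampled LLM completions
    "def f(x): return x + 1",
    "def f(x): return x * 2",
    "def f(x): return x ** 2",
]
examples = [(2, 4), (3, 6)]  # (input, expected output) pairs

def passes(src):
    ns = {}
    exec(src, ns)  # define f from the candidate source
    return all(ns["f"](i) == o for i, o in examples)

survivors = [c for c in candidates if passes(c)]
print(survivors)  # -> ['def f(x): return x * 2']
```

Filtering by tests turns a weak sampler into a much stronger solver, which is the same basic leverage behind best-of-n sampling and verifier-guided search.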
In another 6 months we will possibly have o1 (full), Orion/GPT-5, Claude 3.5 Opus, Gemini 2 (maybe with AlphaProof and AlphaCode integrated), Grok 3, and possibly Llama 4. The capability of AI systems I'm most interested in is whether, if you gave the system all of classical mechanics, it could derive general relativity and quantum mechanics from it, which seems to require stronger out-of-distribution generalization than current types of systems can do, but I'm open to being mistaken. Or give it all (or most) of the empirical data from experiments known before those paradigm shifts and have it derive them from that too. LLMs are such extremely fascinating systems relative to all the things they are capable of doing when they approximate the training data manifold by curve fitting with attention and interpolate on top of it with all sorts of vector program combinations. And it still boggles my mind how the models can sometimes generalize out of distribution a lot with just curve fitting, by getting into a generalizing short-program circuit that often lies in a flat local minimum, when they grok! Models are the data. Memorization is the first step towards generalization. Weight decay in deep learning incentivizes sparse generalizing circuits instead of inefficient distributed lookup-table memorizing circuits. Can all the missing capabilities and steering of AI systems be achieved in deep learning by incentivizing their emergent growth as grokked robust symbolic generalizing circuits encoded in matrix multiplications with nonlinearities?
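The weight-decay point above can be made concrete: decoupled weight decay (as in AdamW) shrinks every weight each step independently of the gradient, pricing every nonzero parameter and thereby favoring sparse circuits over distributed lookup tables:

```python
import numpy as np

# Decoupled weight decay sketch: each step applies w <- w - lr*wd*w on top of
# the gradient step, so with zero gradient the weights decay geometrically.
lr, wd = 0.1, 0.1
w = np.array([5.0, -3.0, 0.5])

for _ in range(100):
    grad = np.zeros_like(w)  # pretend the loss gradient is zero here
    w -= lr * grad           # usual gradient step (no-op in this sketch)
    w -= lr * wd * w         # decay step, decoupled from the gradient

# after 100 steps each weight has shrunk by the factor (1 - lr*wd)**100
```

A weight only survives this constant pressure if the gradient keeps pushing it away from zero, i.e. if it is actually doing useful work.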
It would be great to have a mathematical steering model that makes AI models trained on any arbitrary structured (mathematical) data grok that mathematical structure as a generalizing circuit. Grokking in mechanistic interpretability of neural networks shows how learning symbolic algorithms on a flexible nonsymbolic substrate comes as a sudden metastable phase shift into a configuration of the nonsymbolic substrate's parts that mathematically corresponds to computing the symbolic algorithms. The implementation details of all sorts of matrix operations black magic in the code of low-level deep learning engineering are just such fascinating wizardry. It's still weird that multiplying and adding numbers together can compress information and generalize so well in deep learning. How to formally define deception/lying to localize it in AI systems using more formal mathematical analytical methods instead of statistical vibes? I tend to forget that so many tricks we use in deep learning, in for example transformers, are less than 10 years old, wtf. Even though LLMs can (for most tasks they're trained on) do just weak generalization by interpolation on the training data manifold, it's still so extremely useful in so many ways, like for math and coding, reformulating things, reexplaining things (for example using examples), knowledge retrieval, synthesizing knowledge, structuring knowledge, synthesizing stories, combining concepts etc.! It's unbelievable how relatively good and useful in practice they are at so many of these tasks! Mechanistic interpretability is function deapproximation. Here are additional extracted thoughts about AI mathematics, theory and engineering, continuing from most all-encompassing to most concrete: Bitter lesson: Is all we need hidden in the trainable structure of training data?
The model is the data, and if we feed it a ton of data from tons of modalities (not just human text, but also, for example, all sorts of synthetic data from physics simulations, etc.), it might be possible to design data such that we get a lot of emergent generalizing, technically superintelligent circuits. If you overfit on the entire world, you are basically done. Machines are superhuman at manipulating and visualizing very high-dimensional spaces. We will create more and more predictive models of how deep learning works. Black box AI models will be reverse engineered. Reverse engineering AI systems is the most interesting and the most important thing. Technical AI redteaming is machine learning whitehacking. For some tasks we will need unconstrained, creative, open-ended alien intelligence to solve them, so we cannot fully steer all AI systems. Complete reverse engineering and formal verification might not even be possible, because the systems are evolutionary chaotic fuzzy statistical madness, like organisms are to some extent, and will most likely never be fully interpretable and controllable, only approximately. That is still useful, but only sometimes; where we need it, we should have it. Would mechanistic interpretability find out that Sora approximates wonky Navier-Stokes equations for fluid dynamics? Would mechanistic interpretability find out that AlphaFold approximates current or better symbolic equations for protein folding?
Hallucinations in LLMs are lowering with a lot of new research and engineering techniques, but it will probably always be effective to ground them externally in realtime, unless the weights are somehow constantly updated, and unless we reverse engineer the models in mechanistic interpretability, figure out to a good enough approximation how exactly everything is stored and encoded in the weights, and manipulate the internals for accurate representations of facts and programs and do effective, less faulty reasoning over them that minimizes hallucinations to a good enough level. You can tell when deep learning code was written by a metamathemagician or by an empirical alchemist engineer. Do you do frequent normalizations in your mental frameworks or do your gradients love to explode at slight perturbations? The GELU activation function adds in some gel to prevent the dead neurons that ReLU suffers from. AI: Wins silver medal in the international math olympiad, something that has been considered an absolute AI win for a long time. People, desensitized from the recent AI hype: "Nothing ever happens. Yawn." AI gives unconstrained creativity. "AI is just a fad," he says, while using tools that use machine learning algorithms everywhere he steps without even realizing it. Inside of you are a million dynamically on-the-fly constructed experts forming higher-order experts. AI for the benefit of all sentient beings. Growing robust neural circuits in my garden. Stochastic parrots can fly so high. We will steer superintelligence. Fullbody strength training on caffeine, creatine, protein, with Leopold's situational awareness of imminent superintelligence in first ear, Karpathy's GPT-2 from scratch in second ear, Stanford lectures on machine learning and transformers in third ear, Jeremy Howard's fastai practical deep learning for coders in fourth ear, Francois Chollet's algorithmic information theoretic model of general intelligence in fifth ear, Dive into Deep Learning in sixth ear, machine learning with PyTorch and Scikit-learn in seventh ear, Deeplearning.AI's agentic LLM workflows in eighth ear, The AI Timeline, Latest AI Research Explained Simply in ninth ear, button down AI news in tenth ear, AI explained youtube channel in eleventh ear, bycloud AI news in twelfth ear, Wes Roth AI news in thirteenth ear, David Shapiro AI future in fourteenth ear, /r/singularity in fifteenth ear, /r/MachineLearning in sixteenth ear, /r/LocalLLaMA in seventeenth ear, Neel Nanda's reverse engineering of transformers in eighteenth ear, Arena mechanistic interpretability in nineteenth ear. Grokking in reverse engineering of AI systems is the ultimate nerdsnipe. Mixture of agents made of mixture of agents made of mixture of agents made of mixture of agents made of mixture of agents made of mixture of agents made of mixture of agents made of mixture of agents made of... Approximating a differentiable curve-fitted solution approximating all functions using a grokked Fourier series algorithm? Fourier series approximating any differentiable curve-fitted solution? Duality? Taylor series approximations? Spline interpolation? Gaussian mixture models? Support vector machines? Decision trees? Random forests? Wavelets? General universal approximators of arbitrary functions? Generalized approximation theorem? Space of all possible general universal approximators?
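One of the universal approximators listed above can be demonstrated directly: a truncated Fourier series fitting a square wave (a standard textbook construction, not from the source):

```python
import numpy as np

# Fourier-series approximation sketch: fit a square wave with its first K odd
# harmonics. The series is (4/pi) * sum_k sin((2k+1)x)/(2k+1).
K = 25
x = np.linspace(0, 2 * np.pi, 1000, endpoint=False)
target = np.sign(np.sin(x))  # square wave

approx = sum(4 / np.pi * np.sin((2 * k + 1) * x) / (2 * k + 1) for k in range(K))

mse = np.mean((target - approx) ** 2)  # shrinks as K grows
```

Adding more harmonics drives the mean squared error toward zero, though the pointwise overshoot near the discontinuities (the Gibbs phenomenon) never disappears.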
GraphRAG sounds promising, I just tested it for the first time. Can't wait for other neurosymbolic approaches fundamentally embedded into the architecture or using LLMs in a composite system! Better interpretability of neurosymbolics will also give better steerability and generalization and therefore more novel thoughts! Mainstream LLM benchmarks suck and are full of contamination. AI explained has a private noncontaminated reasoning benchmark. You can see how the models are actually getting better, and that we're not really "stuck at GPT-4 level intelligence for over a year now". One of my favorite ways of learning math with language models is prompting them to go step by step, using examples, through the various mathematical equations transforming data. Memorizing the benchmarks is all you need. AI systems need more centers from the brain implemented than just the language and visual centers. Soon we'll be duplicating and merging layers in biological systems too, and duplicating and merging biological and nonbiological systems together. Autists (depth-first search). Schizos (breadth-first search). Autismophrenia, depth search of the breadth of all possible topics in parallel. Technically you can make LLMs learn new things by putting what you said into short-term memory (the context window, which disappears with a new chat when you use some wrapper over the raw model) or long-term memory (into a (vector) database, or "into the neurons" by training, but that's not really done in practice yet). The math of neural networks is 10000000 simplifications a minute. I find it cool that the form of Xavier initialization is not empirical guessing; there is actually a mathematical derivation behind it. For deep learning systems mechanistic interpretability is a good approach in my opinion, because when we find features and circuits, we are able to do causal interventions, and thus steer the model. The typology of features and circuits has been explored a lot in CNNs (1) and is now starting to be explored in transformers on language (2). We have only recently been able to decipher superposition more (3). 1: [Chris Olah - Looking Inside Neural Networks with Mechanistic Interpretability Chris Olah 2023](https://www.youtube.com/watch?v=2Rdp9GvcYOE), [Zoom In: An Introduction to Circuits 2020](https://distill.pub/2020/circuits/zoom-in/), [Curve Detectors 2020](https://distill.pub/2020/circuits/curve-detectors/), [Visualizing Weights 2021](https://distill.pub/2020/circuits/visualizing-weights/) 2: [Open Problems in Mechanistic Interpretability: A Whirlwind Tour | Neel Nanda 2023](https://www.youtube.com/watch?v=EuQjiNrK77M), [An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2 2024](https://www.alignmentforum.org/posts/NfFST5Mio7BCAQHPA/an-extremely-opinionated-annotated-list-of-my-favourite-1) 3: [Toy Models of Superposition 2022](https://transformer-circuits.pub/2022/toy_model/index.html), [Towards Monosemanticity: Decomposing Language Models With Dictionary Learning 2023](https://transformer-circuits.pub/2023/monosemantic-features/index.html), [Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet 2024](https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html) I think you'll find what you're looking for with whichever tools you're currently using. There are more specific and more general, simpler and more complex, etc., features and circuits depending on what kind of architecture and training data you have. You can find fur detectors in image models trained on animals. Finite-state automata of HTML are found in models trained on code.
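The Xavier initialization derivation mentioned earlier can be sketched: for y = x @ W with unit-variance inputs, Var(y_j) = fan_in * Var(W), so choosing Var(W) = 2/(fan_in + fan_out) (balancing the forward and backward passes) keeps activation scale roughly constant through a stack of layers:

```python
import numpy as np

# Xavier/Glorot initialization sketch: with fan_in = fan_out, the per-layer
# variance multiplier fan_in * Var(W) = fan_in * 2/(fan_in + fan_out) = 1,
# so activations neither explode nor vanish through the stack.
rng = np.random.default_rng(0)
fan_in, fan_out = 512, 512  # assumed toy layer sizes

def xavier(fan_in, fan_out):
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

x = rng.normal(size=(1000, fan_in))  # unit-variance inputs
for _ in range(5):                   # five stacked linear layers
    x = x @ xavier(fan_in, fan_out)

print(float(x.var()))  # stays near 1 instead of exploding or vanishing
```

With a naive std of 1, the same stack would multiply the variance by fan_in at every layer and overflow within a few layers.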
Induction heads are a more common and simpler circuit in the attention blocks of transformers across different training data. Indirect object identification is a more complex circuit. E.g. [An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2 2024](https://www.alignmentforum.org/posts/NfFST5Mio7BCAQHPA/an-extremely-opinionated-annotated-list-of-my-favourite-1) One of the more universal attempts is: [A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations 2023](https://arxiv.org/abs/2302.03025) For deep learning systems mechanistic interpretability is a good approach in my opinion, because when we find features and circuits we are able to do causal interventions, and so steer the model (the golden gate bridge Claude meme came about when a Claude 3 Sonnet variant was steered with sparse autoencoders and was absolutely obsessed with the Golden Gate Bridge and didn't talk about anything else for any question :D or you can turn up maximum happiness, hatred, love, different values, better code etc. [Mapping the Mind of a Large Language Model 2024](https://www.anthropic.com/news/mapping-mind-language-model), [I Am The Golden Gate Bridge & Why That's Important.](https://www.youtube.com/watch?v=QqrGt5GrGfw)). Similarly, I steered an LLM through a sparse autoencoder in Neel Nanda's workshop. :D But the existing methods are still not sufficient: they are not 100% efficient and don't interpret everything. Architectures that change through development and learning will change some features and circuits and not others, depending on how generic they are and what stage of training you are in.
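The feature steering behind the Golden Gate Claude meme can be sketched with toy vectors (hypothetical random directions, not real model internals): add a scaled dictionary direction from a sparse autoencoder to the residual-stream activation.

```python
import numpy as np

# Feature steering sketch: clamp a feature "on" by adding its unit-norm
# dictionary direction, scaled by alpha, to a residual-stream activation.
rng = np.random.default_rng(0)
d = 64
h = rng.normal(size=d)                       # toy residual-stream activation
feature_dir = rng.normal(size=d)
feature_dir /= np.linalg.norm(feature_dir)   # unit-norm feature direction

def steer(h, direction, alpha):
    return h + alpha * direction             # shift activation along the feature

h_steered = steer(h, feature_dir, alpha=10.0)
# the steered activation now projects exactly alpha higher onto the feature
print(float(h_steered @ feature_dir) > float(h @ feature_dir))  # -> True
```

In practice the direction comes from a trained SAE decoder and the intervention is applied at every token position of a chosen layer during generation.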
They can reverse engineer in realtime while training, so they can, for example, explore circuit formation phases and see different phase shifts, which is mega cool, for example with this paper I tried: [Progress measures for grokking via mechanistic interpretability, reverse-engineering transformers trained on modular addition with a learned emergent generalizing trigonometric functions circuit 2023](https://arxiv.org/abs/2301.05217) I'm all for trying to hardcode inductive biases (circuits) in AI systems, but it's also interesting to reverse engineer what features and circuits are emergently learned by deep learning, which can be many times more efficient, or impossible, to hardcode by humans. Insights from reverse engineering deep learning systems can potentially be used to design new, more interpretable and steerable architectures from scratch. Symbolic and neurosymbolic systems wouldn't need this reverse engineering so much because they would be more interpretable right out of the gate, but no one has successfully scaled them yet, so there is definitely some reason why black box (more white box over time as we reverse engineer it) deep learning is state of the art in so many tasks. We get to sample the AI capabilities exponential just once every couple of years because it takes a while to build the supercomputers and train models on top of them. Is AI overhyped in the short term and underestimated in the long term?
I think the current AI boom might crash because of way too early, too big, overly inflated expectations, but then AI will quickly boom again in a few years when new systems get released that are orders of magnitude scaled up, or algorithmically improved, or with smarter data engineering, or all of the above, or something else. A lot of the current inflated expectations will turn out to be true in a few years anyway, but so many of them are just too early. And some exponentials are sampled too discretely. I think this will happen again and again. Booms and crashes will be closer and closer to each other: faster and faster, more compressed Gartner hype cycles over time. A global exponential made of closer and closer local sigmoids. This is how I see the current technological singularity. Are we getting to the point where AI is too intelligent (under certain definitions of intelligence) for the regular folk, so AI companies have to nerf it to increase its usage lol. LLMs are just the beginning of AI. Will AGI be bayesian? "Learn to use AI" is the new "Learn to code"
## Deep dives
Now the biggest limitations of current AI systems are probably: creating more complex systematic coherent reasoning, planning, generalization, search, agency (autonomy), memory, factual groundedness, online/continuous learning, software and hardware energetic and algorithmic efficiency, human-like ethical reasoning, and controllability, which they have relatively weakly for more complex tasks. But we are making progress on this, either through composing LLMs into multiagent systems, scaling, higher quality data and training, poking around how they work inside and thus controlling them, through better mathematical models of how learning works and using these insights, or through modified or overhauled architectures, etc.... or embodied robotics, which is also getting attention recently...
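The factual-groundedness direction above usually means retrieval. The core retrieval step can be sketched with a hypothetical bag-of-words embedding standing in for a learned one:

```python
import numpy as np

# Minimal retrieval sketch behind RAG-style grounding: embed documents and a
# query, return the document with the highest cosine similarity. The
# bag-of-words embedding here is a toy stand-in for a learned embedding model.
docs = {  # hypothetical toy corpus
    "paris": "paris is the capital of france",
    "tokyo": "tokyo is the capital of japan",
}

vocab = sorted({w for text in docs.values() for w in text.split()})

def embed(text):
    words = text.split()
    v = np.array([words.count(w) for w in vocab], dtype=float)
    return v / (np.linalg.norm(v) or 1.0)  # unit-normalize for cosine similarity

def retrieve(query):
    q = embed(query)
    return max(docs, key=lambda k: float(embed(docs[k]) @ q))

print(retrieve("what is the capital of japan"))  # -> tokyo
```

A production setup precomputes document embeddings into a vector database and feeds the retrieved text into the LLM prompt, so answers are grounded in sources rather than in the weights alone.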
and all top AGI labs are working/investing in these things to varying degrees. Here are some works: Survey of LLMs: [[2312.03863] Efficient Large Language Models: A Survey](<https://arxiv.org/abs/2312.03863>), [[2311.10215] Predictive Minds: LLMs As Atypical Active Inference Agents](<https://arxiv.org/abs/2311.10215>), [A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications](<https://arxiv.org/abs/2402.07927>) Reasoning: [Human-like systematic generalization through a meta-learning neural network | Nature](<https://www.nature.com/articles/s41586-023-06668-3>), [[2305.20050] Let's Verify Step by Step](<https://arxiv.org/abs/2305.20050>), [[2302.00923] Multimodal Chain-of-Thought Reasoning in Language Models](<https://arxiv.org/abs/2302.00923>), [[2310.09158] Learning To Teach Large Language Models Logical Reasoning](<https://arxiv.org/abs/2310.09158>), [[2303.09014] ART: Automatic multi-step reasoning and tool-use for large language models](<https://arxiv.org/abs/2303.09014>), [AlphaGeometry: An Olympiad-level AI system for geometry - Google DeepMind](<https://deepmind.google/discover/blog/alphageometry-an-olympiad-level-ai-system-for-geometry/>) (Devin AI programmer [Cognition | Introducing Devin, the first AI software engineer](https://www.cognition-labs.com/introducing-devin) ) (Automated Unit Test Improvement using Large Language Models at Meta [[2402.09171] Automated Unit Test Improvement using Large Language Models at Meta](https://arxiv.org/abs/2402.09171) ) (GPT-5: Everything You Need to Know So Far [GPT-5: Everything You Need to Know So Far - YouTube](https://www.youtube.com/watch?v=Zc03IYnnuIA) ), (Self-Discover: Large Language Models Self-Compose Reasoning Structures [[2402.03620] Self-Discover: Large Language Models Self-Compose Reasoning Structures](https://arxiv.org/abs/2402.03620) [x.com](https://twitter.com/ecardenas300/status/1769396057002082410) ) , (How to think step-by-step: A mechanistic understanding of 
chain-of-thought reasoning [x.com](https://twitter.com/fly51fly/status/1764279536794169768?t=up6d06PPGeCE5fvIlE418Q&s=19) [[2402.18312] How to think step-by-step: A mechanistic understanding of chain-of-thought reasoning](https://arxiv.org/abs/2402.18312) ), [Magic](http://magic.dev) , (The power of prompting [The power of prompting - Microsoft Research](https://www.microsoft.com/en-us/research/blog/the-power-of-prompting/) ), Flow engineering ( https://www.codium.ai/blog/alphacodium-state-of-the-art-code-generation-for-code-contests/ ), Stable Cascade ( [Introducing Stable Cascade — Stability AI](https://stability.ai/news/introducing-stable-cascade) ), ( RankPrompt: Step-by-Step Comparisons Make Language Models Better Reasoners [[2403.12373] RankPrompt: Step-by-Step Comparisons Make Language Models Better Reasoners](https://arxiv.org/abs/2403.12373) ) Robotics: [Mobile ALOHA - A Smart Home Robot - Compilation of Autonomous Skills - YouTube](https://www.youtube.com/watch?v=zMNumQ45pJ8), [Eureka! Extreme Robot Dexterity with LLMs | NVIDIA Research Paper - YouTube](https://youtu.be/sDFAWnrCqKc?si=LEhIqEIeHCuQ0W2p), [Shaping the future of advanced robotics - Google DeepMind](<https://deepmind.google/discover/blog/shaping-the-future-of-advanced-robotics/>), [Optimus - Gen 2 | Tesla - YouTube](https://www.youtube.com/watch?v=cpraXaw7dyc), [Atlas Struts - YouTube](<https://www.youtube.com/shorts/SFKM-Rxiqzg>), [Figure Status Update - AI Trained Coffee Demo - YouTube](https://www.youtube.com/watch?v=Q5MKo7Idsok), [Curiosity-Driven Learning of Joint Locomotion and Manipulation Tasks - YouTube](https://www.youtube.com/watch?v=Qob2k_ldLuw) Multiagent systems: [[2402.01680] Large Language Model based Multi-Agents: A Survey of Progress and Challenges](<https://arxiv.org/abs/2402.01680>) (AutoDev: Automated AI-Driven Development [[2403.08299] AutoDev: Automated AI-Driven Development](https://arxiv.org/abs/2403.08299) ) Modified/alternative architectures: [Mamba (deep learning architecture) - Wikipedia](<https://en.wikipedia.org/wiki/Mamba_(deep_learning_architecture)>), [[2305.13048] RWKV: Reinventing RNNs for the Transformer Era](<https://arxiv.org/abs/2305.13048>), [V-JEPA: The next step toward advanced machine intelligence](<https://ai.meta.com/blog/v-jepa-yann-lecun-ai-model-video-joint-embedding-predictive-architecture/>), [Active Inference](<https://mitpress.mit.edu/9780262045353/active-inference/>) Agency: [[2305.16291] Voyager: An Open-Ended Embodied Agent with Large Language Models](<https://arxiv.org/abs/2305.16291>), [[2309.07864] The Rise and Potential of Large Language Model Based Agents: A Survey](<https://arxiv.org/abs/2309.07864>), [Agents | Langchain](<https://python.langchain.com/docs/modules/agents/>), [GitHub - THUDM/AgentBench: A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)](<https://github.com/THUDM/AgentBench>), [[2401.12917] Active Inference as a Model of Agency](<https://arxiv.org/abs/2401.12917>), [The Free Energy Principle approach to Agency - YouTube](https://www.youtube.com/watch?v=zMDSMqtjays), [Artificial Curiosity Since 1990](<https://people.idsia.ch/~juergen/artificial-curiosity-since-1990.html>) Factual groundedness: [[2312.10997] Retrieval-Augmented Generation for Large Language Models: A Survey](<https://arxiv.org/abs/2312.10997>), [Perplexity](<https://www.perplexity.ai/>), [ChatGPT - Consensus](<https://chat.openai.com/g/g-bo0FiWLY7-consensus>) Memory: larger context window [Gemini 10 million token context window](https://twitter.com/mattshumer_/status/1759804492919275555), or [vector databases](<https://en.wikipedia.org/wiki/Vector_database>) (Larimar: Large Language Models with Episodic Memory Control [[2403.11901] Larimar: Large Language Models with Episodic Memory Control](https://arxiv.org/abs/2403.11901) ) Hardware efficiency: extropic [Ushering in the Thermodynamic Future - Litepaper](https://www.extropic.ai/future) , tinygrad, groq [x.com](https://twitter.com/__tinygrad__/status/1769388346948853839) , ['A single chip to outperform a small GPU data center': Yet another AI chip firm wants to challenge Nvidia's GPU-centric world — Taalas wants to have super specialized AI chips | TechRadar](https://www.techradar.com/pro/a-single-chip-to-outperform-a-small-gpu-data-center-yet-another-ai-chip-firm-wants-to-challenge-nvidias-gpu-centric-world-taalas-wants-to-have-super-specialized-ai-chips) , new Nvidia GPUs [NVIDIA Just Started A New Era of Supercomputing... 
GTC2024 Highlight - YouTube](https://www.youtube.com/watch?v=GkBX9bTlNQA), etched [Etched | The World's First Transformer ASIC](https://www.etched.com/), https://techxplore.com/news/2023-12-ultra-high-processor-advance-ai-driverless.html, [[2302.06584] Thermodynamic AI and the fluctuation frontier](https://arxiv.org/abs/2302.06584), analog computing [x.com](https://twitter.com/dmvaldman/status/1767745899407753718?t=Xe5sDPbrBVayUaAGX4ikmw&s=19), neuromorphics [Neuromorphic engineering - Wikipedia](https://en.wikipedia.org/wiki/Neuromorphic_engineering), [Homepage | Cerebras](https://www.cerebras.net/)
Online/continuous learning: [Online machine learning - Wikipedia](https://en.wikipedia.org/wiki/Online_machine_learning) ([[2302.00487] A Comprehensive Survey of Continual Learning: Theory, Method and Application](https://arxiv.org/abs/2302.00487))
Meta learning: [Meta-learning (computer science) - Wikipedia](https://en.wikipedia.org/wiki/Meta-learning_(computer_science)) ([Paired open-ended trailblazer (POET) - Alper Ahmetoglu](https://alpera.xyz/blog/1/))
Planning: [[2402.01817] LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks](<https://arxiv.org/abs/2402.01817>), [[2401.11708v1] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs](<https://arxiv.org/abs/2401.11708v1>), [[2305.16151] Understanding the Capabilities of Large Language Models for Automated Planning](<https://arxiv.org/abs/2305.16151>)
Generalizing: [[2402.10891] Instruction Diversity Drives Generalization To Unseen Tasks](<https://arxiv.org/abs/2402.10891>), [Automated discovery of algorithms from data | Nature Computational Science](<https://www.nature.com/articles/s43588-024-00593-9>), [[2402.09371] Transformers Can Achieve Length Generalization But Not
Robustly](<https://arxiv.org/abs/2402.09371>), [[2310.16028] What Algorithms can Transformers Learn? A Study in Length Generalization](<https://arxiv.org/abs/2310.16028>), [[2307.04721] Large Language Models as General Pattern Machines](<https://arxiv.org/abs/2307.04721>), [A Tutorial on Domain Generalization | Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining](<https://dl.acm.org/doi/10.1145/3539597.3572722>), [[2311.06545] Understanding Generalization via Set Theory](<https://arxiv.org/abs/2311.06545>), [[2310.08661] Counting and Algorithmic Generalization with Transformers](<https://arxiv.org/abs/2310.08661>), [Neural Networks on the Brink of Universal Prediction with DeepMind's Cutting-Edge Approach | Synced](<https://syncedreview.com/2024/01/31/neural-networks-on-the-brink-of-universal-prediction-with-deepminds-cutting-edge-approach/>), [[2401.14953] Learning Universal Predictors](<https://arxiv.org/abs/2401.14953>), [Spectral bias and task-model alignment explain generalization in kernel regression and infinitely wide neural networks | Nature Communications](<https://www.nature.com/articles/s41467-021-23103-1>), [Natural language instructions induce compositional generalization in networks of neurons | Nature Neuroscience](https://www.nature.com/articles/s41593-024-01607-5) (Francois Chollet on measuring intelligence and generalisation: [[1911.01547] On the Measure of Intelligence](https://arxiv.org/abs/1911.01547), [x.com](https://twitter.com/fchollet/status/1763692655408779455), [#51 FRANCOIS CHOLLET - Intelligence and Generalisation - YouTube](https://youtu.be/J0p_thJJnoo)), [[2403.09629] Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking](https://arxiv.org/abs/2403.09629)
Search: AlphaGo (
[x.com](https://twitter.com/polynoamial/status/1766616044838236507)), AlphaCode 2 Technical Report (https://storage.googleapis.com/deepmind-media/AlphaCode2/AlphaCode2_Tech_Report.pdf), [[o1]]
It is quite possible (and a large percentage of researchers think) that research on controlling these inscrutable matrices is not developing fast enough relative to capabilities research (expanding what these systems can do), and we might see more and more cases where AI systems do things we didn't intend. We then have no idea how to turn such behaviors off with existing methods [Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training \ Anthropic](<https://www.anthropic.com/news/sleeper-agents-training-deceptive-llms-that-persist-through-safety-training>), as could be seen recently when GPT-4 started outputting total chaos after an update [OpenAI's ChatGPT Went Completely Off the Rails for Hours](<https://www.thedailybeast.com/openais-chatgpt-went-completely-off-the-rails-for-hours>), when Gemini turned out more woke than intended ([Google Has a New 'Woke' AI Problem With Gemini - Business Insider](https://www.businessinsider.com/google-gemini-woke-images-ai-chatbot-criticism-controversy-2024-2), [The self-unalignment problem — AI Alignment Forum](https://www.alignmentforum.org/posts/9GyniEBaN3YYTqZXn/the-self-unalignment-problem)), or in the steady stream of new jailbreaks that bypass the guardrails [[2307.15043] Universal and Transferable Adversarial Attacks on Aligned Language Models](<https://arxiv.org/abs/2307.15043>).
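The "Search" links above (AlphaGo, AlphaCode 2) rest on combining a learned model with lookahead search. As a minimal illustration of the search half only, here is a flat Monte Carlo search on a made-up toy game; the game, target value, and rollout counts are all invented for this sketch and are not taken from any of the linked systems:

```python
import random

def rollout(total, moves_left):
    # Finish the toy game with uniformly random moves (+1 or +2 per move);
    # the game is won if the final total is exactly 7. Returns 1 on a win.
    while moves_left > 0:
        total += random.choice((1, 2))
        moves_left -= 1
    return 1 if total == 7 else 0

def best_first_action(n_rollouts=2000):
    # Flat Monte Carlo search: estimate each first action's win rate by
    # averaging random rollouts, then pick the action with the best estimate.
    scores = {}
    for action in (1, 2):
        wins = sum(rollout(action, 4) for _ in range(n_rollouts))
        scores[action] = wins / n_rollouts
    return max(scores, key=scores.get)

random.seed(0)
# Starting with +1 leaves more random continuations that hit the target of 7
# (win probability 6/16 vs 4/16), so the search should prefer action 1.
print(best_first_action())
```

AlphaGo-style systems replace the uniform rollout policy with a learned network and the flat averaging with a guided tree search (MCTS), but the estimate-by-simulation idea is the same.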
Regarding definitions of AGI, this one from DeepMind is good: [Levels of AGI: Operationalizing Progress on the Path to AGI](https://arxiv.org/abs/2311.02462). I also like OpenAI's definition, though it is quite vague: highly autonomous systems that outperform humans at most economically valuable work. There is also a nice thread of various definitions and their pros and cons [9 definitions of Artificial General Intelligence (AGI) and why they are flawed](https://twitter.com/IntuitMachine/status/1721845203030470956), as well as [Universal Intelligence: A Definition of Machine Intelligence](<https://arxiv.org/abs/0712.3329>), and Karl Friston has good definitions [KARL FRISTON - INTELLIGENCE 3.0 - YouTube](https://youtu.be/V_VXOdf1NMw?si=8sOkRmbgzjrkvkif&t=1898).
As for predictions of when AGI arrives, people around Effective Accelerationism, Singularity, Metaculus, LessWrong/Effective Altruism, and various influential people in top AGI labs have very short timelines, often within the 2020s. [Singularity Predictions 2024 by some people big in the field](https://www.reddit.com/r/singularity/comments/18vawje/singularity_predictions_2024/kfpntso/), [Date Weakly General AI is Publicly Known | Metaculus](https://www.metaculus.com/questions/3479/date-weakly-general-ai-is-publicly-known/) There is also a questionnaire on priorities and predictions from AI researchers, whose forecast intervals have been shrinking by about half each year: [AI experts make predictions for 2040. I was a little surprised.
| Science News - YouTube](https://www.youtube.com/watch?v=g7TghURVC6Y), [Thousands of AI Authors on the Future of AI](https://arxiv.org/abs/2401.02843)
When someone calls LLMs "just statistics", you may just as reductively say that humans are "just autocompleting predictions about input signals that are compared to actual signals" (a version of Bayesian inference): [Predictive coding - Wikipedia](https://en.wikipedia.org/wiki/Predictive_coding), [Visual processing - Wikipedia](https://en.wikipedia.org/wiki/Visual_processing), [Free energy principle - Wikipedia](https://en.wikipedia.org/wiki/Free_energy_principle), [Inner screen model of consciousness: applying free energy principle to study of conscious experience - YouTube](https://www.youtube.com/watch?v=yZWjjDT5rGU&pp=ygUzZnJlZSBlbmVyZ3kgcHJpbmNpcGxlIGFwcGxpZWQgdG8gdGhlIGJyYWluIHJhbXN0ZWFk) (global neuronal workspace theory + integrated information theory + recurrent processing theory + predictive processing theory + neurorepresentationalism + dendritic integration theory: An integrative, multiscale view on neural theories of consciousness https://www.cell.com/neuron/fulltext/S0896-6273%2824%2900088-6 ) ([Models of consciousness - Wikipedia](https://en.wikipedia.org/wiki/Models_of_consciousness?wprov=sfla1)) (more models: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8146510/ ), or that humans are "just bioelectricity and biochemistry" ([Bioelectric networks: the cognitive glue enabling evolutionary scaling from physiology to mind | Animal Cognition](https://link.springer.com/article/10.1007/s10071-023-01780-3)), or "just particles" (https://en.wikipedia.org/wiki/Electromagnetic_theories_of_consciousness) (On Connectome and Geometric Eigenmodes of Brain Activity: The Eigenbasis of the Mind?
[On Connectome and Geometric Eigenmodes of Brain Activity: The Eigenbasis of the Mind?](https://qri.org/blog/eigenbasis-of-the-mind)) (Integrated world modeling theory: [Frontiers | An Integrated World Modeling Theory (IWMT) of Consciousness: Combining Integrated Information and Global Neuronal Workspace Theories With the Free Energy Principle and Active Inference Framework; Toward Solving the Hard Problem and Characterizing Agentic Causation](https://www.frontiersin.org/articles/10.3389/frai.2020.00030/full), [Integrated world modeling theory expanded: Implications for the future of consciousness - PubMed](https://pubmed.ncbi.nlm.nih.gov/36507308/)) ([The Free Energy Principle approach to Agency - YouTube](https://youtu.be/zMDSMqtjays?si=MRXTcQ6s8o_KwNXd)) ([Synthetic Sentience: Can Artificial Intelligence become conscious? | Joscha Bach | CCC #37c3 - YouTube](https://youtu.be/Ms96Py8p8Jg?si=HYx2lf8DrCkMcf8b)). Or you can say that the whole universe is just one big differential equation. None of that tells you anything specific about the concrete implementation details or the dynamics actually at play!
The forecast intervals in these questionnaires shrink by roughly half each year: [AI experts make predictions for 2040. I was a little surprised.
| Science News](<https://www.youtube.com/watch?v=g7TghURVC6Y>), [Thousands of AI Authors on the Future of AI](https://arxiv.org/abs/2401.02843): "In the largest survey of its kind, 2,778 researchers who had published in top-tier artificial intelligence (AI) venues gave predictions on the pace of AI progress and the nature and impacts of advanced AI systems. The aggregate forecasts give at least a 50% chance of AI systems achieving several milestones by 2028, including autonomously constructing a payment processing site from scratch, creating a song indistinguishable from a new song by a popular musician, and autonomously downloading and fine-tuning a large language model. If science continues undisrupted, the chance of unaided machines outperforming humans in every possible task was estimated at 10% by 2027, and 50% by 2047. The latter estimate is 13 years earlier than that reached in a similar survey we conducted only one year earlier [Grace et al., 2022]. However, the chance of all human occupations becoming fully automatable was forecast to reach 10% by 2037, and 50% as late as 2116 (compared to 2164 in the 2022 survey). Most respondents expressed substantial uncertainty about the long-term value of AI progress: While 68.3% thought good outcomes from superhuman AI are more likely than bad, of these net optimists 48% gave at least a 5% chance of extremely bad outcomes such as human extinction, and 59% of net pessimists gave 5% or more to extremely good outcomes. Between 38% and 51% of respondents gave at least a 10% chance to advanced AI leading to outcomes as bad as human extinction. More than half suggested that "substantial" or "extreme" concern is warranted about six different AI-related scenarios, including misinformation, authoritarian control, and inequality. There was disagreement about whether faster or slower AI progress would be better for the future of humanity.
However, there was broad agreement that research aimed at minimizing potential risks from AI systems ought to be prioritized more."
[ML Code Challenges - Deep-ML](https://www.deep-ml.com/) [[Omnidisciplinarity]]
## Resources
- Stanford machine learning [https://www.youtube.com/playlist?list=PLoROMvodv4rNyWOpJg_Yh4NSqI4Z4vOYy](https://www.youtube.com/playlist?list=PLoROMvodv4rNyWOpJg_Yh4NSqI4Z4vOYy)
- Stanford machine learning [https://www.youtube.com/playlist?list=PLoROMvodv4rMiGQp3WXShtMGgzqpfVfbU](https://www.youtube.com/playlist?list=PLoROMvodv4rMiGQp3WXShtMGgzqpfVfbU)
- Stanford transformers [https://www.youtube.com/playlist?list=PLoROMvodv4rNiJRchCzutFw5ItR_Z27CM](https://www.youtube.com/playlist?list=PLoROMvodv4rNiJRchCzutFw5ItR_Z27CM)
- Stanford generative models including diffusion [https://www.youtube.com/playlist?list=PLoROMvodv4rPOWA-omMM6STXaWW4FvJT8](https://www.youtube.com/playlist?list=PLoROMvodv4rPOWA-omMM6STXaWW4FvJT8)
- Stanford deep learning [https://www.youtube.com/playlist?list=PLoROMvodv4rOABXSygHTsbvUz4G_YQhOb](https://www.youtube.com/playlist?list=PLoROMvodv4rOABXSygHTsbvUz4G_YQhOb)
- Karpathy neural networks zero to hero [https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ](https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ)
- Stanford natural language processing with deep learning [https://www.youtube.com/playlist?list=PLoROMvodv4rMFqRtEuo6SGjY4XbRIVRd4](https://www.youtube.com/playlist?list=PLoROMvodv4rMFqRtEuo6SGjY4XbRIVRd4)
- MIT deep learning [https://www.youtube.com/playlist?list=PLTZ1bhP8GBuTCqeY19TxhHyrwFiot42_U](https://www.youtube.com/playlist?list=PLTZ1bhP8GBuTCqeY19TxhHyrwFiot42_U)
- Stanford artificial intelligence [https://www.youtube.com/playlist?list=PLoROMvodv4rO1NB9TD4iUZ3qghGEGtqNX](https://www.youtube.com/playlist?list=PLoROMvodv4rO1NB9TD4iUZ3qghGEGtqNX)
- Stanford machine learning with graphs [https://www.youtube.com/playlist?list=PLoROMvodv4rPLKxIpqhjhPgdQy7imNkDn](https://www.youtube.com/playlist?list=PLoROMvodv4rPLKxIpqhjhPgdQy7imNkDn)
- Stanford natural language understanding [https://www.youtube.com/playlist?list=PLoROMvodv4rOwvldxftJTmoR3kRcWkJBp](https://www.youtube.com/playlist?list=PLoROMvodv4rOwvldxftJTmoR3kRcWkJBp)
- Stanford reinforcement learning [https://www.youtube.com/playlist?list=PLoROMvodv4rOSOPzutgyCTapiGlY2Nd8u](https://www.youtube.com/playlist?list=PLoROMvodv4rOSOPzutgyCTapiGlY2Nd8u)
- Stanford meta-learning [https://www.youtube.com/playlist?list=PLoROMvodv4rNjRoawgt72BBNwL2V7doGI](https://www.youtube.com/playlist?list=PLoROMvodv4rNjRoawgt72BBNwL2V7doGI)
- Stanford artificial intelligence [https://www.youtube.com/playlist?list=PLoROMvodv4rPgrvmYbBrxZCK_GwXvDVL3](https://www.youtube.com/playlist?list=PLoROMvodv4rPgrvmYbBrxZCK_GwXvDVL3)
- Stanford machine learning theory [https://www.youtube.com/playlist?list=PLoROMvodv4rP8nAmISxFINlGKSK4rbLKh](https://www.youtube.com/playlist?list=PLoROMvodv4rP8nAmISxFINlGKSK4rbLKh)
- Stanford computer vision [https://www.youtube.com/playlist?list=PLkt2uSq6rBVctENoVBg1TpCC7OQi31AlC](https://www.youtube.com/playlist?list=PLkt2uSq6rBVctENoVBg1TpCC7OQi31AlC) [https://www.youtube.com/playlist?list=PLSVEhWrZWDHQTBmWZufjxpw3s8sveJtnJ](https://www.youtube.com/playlist?list=PLSVEhWrZWDHQTBmWZufjxpw3s8sveJtnJ)
- Stanford statistics [https://www.youtube.com/playlist?list=PLoROMvodv4rOpr_A7B9SriE_iZmkanvUg](https://www.youtube.com/playlist?list=PLoROMvodv4rOpr_A7B9SriE_iZmkanvUg)
- Stanford methods in AI [https://www.youtube.com/playlist?list=PLoROMvodv4rO1NB9TD4iUZ3qghGEGtqNX](https://www.youtube.com/playlist?list=PLoROMvodv4rO1NB9TD4iUZ3qghGEGtqNX) [https://www.youtube.com/playlist?list=PLrxfgDEc2NxZJcWcrxH3jyjUUrJlnoyzX](https://www.youtube.com/playlist?list=PLrxfgDEc2NxZJcWcrxH3jyjUUrJlnoyzX)
- Stanford MIT robotics [https://www.youtube.com/playlist?list=PLkx8KyIQkMfUmB3j-DyP58ThDXM7enA8x](https://www.youtube.com/playlist?list=PLkx8KyIQkMfUmB3j-DyP58ThDXM7enA8x) [https://www.youtube.com/playlist?list=PLkx8KyIQkMfUSDs2hvTWzaq-cxGl8Ha69](https://www.youtube.com/playlist?list=PLkx8KyIQkMfUSDs2hvTWzaq-cxGl8Ha69) [https://www.youtube.com/playlist?list=PL65CC0384A1798ADF](https://www.youtube.com/playlist?list=PL65CC0384A1798ADF) [https://www.youtube.com/playlist?list=PLoROMvodv4rMeercb-kvGLUrOq4HR6BZD](https://www.youtube.com/playlist?list=PLoROMvodv4rMeercb-kvGLUrOq4HR6BZD) [https://www.youtube.com/playlist?list=PLN1iOWWHLJz3ndzRIvpbby75G2_2pYYrL](https://www.youtube.com/playlist?list=PLN1iOWWHLJz3ndzRIvpbby75G2_2pYYrL)
- MIT machine learning [https://www.youtube.com/playlist?list=PLxC_ffO4q_rW0bqQB80_vcQB09HOA3ClV](https://www.youtube.com/playlist?list=PLxC_ffO4q_rW0bqQB80_vcQB09HOA3ClV) [https://www.youtube.com/playlist?list=PLnvKubj2-I2LhIibS8TOGC42xsD3-liux](https://www.youtube.com/playlist?list=PLnvKubj2-I2LhIibS8TOGC42xsD3-liux)
- MIT efficient machine learning [https://www.youtube.com/playlist?list=PL80kAHvQbh-pT4lCkDT53zT8DKmhE0idB](https://www.youtube.com/playlist?list=PL80kAHvQbh-pT4lCkDT53zT8DKmhE0idB)
- MIT linear algebra in machine learning [https://www.youtube.com/playlist?list=PLUl4u3cNGP63oMNUHXqIUcrkS2PivhN3k](https://www.youtube.com/playlist?list=PLUl4u3cNGP63oMNUHXqIUcrkS2PivhN3k)
- Principles of Deep Learning Theory [https://arxiv.org/abs/2106.10165](https://arxiv.org/abs/2106.10165) [https://www.youtube.com/watch?v=YzR2gZrsdJc](https://www.youtube.com/watch?v=YzR2gZrsdJc) [https://www.youtube.com/watch?v=pad023JIXVA](https://www.youtube.com/watch?v=pad023JIXVA)
- Active Inference book [https://mitpress.mit.edu/9780262045353/active-inference/](https://mitpress.mit.edu/9780262045353/active-inference/)
- Geometric deep learning [https://geometricdeeplearning.com/](https://geometricdeeplearning.com/)
- Mechanistic interpretability
[https://www.neelnanda.io/mechanistic-interpretability](https://www.neelnanda.io/mechanistic-interpretability)
- Topological data analysis [https://www.youtube.com/playlist?list=PLzERW_Obpmv_UW7RgbZW4Ebhw87BcoXc7](https://www.youtube.com/playlist?list=PLzERW_Obpmv_UW7RgbZW4Ebhw87BcoXc7)
- Hinton AI [Neural Networks for Machine Learning — Geoffrey Hinton, UofT [FULL COURSE] - YouTube](https://www.youtube.com/playlist?list=PLLssT5z_DsK_gyrQ_biidwvPYCRNGI3iv)
- [Mathematics for Machine Learning and Data Science Specialization](https://www.deeplearning.ai/courses/mathematics-for-machine-learning-and-data-science-specialization/)
- [Deep Learning Course for Beginners - YouTube](https://www.youtube.com/watch?v=HJd1I3FdSnY)
- [Generative Adversarial Networks (GANs) Specialization](https://www.deeplearning.ai/courses/generative-adversarial-networks-gans-specialization/)
- [AI for Good Specialization - DeepLearning.AI](https://www.deeplearning.ai/courses/ai-for-good/)

[[Resources AI SoTA]] [[Resources AI SoTA practice]] [[Resources AI basics]] [[Resources AI advanced 1]] [[Resources AI research novel architectures]] [[To learn ML papers]] [[Resources theory reverse engineering mechinterp an]] [[Resources theory reverse engineering mechinterp and alignment AI]] [[Links AI]] [[Links AI x quantum computing]] [[Links AI x psychology]] [[Links AI theory]] [[Links AI technical]] [[Links AI SOTA research]] [[Links AI SOTA practice]] [[Links AI SOTA practice(1)]] [[Links AI science]] [[Links AI programming]] [[Links AI physics]] [[Links AI nontechnical]] [[Links AI neuroscience]] [[Links AI mechinterp]] [[Links AI math]] [[Links AI healthcare biology]] [[Links AI geopolitics politics futurology governanc]] [[Links AI for technology development]] [[Links AI for neuroscience]] [[Links AI for material science]] [[Links AI biology]] [[Links AI basics]] [[AI mathcode short]] [[AI mathcode short important]] [[AI mathcode long]] [[AI mathcode long important]] [[AI techy words audio short]] [[AI techy words audio short important]] [[AI techy words audio long]] [[AI techy words audio long important]] [[AI techy words visual short]] [[AI techy words visual short important]] [[AI techy words visual long]] [[AI techy words visual long important]] [[AI nontechy words audio short]] [[AI nontechy words audio short important]] [[AI nontechy words audio long]] [[AI nontechy words audio long important]] [[AI nontechy words visual short]] [[AI nontechy words visual short important]] [[AI nontechy words visual long]] [[AI nontechy words visual long important]] [[AI tools to try]] [[Prompts 4]] [[Prompts 3]] [[Prompts 2]] [[Prompts]] [[Cursor prompts]]
## More github resources
[GitHub - patrickloeber/ml-study-plan: The Ultimate FREE Machine Learning Study Plan](https://github.com/patrickloeber/ml-study-plan) [GitHub - dair-ai/ML-YouTube-Courses: 📺 Discover the latest machine learning / AI courses on YouTube.](https://github.com/dair-ai/ML-YouTube-Courses) [GitHub - yazdotai/machine-learning-video-courses: Comprehensive list of machine learning videos](https://github.com/yazdotai/machine-learning-video-courses) [GitHub - mirerfangheibi/Machine-Learning-Resources: Free and High-Quality Materials to Study Deep Learning](https://github.com/mirerfangheibi/Machine-Learning-Resources) [ML
Resources](https://sgfin.github.io/learning-resources/#ml) [GitHub - therealsreehari/Learn-Data-Science-For-Free: This repositary is a combination of different resources lying scattered all over the internet. The reason for making such an repositary is to combine all the valuable resources in a sequential manner, so that it helps every beginners who are in a search of free and structured learning resource for Data Science. For Constant Updates Follow me in Twitter.](https://github.com/therealsreehari/Learn-Data-Science-For-Free) [GitHub - openlists/MathStatsResources](https://github.com/openlists/MathStatsResources) [GitHub - mdozmorov/Statistics_notes: Statistics, data analysis tutorials and learning resources](https://github.com/mdozmorov/Statistics_notes) [GitHub - Machine-Learning-Tokyo/AI_Curriculum: Open Deep Learning and Reinforcement Learning lectures from top Universities like Stanford, MIT, UC Berkeley.](https://github.com/Machine-Learning-Tokyo/AI_Curriculum) [GitHub - bentrevett/machine-learning-courses: A collection of machine learning courses.](https://github.com/bentrevett/machine-learning-courses) [GitHub - Developer-Y/cs-video-courses: List of Computer Science courses with video lectures.](https://github.com/Developer-Y/cs-video-courses?tab=readme-ov-file#artificial-intelligence) [GitHub - tigerneil/awesome-deep-rl: For deep RL and the future of AI.](https://github.com/tigerneil/awesome-deep-rl) [GitHub - Developer-Y/math-science-video-lectures: List of Science courses with video lectures](https://github.com/Developer-Y/math-science-video-lectures) [GitHub - Machine-Learning-Tokyo/Math_resources](https://github.com/Machine-Learning-Tokyo/Math_resources) [GitHub - dair-ai/Mathematics-for-ML: 🧮 A collection of resources to learn mathematics for machine learning](https://github.com/dair-ai/Mathematics-for-ML) [Foundations of Machine Learning](https://bloomberg.github.io/foml/#lectures) [Data Science and Machine Learning Resources — Jon 
Krohn](https://www.jonkrohn.com/resources) https://www.kdnuggets.com/10-github-repositories-to-master-machine-learning [GitHub - exajobs/university-courses-collection: A collection of awesome CS courses, assignments, lectures, notes, readings & examinations available online for free.](https://github.com/exajobs/university-courses-collection?tab=readme-ov-file#artificial-intelligence) [GitHub - prakhar1989/awesome-courses: :books: List of awesome university courses for learning Computer Science!](https://github.com/prakhar1989/awesome-courses?tab=readme-ov-file#artificial-intelligence) [GitHub - owainlewis/awesome-artificial-intelligence: A curated list of Artificial Intelligence (AI) courses, books, video lectures and papers.](https://github.com/owainlewis/awesome-artificial-intelligence) [GitHub - josephmisiti/awesome-machine-learning: A curated list of awesome Machine Learning frameworks, libraries and software.](https://github.com/josephmisiti/awesome-machine-learning) [GitHub - academic/awesome-datascience: :memo: An awesome Data Science repository to learn and apply for real world problems.](https://github.com/academic/awesome-datascience?tab=readme-ov-file#the-data-science-toolbox) [GitHub - ChristosChristofidis/awesome-deep-learning: A curated list of awesome Deep Learning tutorials, projects and communities.](https://github.com/ChristosChristofidis/awesome-deep-learning) [GitHub - guillaume-chevalier/Awesome-Deep-Learning-Resources: Rough list of my favorite deep learning resources, useful for revisiting topics or for reference. I have got through all of the content listed there, carefully. 
- Guillaume Chevalier](https://github.com/guillaume-chevalier/Awesome-Deep-Learning-Resources?tab=readme-ov-file#online-classes) [GitHub - MartinuzziFrancesco/awesome-scientific-machine-learning: A curated list of awesome Scientific Machine Learning (SciML) papers, resources and software](https://github.com/MartinuzziFrancesco/awesome-scientific-machine-learning) [GitHub - SE-ML/awesome-seml: A curated list of articles that cover the software engineering best practices for building machine learning applications.](https://github.com/SE-ML/awesome-seml) [GitHub - jtoy/awesome-tensorflow: TensorFlow - A curated list of dedicated resources http://tensorflow.org](https://github.com/jtoy/awesome-tensorflow) [GitHub - altamiracorp/awesome-xai: Awesome Explainable AI (XAI) and Interpretable ML Papers and Resources](https://github.com/altamiracorp/awesome-xai) [GitHub - ujjwalkarn/Machine-Learning-Tutorials: machine learning and deep learning tutorials, articles and other resources](https://github.com/ujjwalkarn/Machine-Learning-Tutorials) [GitHub - kiloreux/awesome-robotics: A list of awesome Robotics resources](https://github.com/kiloreux/awesome-robotics) [GitHub - jbhuang0604/awesome-computer-vision: A curated list of awesome computer vision resources](https://github.com/jbhuang0604/awesome-computer-vision) [GitHub - dk-liang/Awesome-Visual-Transformer: Collect some papers about transformer with vision. 
Awesome Transformer with Computer Vision (CV)](https://github.com/dk-liang/Awesome-Visual-Transformer) [GitHub - ChanganVR/awesome-embodied-vision: Reading list for research topics in embodied vision](https://github.com/ChanganVR/awesome-embodied-vision) [GitHub - EthicalML/awesome-production-machine-learning: A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning](https://github.com/EthicalML/awesome-production-machine-learning) [GitHub - wangyongjie-ntu/Awesome-explainable-AI: A collection of research materials on explainable AI/ML](https://github.com/wangyongjie-ntu/Awesome-explainable-AI) [GitHub - jphall663/awesome-machine-learning-interpretability: A curated list of awesome responsible machine learning resources.](https://github.com/jphall663/awesome-machine-learning-interpretability) [GitHub - JShollaj/awesome-llm-interpretability: A curated list of Large Language Model (LLM) Interpretability resources.](https://github.com/JShollaj/awesome-llm-interpretability) [GitHub - MinghuiChen43/awesome-deep-phenomena: A curated list of papers of interesting empirical study and insight on deep learning. 
Continually updating...](https://github.com/MinghuiChen43/awesome-deep-phenomena) [GitHub - Nikasa1889/awesome-deep-learning-theory: A curated list of awesome Deep Learning theories that shed light on the mysteries of DL](https://github.com/Nikasa1889/awesome-deep-learning-theory) [[2106.10165] The Principles of Deep Learning Theory](https://arxiv.org/abs/2106.10165) [GitHub - awesomedata/awesome-public-datasets: A topic-centric list of HQ open datasets.](https://github.com/awesomedata/awesome-public-datasets) [GitHub - jsbroks/awesome-dataset-tools: 🔧 A curated list of awesome dataset tools](https://github.com/jsbroks/awesome-dataset-tools) [GitHub - mint-lab/awesome-robotics-datasets: A collection of useful datasets for robotics and computer vision](https://github.com/mint-lab/awesome-robotics-datasets) [GitHub - kelvins/awesome-mlops: :sunglasses: A curated list of awesome MLOps tools](https://github.com/kelvins/awesome-mlops) [GitHub - Bisonai/awesome-edge-machine-learning: A curated list of awesome edge machine learning resources, including research papers, inference engines, challenges, books, meetups and others.](https://github.com/Bisonai/awesome-edge-machine-learning)
## Resources applications and subfields
[GitHub - yuzhimanhua/Awesome-Scientific-Language-Models: A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery](https://github.com/yuzhimanhua/Awesome-Scientific-Language-Models) [GitHub - georgezouq/awesome-ai-in-finance: 🔬 A curated list of awesome LLMs & deep learning strategies & tools in financial market.](https://github.com/georgezouq/awesome-ai-in-finance) [GitHub - jyguyomarch/awesome-conversational-ai: A curated list of delightful Conversational AI resources.](https://github.com/jyguyomarch/awesome-conversational-ai) [GitHub - theimpossibleastronaut/awesome-linguistics: A curated list of anything remotely related to linguistics](https://github.com/theimpossibleastronaut/awesome-linguistics)
[GitHub - timzhang642/3D-Machine-Learning: A resource repository for 3D machine learning](https://github.com/timzhang642/3D-Machine-Learning) [GitHub - yenchenlin/awesome-adversarial-machine-learning: A curated list of awesome adversarial machine learning resources](https://github.com/yenchenlin/awesome-adversarial-machine-learning) [GitHub - chbrian/awesome-adversarial-examples-dl: A curated list of awesome resources for adversarial examples in deep learning](https://github.com/chbrian/awesome-adversarial-examples-dl) [GitHub - fepegar/awesome-medical-imaging: Awesome list of software that I use to do research in medical imaging.](https://github.com/fepegar/awesome-medical-imaging) [GitHub - awesome-NeRF/awesome-NeRF: A curated list of awesome neural radiance fields papers](https://github.com/awesome-NeRF/awesome-NeRF) [GitHub - vsitzmann/awesome-implicit-representations: A curated list of resources on implicit neural representations.](https://github.com/vsitzmann/awesome-implicit-representations) [GitHub - weihaox/awesome-neural-rendering: Resources of Neural Rendering](https://github.com/weihaox/awesome-neural-rendering) [GitHub - zhoubolei/awesome-generative-modeling: Bolei's archive on generative modeling](https://github.com/zhoubolei/awesome-generative-modeling) [GitHub - XindiWu/Awesome-Machine-Learning-in-Biomedical-Healthcare-Imaging: A list of awesome selected resources towards the application of machine learning in Biomedical/Healthcare Imaging, inspired by](https://github.com/XindiWu/Awesome-Machine-Learning-in-Biomedical-Healthcare-Imaging) [GitHub - hoya012/awesome-anomaly-detection: A curated list of awesome anomaly detection resources](https://github.com/hoya012/awesome-anomaly-detection) [GitHub - subeeshvasu/Awsome_Deep_Geometry_Learning: A list of resources about deep learning solutions on 3D shape processing](https://github.com/subeeshvasu/Awsome_Deep_Geometry_Learning) [GitHub - subeeshvasu/Awesome-Neuron-Segmentation-in-EM-Images: A curated 
list of resources for 3D segmentation of neurites in EM images](https://github.com/subeeshvasu/Awesome-Neuron-Segmentation-in-EM-Images) [GitHub - subeeshvasu/Awsome_Delineation](https://github.com/subeeshvasu/Awsome_Delineation) [GitHub - subeeshvasu/Awsome-GAN-Training: A curated list of resources related to training of GANs](https://github.com/subeeshvasu/Awsome-GAN-Training) [GitHub - nashory/gans-awesome-applications: Curated list of awesome GAN applications and demo](https://github.com/nashory/gans-awesome-applications) [GitHub - tstanislawek/awesome-document-understanding: A curated list of resources for Document Understanding (DU) topic](https://github.com/tstanislawek/awesome-document-understanding) [GitHub - matthewvowels1/Awesome-Video-Generation: A curated list of awesome work on video generation and video representation learning, and related topics.](https://github.com/matthewvowels1/Awesome-Video-Generation) [GitHub - datamllab/awesome-fairness-in-ai: A curated list of awesome Fairness in AI resources](https://github.com/datamllab/awesome-fairness-in-ai) ## Other resources [GitHub - n2cholas/awesome-jax: JAX - A curated list of resources https://github.com/google/jax](https://github.com/n2cholas/awesome-jax) [GitHub - benedekrozemberczki/awesome-gradient-boosting-papers: A curated list of gradient boosting research papers with implementations.](https://github.com/benedekrozemberczki/awesome-gradient-boosting-papers) [GitHub - benedekrozemberczki/awesome-monte-carlo-tree-search-papers: A curated list of Monte Carlo tree search papers with implementations.](https://github.com/benedekrozemberczki/awesome-monte-carlo-tree-search-papers) [GitHub - igorbarinov/awesome-data-engineering: A curated list of data engineering tools for software developers](https://github.com/igorbarinov/awesome-data-engineering) [GitHub - oxnr/awesome-bigdata: A curated list of awesome big data frameworks, ressources and other 
awesomeness.](https://github.com/oxnr/awesome-bigdata)
[GitHub - benedekrozemberczki/awesome-decision-tree-papers: A collection of research papers on decision, classification and regression trees with implementations.](https://github.com/benedekrozemberczki/awesome-decision-tree-papers)
[GitHub - chihming/awesome-network-embedding: A curated list of network embedding techniques.](https://github.com/chihming/awesome-network-embedding)

## Deep dives

- [[Theory of Everything in Intelligence]]
- ![[Theory of Everything in Intelligence#Definitions]]

## State of the art

- [State of AI report 2024 October](https://www.youtube.com/watch?v=CyOL_4K2Nyo)
- [AI Index Report 2024 – Artificial Intelligence Index](https://aiindex.stanford.edu/report/)

Top 10 Takeaways:

1. AI beats humans on some tasks, but not on all. AI has surpassed human performance on several benchmarks, including some in image classification, visual reasoning, and English understanding. Yet it trails behind on more complex tasks like competition-level mathematics, visual commonsense reasoning and planning.
2. Industry continues to dominate frontier AI research. In 2023, industry produced 51 notable machine learning models, while academia contributed only 15. There were also 21 notable models resulting from industry-academia collaborations in 2023, a new high.
3. Frontier models get way more expensive. According to AI Index estimates, the training costs of state-of-the-art AI models have reached unprecedented levels. For example, OpenAI’s GPT-4 used an estimated $78 million worth of compute to train, while Google’s Gemini Ultra cost $191 million for compute.
4. The United States leads China, the EU, and the U.K. as the leading source of top AI models. In 2023, 61 notable AI models originated from U.S.-based institutions, far outpacing the European Union’s 21 and China’s 15.
5. Robust and standardized evaluations for LLM responsibility are seriously lacking.
New research from the AI Index reveals a significant lack of standardization in responsible AI reporting. Leading developers, including OpenAI, Google, and Anthropic, primarily test their models against different responsible AI benchmarks. This practice complicates efforts to systematically compare the risks and limitations of top AI models.
6. Generative AI investment skyrockets. Despite a decline in overall AI private investment last year, funding for generative AI surged, nearly octupling from 2022 to reach $25.2 billion. Major players in the generative AI space, including OpenAI, Anthropic, Hugging Face, and Inflection, reported substantial fundraising rounds.
7. The data is in: AI makes workers more productive and leads to higher quality work. In 2023, several studies assessed AI’s impact on labor, suggesting that AI enables workers to complete tasks more quickly and to improve the quality of their output. These studies also demonstrated AI’s potential to bridge the skill gap between low- and high-skilled workers. Still, other studies caution that using AI without proper oversight can lead to diminished performance.
8. Scientific progress accelerates even further, thanks to AI. In 2022, AI began to advance scientific discovery. 2023, however, saw the launch of even more significant science-related AI applications, from AlphaDev, which makes algorithmic sorting more efficient, to GNoME, which facilitates the process of materials discovery.
9. The number of AI regulations in the United States sharply increases. The number of AI-related regulations in the U.S. has risen significantly in the past year and over the last five years. In 2023, there were 25 AI-related regulations, up from just one in 2016. Last year alone, the total number of AI-related regulations grew by 56.3%.
10. People across the globe are more cognizant of AI’s potential impact, and more nervous.
A survey from Ipsos shows that, over the last year, the proportion of those who think AI will dramatically affect their lives in the next three to five years has increased from 60% to 66%. Moreover, 52% express nervousness toward AI products and services, marking a 13 percentage point rise from 2022. In America, Pew data suggests that 52% of Americans report feeling more concerned than excited about AI, rising from 37% in 2022.

## Links

[[Links AI]] [[Links AI x quantum computing]] [[Links AI x psychology]] [[Links AI theory]] [[Links AI technical]] [[Links AI SOTA research]] [[Links AI SOTA practice]] [[Links AI SOTA practice(1)]] [[Links AI science]] [[Links AI programming]] [[Links AI physics]] [[Links AI nontechnical]] [[Links AI neuroscience]] [[Links AI mechinterp]] [[Links AI math]] [[Links AI healthcare biology]] [[Links AI geopolitics politics futurology governanc]] [[Links AI for technology development]] [[Links AI for neuroscience]] [[Links AI for material science]] [[Links AI biology]] [[Links AI basics]]

## Written by AI (may include hallucinated factually incorrect information)

# Comprehensive Hierarchical Map of Artificial Intelligence

---

## 1. Foundations of Artificial Intelligence

### 1.1 Philosophy of AI

- **Strong AI (Artificial General Intelligence):** The hypothesis that machines can possess genuine understanding and consciousness, not merely simulate it. [Source](https://plato.stanford.edu/entries/artificial-intelligence/)
- **Weak AI (Narrow AI):** Systems designed to perform specific tasks intelligently without possessing general cognitive abilities. [Source](https://plato.stanford.edu/entries/artificial-intelligence/)
- **Chinese Room Argument:** John Searle's 1980 thought experiment arguing that syntactic manipulation of symbols alone does not produce semantic understanding.
[Source](https://plato.stanford.edu/entries/chinese-room/) - **Turing Test:** Alan Turing's 1950 proposal that a machine can be considered intelligent if a human interrogator cannot distinguish it from a human in conversation. [Source](https://doi.org/10.1093/mind/LIX.236.433) - **Symbol Grounding Problem:** The challenge of how symbols in a formal system acquire meaning connected to the real world. [Source](https://en.wikipedia.org/wiki/Symbol_grounding_problem) - **Frame Problem:** The difficulty of representing the effects of actions in logic without having to specify a large number of non-effects. [Source](https://plato.stanford.edu/entries/frame-problem/) - **Consciousness and AI:** The philosophical debate about whether artificial systems can have subjective experiences (qualia). [Source](https://plato.stanford.edu/entries/consciousness-ai/) - **Ethics of AI:** The study of moral issues surrounding the creation and deployment of intelligent machines. [Source](https://plato.stanford.edu/entries/ethics-ai/) - **Existential Risk from AI:** The concern that superintelligent AI could pose a threat to humanity's long-term survival. [Source](https://en.wikipedia.org/wiki/Existential_risk_from_artificial_general_intelligence) - **AI Rights:** The philosophical question of whether sufficiently advanced AI systems should be granted moral or legal status. [Source](https://en.wikipedia.org/wiki/Rights_of_artificial_intelligences) - **Functionalism:** The philosophical position that mental states are defined by their functional roles, implying AI could have genuine mental states. [Source](https://plato.stanford.edu/entries/functionalism/) - **Computationalism:** The thesis that cognition is fundamentally a form of computation, providing a theoretical basis for AI. [Source](https://plato.stanford.edu/entries/computational-mind/) - **Connectionism:** The approach to modeling cognition using artificial neural networks rather than symbolic rules. 
[Source](https://plato.stanford.edu/entries/connectionism/) ### 1.2 History of AI - **Dartmouth Conference (1956):** The seminal workshop where John McCarthy coined the term "artificial intelligence" and the field was formally founded. [Source](https://en.wikipedia.org/wiki/Dartmouth_workshop) - **Logic Theorist (1956):** Created by Newell and Simon, often considered the first AI program, it could prove mathematical theorems. [Source](https://en.wikipedia.org/wiki/Logic_Theorist) - **Perceptron (1958):** Frank Rosenblatt's early neural network model that could learn to classify linearly separable patterns. [Source](https://en.wikipedia.org/wiki/Perceptron) - **ELIZA (1966):** Joseph Weizenbaum's early natural language processing program that simulated a Rogerian psychotherapist. [Source](https://en.wikipedia.org/wiki/ELIZA) - **SHRDLU (1970):** Terry Winograd's natural language understanding program that could manipulate objects in a simulated blocks world. [Source](https://en.wikipedia.org/wiki/SHRDLU) - **First AI Winter (1974–1980):** A period of reduced funding and interest in AI following the Lighthill Report's criticism of the field's overpromises. [Source](https://en.wikipedia.org/wiki/AI_winter) - **Expert Systems Era (1980s):** The rise of rule-based knowledge systems like MYCIN and R1/XCON that encoded domain expertise. [Source](https://en.wikipedia.org/wiki/Expert_system) - **Second AI Winter (1987–1993):** A collapse of interest following the failure of expert systems to scale and the collapse of the Lisp machine market. [Source](https://en.wikipedia.org/wiki/AI_winter) - **Deep Blue (1997):** IBM's chess-playing computer that defeated world champion Garry Kasparov, marking a milestone in game-playing AI. [Source](https://en.wikipedia.org/wiki/Deep_Blue_\(chess_computer\)) - **Statistical Revolution (2000s):** A shift from symbolic AI to data-driven statistical and machine learning methods spurred by increasing data and compute. 
[Source](https://en.wikipedia.org/wiki/Machine_learning) - **ImageNet & Deep Learning Revolution (2012):** AlexNet's victory in the ImageNet competition demonstrated the power of deep convolutional networks, triggering the modern deep learning era. [Source](https://en.wikipedia.org/wiki/AlexNet) - **AlphaGo (2016):** DeepMind's program defeated world Go champion Lee Sedol, a landmark for AI in complex strategic games. [Source](https://en.wikipedia.org/wiki/AlphaGo) - **Transformer Architecture (2017):** The "Attention Is All You Need" paper introduced the transformer, revolutionizing NLP and eventually all of deep learning. [Source](https://arxiv.org/abs/1706.03762) - **GPT-3 (2020):** OpenAI's 175-billion parameter language model demonstrated emergent few-shot learning abilities across diverse tasks. [Source](https://arxiv.org/abs/2005.14165) - **ChatGPT (2022):** OpenAI's conversational AI product brought large language models to mainstream public awareness. [Source](https://en.wikipedia.org/wiki/ChatGPT) - **GPT-4 (2023):** A multimodal large language model that demonstrated advanced reasoning and passed numerous professional exams. [Source](https://arxiv.org/abs/2303.08774) ### 1.3 Mathematical Foundations - **Linear Algebra:** The study of vectors, matrices, and linear transformations, fundamental to nearly all machine learning algorithms. [Source](https://www.deeplearningbook.org/contents/linear_algebra.html) - **Probability Theory:** The mathematical framework for reasoning about uncertainty, central to probabilistic models and Bayesian inference. [Source](https://www.deeplearningbook.org/contents/prob.html) - **Information Theory:** Claude Shannon's framework for quantifying information, entropy, and mutual information, used in feature selection and model evaluation. 
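As a minimal illustration of the information-theoretic quantities mentioned above, Shannon entropy can be computed directly from its definition (a sketch; the example distributions are my own):

```python
import math

def entropy(probs):
    """Shannon entropy H(p) = -sum_i p_i * log2(p_i), measured in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin carries exactly 1 bit; a biased coin carries less.
h_fair = entropy([0.5, 0.5])    # 1.0 bit
h_biased = entropy([0.9, 0.1])  # ≈ 0.469 bits
h_certain = entropy([1.0])      # 0.0 bits: no uncertainty
```

The `p > 0` guard reflects the convention 0·log 0 = 0, so degenerate distributions are handled without a domain error.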
[Source](https://en.wikipedia.org/wiki/Information_theory) - **Optimization Theory:** The study of finding the best solution from a set of feasible solutions, fundamental to training machine learning models. [Source](https://en.wikipedia.org/wiki/Mathematical_optimization) - **Calculus and Analysis:** Differential calculus provides gradients essential for training neural networks via backpropagation. [Source](https://www.deeplearningbook.org/contents/numerical.html) - **Graph Theory:** The study of graphs as mathematical structures used to model pairwise relations, foundational for graph neural networks and knowledge graphs. [Source](https://en.wikipedia.org/wiki/Graph_theory) - **Computational Complexity Theory:** Classifies computational problems by difficulty (P, NP, etc.), relevant to understanding the tractability of AI algorithms. [Source](https://en.wikipedia.org/wiki/Computational_complexity_theory) - **Measure Theory:** Provides rigorous foundations for probability, necessary for understanding continuous probability distributions in ML. [Source](https://en.wikipedia.org/wiki/Measure_\(mathematics\)) - **Convex Optimization:** The subfield of optimization dealing with convex functions and sets, where local optima are global optima, guaranteeing efficient solutions. [Source](https://web.stanford.edu/~boyd/cvxbook/) - **Bayesian Statistics:** A statistical paradigm that updates beliefs using Bayes' theorem as evidence accumulates. [Source](https://en.wikipedia.org/wiki/Bayesian_statistics) - **Stochastic Processes:** Mathematical objects defined by randomness evolving over time, used in reinforcement learning and sequential decision making. [Source](https://en.wikipedia.org/wiki/Stochastic_process) - **Functional Analysis:** The study of infinite-dimensional vector spaces and operators, underpinning kernel methods and neural network theory. 
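The optimization and calculus entries above come together in gradient descent, the workhorse of model training. A minimal sketch on a one-dimensional convex function (the example function and learning rate are my own):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Minimize a differentiable function by repeatedly stepping
    against its gradient: x <- x - lr * grad(x)."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3); the minimum is at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

On a convex objective like this one, the iterates contract toward the unique global minimum; for neural networks the same update is applied to millions of parameters, with gradients supplied by backpropagation.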
[Source](https://en.wikipedia.org/wiki/Functional_analysis)

### 1.4 Computational Foundations

- **Automata Theory:** The study of abstract machines and their computational capabilities, from finite automata to Turing machines. [Source](https://en.wikipedia.org/wiki/Automata_theory)
- **Church–Turing Thesis:** The hypothesis that any effectively computable function can be computed by a Turing machine. [Source](https://plato.stanford.edu/entries/church-turing/)
- **Algorithmic Information Theory (Kolmogorov Complexity):** Measures the complexity of data by the length of the shortest program that produces it. [Source](https://en.wikipedia.org/wiki/Kolmogorov_complexity)
- **PAC Learning (Probably Approximately Correct):** Leslie Valiant's framework for analyzing how many samples are needed for a learning algorithm to generalize. [Source](https://en.wikipedia.org/wiki/Probably_approximately_correct_learning)
- **VC Dimension:** A measure of the capacity (complexity) of a hypothesis class, determining the number of training samples needed for learning. [Source](https://en.wikipedia.org/wiki/Vapnik%E2%80%93Chervonenkis_dimension)
- **No Free Lunch Theorem:** States that no single learning algorithm is universally best across all possible problems. [Source](https://en.wikipedia.org/wiki/No_free_lunch_theorem)
- **Bias-Variance Tradeoff:** The fundamental tension between a model's ability to fit training data (low bias) and generalize to new data (low variance). [Source](https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff)
- **Solomonoff Induction:** The theoretical ideal of Bayesian prediction using Kolmogorov complexity as a universal prior. [Source](https://en.wikipedia.org/wiki/Solomonoff%27s_theory_of_inductive_inference)
- **AIXI:** Marcus Hutter's theoretical framework for a mathematically optimal but incomputable universal artificial intelligence. [Source](https://en.wikipedia.org/wiki/AIXI)

---

## 2. Machine Learning

### 2.1 Supervised Learning

- **Supervised Learning (Overview):** Learning a mapping from inputs to outputs using labeled training examples. [Source](https://en.wikipedia.org/wiki/Supervised_learning)

#### 2.1.1 Classification

- **Classification:** The task of predicting a discrete class label for a given input. [Source](https://en.wikipedia.org/wiki/Statistical_classification)
- **Binary Classification:** Classification with exactly two possible output classes (e.g., spam vs. not spam). [Source](https://en.wikipedia.org/wiki/Binary_classification)
- **Multi-class Classification:** Classification where each instance is assigned to one of three or more classes. [Source](https://en.wikipedia.org/wiki/Multiclass_classification)
- **Multi-label Classification:** Classification where each instance can be assigned multiple labels simultaneously. [Source](https://en.wikipedia.org/wiki/Multi-label_classification)
- **Logistic Regression:** A linear model that uses the logistic function to estimate class probabilities for binary or multi-class classification. [Source](https://en.wikipedia.org/wiki/Logistic_regression)
- **Naive Bayes Classifier:** A probabilistic classifier based on Bayes' theorem with the assumption of feature independence. [Source](https://en.wikipedia.org/wiki/Naive_Bayes_classifier)
- **Support Vector Machine (SVM):** Finds the hyperplane that maximizes the margin between classes, with kernel extensions for nonlinear boundaries. [Source](https://en.wikipedia.org/wiki/Support_vector_machine)
- **k-Nearest Neighbors (k-NN):** Classifies instances by majority vote of their k closest training examples in feature space. [Source](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm)
- **Decision Tree:** A tree-structured model that makes predictions by recursively splitting data on feature values.
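Of the classifiers listed above, k-NN is simple enough to sketch in full (toy training data of my own; `math.dist` requires Python 3.8+):

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote of its k nearest training points.
    `train` is a list of (feature_vector, label) pairs."""
    nearest = sorted(train, key=lambda xy: math.dist(xy[0], query))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

# Two well-separated classes in 2D feature space.
train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((6, 5), "b")]
label = knn_predict(train, (1, 0))  # nearest neighbors are mostly "a"
```

Note there is no training phase at all: k-NN is a "lazy" learner that defers every computation to prediction time, which is why it scales poorly to large training sets without index structures.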
[Source](https://en.wikipedia.org/wiki/Decision_tree_learning) - **Random Forest:** An ensemble of decision trees trained on random subsets of data and features, reducing overfitting via bagging. [Source](https://en.wikipedia.org/wiki/Random_forest) - **Gradient Boosting:** Builds an ensemble of weak learners sequentially, with each new learner correcting the errors of the previous ones. [Source](https://en.wikipedia.org/wiki/Gradient_boosting) - **XGBoost:** An optimized, scalable implementation of gradient boosting that became dominant in structured data competitions. [Source](https://en.wikipedia.org/wiki/XGBoost) - **LightGBM:** Microsoft's gradient boosting framework using histogram-based splitting for faster training on large datasets. [Source](https://github.com/microsoft/LightGBM) - **CatBoost:** Yandex's gradient boosting library that natively handles categorical features and reduces prediction shift. [Source](https://catboost.ai/) - **AdaBoost:** An early boosting algorithm that combines weak classifiers by iteratively re-weighting misclassified examples. [Source](https://en.wikipedia.org/wiki/AdaBoost) - **Linear Discriminant Analysis (LDA):** A method that finds a linear combination of features to separate two or more classes. [Source](https://en.wikipedia.org/wiki/Linear_discriminant_analysis) - **Quadratic Discriminant Analysis (QDA):** A generalization of LDA that allows each class to have its own covariance matrix. [Source](https://en.wikipedia.org/wiki/Quadratic_classifier) #### 2.1.2 Regression - **Regression:** The task of predicting a continuous numerical output from input features. [Source](https://en.wikipedia.org/wiki/Regression_analysis) - **Linear Regression:** Models the relationship between inputs and a continuous output as a linear function. [Source](https://en.wikipedia.org/wiki/Linear_regression) - **Polynomial Regression:** Extends linear regression by fitting a polynomial function to the data. 
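For the single-feature case, the linear regression entry above has a closed-form least-squares solution (a sketch with made-up data):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ≈ a*x + b with one feature:
    a = cov(x, y) / var(x), b = mean(y) - a * mean(x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# Points lying exactly on y = 2x + 1 recover the true coefficients.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

Polynomial regression reduces to the same machinery: expand each x into powers (x, x², …) and solve the resulting multivariate least-squares problem.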
[Source](https://en.wikipedia.org/wiki/Polynomial_regression) - **Ridge Regression (L2 Regularization):** Linear regression with an L2 penalty on coefficients to prevent overfitting. [Source](https://en.wikipedia.org/wiki/Ridge_regression) - **Lasso Regression (L1 Regularization):** Linear regression with an L1 penalty that encourages sparsity in the coefficient vector. [Source](https://en.wikipedia.org/wiki/Lasso_\(statistics\)) - **Elastic Net:** Combines L1 and L2 regularization, balancing feature selection and coefficient shrinkage. [Source](https://en.wikipedia.org/wiki/Elastic_net_regularization) - **Support Vector Regression (SVR):** Adapts SVM to regression by fitting data within an epsilon-insensitive tube. [Source](https://en.wikipedia.org/wiki/Support_vector_machine#Regression) - **Gaussian Process Regression:** A non-parametric Bayesian method that defines a distribution over functions, providing uncertainty estimates. [Source](https://en.wikipedia.org/wiki/Gaussian_process) - **Quantile Regression:** Estimates conditional quantiles rather than the conditional mean, useful for understanding distributional effects. [Source](https://en.wikipedia.org/wiki/Quantile_regression) #### 2.1.3 Evaluation Metrics - **Accuracy:** The proportion of correct predictions among total predictions, the simplest classification metric. [Source](https://en.wikipedia.org/wiki/Accuracy_and_precision) - **Precision:** The proportion of true positive predictions among all positive predictions, measuring exactness. [Source](https://en.wikipedia.org/wiki/Precision_and_recall) - **Recall (Sensitivity):** The proportion of actual positives correctly identified, measuring completeness. [Source](https://en.wikipedia.org/wiki/Precision_and_recall) - **F1 Score:** The harmonic mean of precision and recall, balancing both metrics into a single score. [Source](https://en.wikipedia.org/wiki/F-score) - **ROC Curve:** A plot of true positive rate vs. 
false positive rate at various classification thresholds. [Source](https://en.wikipedia.org/wiki/Receiver_operating_characteristic) - **AUC (Area Under ROC Curve):** Measures the overall discriminative ability of a classifier across all thresholds. [Source](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve) - **Mean Squared Error (MSE):** The average of squared differences between predicted and actual values, a standard regression metric. [Source](https://en.wikipedia.org/wiki/Mean_squared_error) - **Mean Absolute Error (MAE):** The average of absolute differences between predictions and actuals, less sensitive to outliers than MSE. [Source](https://en.wikipedia.org/wiki/Mean_absolute_error) - **R-squared (Coefficient of Determination):** The proportion of variance in the dependent variable explained by the model. [Source](https://en.wikipedia.org/wiki/Coefficient_of_determination) - **Log Loss (Cross-Entropy Loss):** Measures the performance of a classification model whose output is a probability, penalizing confident wrong predictions. [Source](https://en.wikipedia.org/wiki/Cross-entropy) - **Cohen's Kappa:** Measures inter-rater agreement for categorical items, correcting for agreement occurring by chance. [Source](https://en.wikipedia.org/wiki/Cohen%27s_kappa) - **Matthews Correlation Coefficient (MCC):** A balanced metric for binary classification that accounts for all four confusion matrix categories. [Source](https://en.wikipedia.org/wiki/Phi_coefficient) - **Confusion Matrix:** A table showing true positives, false positives, true negatives, and false negatives for classification evaluation. [Source](https://en.wikipedia.org/wiki/Confusion_matrix) ### 2.2 Unsupervised Learning - **Unsupervised Learning (Overview):** Learning patterns and structures from unlabeled data without explicit output targets. 
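The precision, recall, and F1 metrics listed under Evaluation Metrics above follow directly from confusion-matrix counts (a sketch; the label lists are invented):

```python
def prf1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 computed from paired label lists."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp)               # exactness of positive predictions
    recall = tp / (tp + fn)                  # completeness over actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# tp=2, fp=1, fn=1 -> precision = recall = f1 = 2/3
p, r, f = prf1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Because F1 is a harmonic mean, it is dragged down by whichever of precision or recall is worse, which is exactly why it is preferred over accuracy on imbalanced data.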
[Source](https://en.wikipedia.org/wiki/Unsupervised_learning) #### 2.2.1 Clustering - **Clustering:** Grouping similar data points together based on some measure of similarity or distance. [Source](https://en.wikipedia.org/wiki/Cluster_analysis) - **k-Means Clustering:** Partitions data into k clusters by iteratively assigning points to the nearest centroid and updating centroids. [Source](https://en.wikipedia.org/wiki/K-means_clustering) - **Hierarchical Clustering:** Builds a tree (dendrogram) of clusters by iteratively merging (agglomerative) or splitting (divisive) clusters. [Source](https://en.wikipedia.org/wiki/Hierarchical_clustering) - **DBSCAN:** Density-based clustering that groups points in high-density regions and labels low-density points as outliers. [Source](https://en.wikipedia.org/wiki/DBSCAN) - **HDBSCAN:** An extension of DBSCAN that handles clusters of varying densities by building a hierarchical model. [Source](https://hdbscan.readthedocs.io/) - **Gaussian Mixture Model (GMM):** Models data as a mixture of multiple Gaussian distributions, estimated via expectation-maximization. [Source](https://en.wikipedia.org/wiki/Mixture_model) - **Spectral Clustering:** Uses eigenvalues of a similarity matrix to reduce dimensionality before clustering in fewer dimensions. [Source](https://en.wikipedia.org/wiki/Spectral_clustering) - **Mean Shift:** A non-parametric algorithm that finds cluster centers by iteratively shifting points toward the mode of the data density. [Source](https://en.wikipedia.org/wiki/Mean_shift) - **OPTICS:** An ordering-based clustering algorithm that produces a reachability plot for identifying clusters at multiple density levels. [Source](https://en.wikipedia.org/wiki/OPTICS_algorithm) - **Affinity Propagation:** Clustering by sending messages between data points to identify exemplars without specifying the number of clusters. 
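The k-means entry in the clustering list above is concrete enough to sketch as Lloyd's algorithm (toy data and the empty-cluster fallback are my own choices):

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
    return centroids

# Two tight blobs around (0, 0) and (10, 10) are recovered as two centroids.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = sorted(kmeans(points, 2))
```

The result depends on the random initialization, which is why practical implementations run multiple restarts (or smarter seeding such as k-means++) and keep the lowest-inertia solution.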
[Source](https://en.wikipedia.org/wiki/Affinity_propagation) #### 2.2.2 Dimensionality Reduction - **Dimensionality Reduction:** Techniques for reducing the number of features while preserving important structure in the data. [Source](https://en.wikipedia.org/wiki/Dimensionality_reduction) - **Principal Component Analysis (PCA):** Finds orthogonal axes of maximum variance to project data into a lower-dimensional space. [Source](https://en.wikipedia.org/wiki/Principal_component_analysis) - **t-SNE (t-distributed Stochastic Neighbor Embedding):** A nonlinear method for visualizing high-dimensional data by preserving local neighborhood structure. [Source](https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding) - **UMAP (Uniform Manifold Approximation and Projection):** A manifold learning technique for dimensionality reduction that preserves both local and global structure. [Source](https://arxiv.org/abs/1802.03426) - **Autoencoders:** Neural networks that learn compressed representations by training to reconstruct their input through a bottleneck layer. [Source](https://en.wikipedia.org/wiki/Autoencoder) - **Independent Component Analysis (ICA):** Separates a multivariate signal into statistically independent components (blind source separation). [Source](https://en.wikipedia.org/wiki/Independent_component_analysis) - **Non-negative Matrix Factorization (NMF):** Decomposes a matrix into non-negative factors, useful for parts-based representations. [Source](https://en.wikipedia.org/wiki/Non-negative_matrix_factorization) - **Singular Value Decomposition (SVD):** A matrix factorization used for dimensionality reduction, latent semantic analysis, and data compression. [Source](https://en.wikipedia.org/wiki/Singular_value_decomposition) - **Linear Discriminant Analysis (for reduction):** Projects data onto axes that maximize class separation, serving as supervised dimensionality reduction. 
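The leading principal component from the PCA entry above can be extracted without a full eigensolver by power iteration on the covariance matrix (a sketch, assuming the data is not initialized orthogonal to the leading component):

```python
import math

def first_pc(data, iters=200):
    """Leading principal component via power iteration on C = X^T X / n,
    where X is the mean-centered data matrix."""
    n, d = len(data), len(data[0])
    means = [sum(col) / n for col in zip(*data)]
    x = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(x[i][a] * x[i][b] for i in range(n)) / n for b in range(d)]
           for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]  # repeated multiplication converges to the top eigenvector
    return v

# Data lying on the line y = x has its principal axis along (1, 1)/sqrt(2).
v = first_pc([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)])
```

Projecting each centered point onto `v` gives the one-dimensional PCA embedding; further components come from deflating the covariance matrix and repeating.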
[Source](https://en.wikipedia.org/wiki/Linear_discriminant_analysis) - **Isomap:** A nonlinear dimensionality reduction method that preserves geodesic distances on the data manifold. [Source](https://en.wikipedia.org/wiki/Isomap) - **Locally Linear Embedding (LLE):** Preserves local linear relationships in the data when mapping to lower dimensions. [Source](https://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction#Locally-linear_embedding) - **Random Projection:** Uses random matrices to project data to lower dimensions, with provable distance-preservation guarantees (Johnson-Lindenstrauss). [Source](https://en.wikipedia.org/wiki/Random_projection) #### 2.2.3 Anomaly Detection - **Anomaly Detection:** Identifying data points that deviate significantly from the expected pattern. [Source](https://en.wikipedia.org/wiki/Anomaly_detection) - **Isolation Forest:** Detects anomalies by measuring how easily a point can be isolated via random partitioning. [Source](https://en.wikipedia.org/wiki/Isolation_forest) - **One-Class SVM:** Learns a boundary around normal data in feature space, flagging points outside as anomalies. [Source](https://en.wikipedia.org/wiki/Support_vector_machine#One-class_SVM) - **Local Outlier Factor (LOF):** Measures the local density deviation of a point relative to its neighbors to detect outliers. [Source](https://en.wikipedia.org/wiki/Local_outlier_factor) - **Autoencoder-based Anomaly Detection:** Uses reconstruction error from autoencoders—high error indicates anomalous input. [Source](https://en.wikipedia.org/wiki/Autoencoder) - **Statistical Methods (Z-score, IQR):** Simple approaches that flag data points beyond a threshold number of standard deviations or interquartile ranges. [Source](https://en.wikipedia.org/wiki/Outlier#Detection) #### 2.2.4 Association Rule Learning - **Association Rule Learning:** Discovering interesting relations (rules) between variables in large datasets. 
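The frequent-itemset search at the heart of association rule mining can be sketched as a level-wise (Apriori-style) pass; the grocery transactions are invented, and the candidate generation here is deliberately naive (no full subset-pruning step):

```python
def frequent_itemsets(transactions, min_support):
    """Level-wise search: count candidate itemsets, keep those meeting
    min_support, and build next-size candidates only from survivors."""
    items = sorted({i for t in transactions for i in t})
    freq, size = {}, 1
    current = [frozenset([i]) for i in items]
    while current:
        counts = {c: sum(c <= t for t in transactions) for c in current}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        freq.update(survivors)
        size += 1
        # Candidates of the next size: unions of surviving itemsets.
        current = list({a | b for a in survivors for b in survivors
                        if len(a | b) == size})
    return freq

txns = [frozenset(t) for t in ({"milk", "bread"},
                               {"milk", "bread", "eggs"},
                               {"bread", "eggs"})]
fi = frequent_itemsets(txns, min_support=2)  # e.g. {bread}: 3, {milk, bread}: 2
```

The key Apriori insight is monotonicity: a superset can never be more frequent than its subsets, so extending only the survivors prunes the exponential search space.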
[Source](https://en.wikipedia.org/wiki/Association_rule_learning) - **Apriori Algorithm:** Identifies frequent itemsets and generates association rules by iteratively pruning infrequent candidates. [Source](https://en.wikipedia.org/wiki/Apriori_algorithm) - **FP-Growth:** A more efficient association rule mining algorithm that uses a compressed data structure (FP-tree) to avoid candidate generation. [Source](https://en.wikipedia.org/wiki/Association_rule_learning#FP-growth_algorithm) ### 2.3 Semi-Supervised Learning - **Semi-Supervised Learning:** Combines a small amount of labeled data with a large amount of unlabeled data during training. [Source](https://en.wikipedia.org/wiki/Semi-supervised_learning) - **Self-Training:** A wrapper method where a model iteratively labels unlabeled data with its own predictions and retrains. [Source](https://en.wikipedia.org/wiki/Semi-supervised_learning#Self-training) - **Co-Training:** Uses multiple views of the data, where classifiers trained on different feature sets label data for each other. [Source](https://en.wikipedia.org/wiki/Co-training) - **Label Propagation:** Spreads labels from labeled to unlabeled points through a similarity graph. [Source](https://en.wikipedia.org/wiki/Label_propagation_algorithm) - **Pseudo-labeling:** Assigns model-predicted labels to unlabeled data and trains on the combined dataset. [Source](https://en.wikipedia.org/wiki/Semi-supervised_learning) - **MixMatch:** Combines consistency regularization, entropy minimization, and MixUp augmentation for semi-supervised learning. [Source](https://arxiv.org/abs/1905.02249) - **FixMatch:** Applies strong augmentations to unlabeled data and trains on pseudo-labels that pass a confidence threshold. [Source](https://arxiv.org/abs/2001.07685) ### 2.4 Self-Supervised Learning - **Self-Supervised Learning:** A paradigm where the model generates its own supervisory signal from the structure of unlabeled data (e.g., predicting masked tokens). 
[Source](https://en.wikipedia.org/wiki/Self-supervised_learning) - **Contrastive Learning:** Learns representations by pulling together augmentations of the same example and pushing apart different examples. [Source](https://arxiv.org/abs/2002.05709) - **SimCLR:** A contrastive learning framework that uses large batch sizes and data augmentations to learn visual representations. [Source](https://arxiv.org/abs/2002.05709) - **BYOL (Bootstrap Your Own Latent):** Learns visual representations without negative pairs by using a momentum-updated target network. [Source](https://arxiv.org/abs/2006.07733) - **DINO:** A self-supervised vision transformer method using self-distillation with no labels, producing features useful for segmentation. [Source](https://arxiv.org/abs/2104.14294) - **Masked Autoencoders (MAE):** Learn visual representations by masking random patches of an image and reconstructing the missing pixels. [Source](https://arxiv.org/abs/2111.06377) - **Barlow Twins:** A self-supervised method that reduces redundancy between feature dimensions of two augmented views. [Source](https://arxiv.org/abs/2103.03230) - **VICReg:** Combines variance, invariance, and covariance regularization for self-supervised representation learning. [Source](https://arxiv.org/abs/2105.04906) ### 2.5 Reinforcement Learning - **Reinforcement Learning (Overview):** An agent learns to make sequential decisions by maximizing cumulative reward through interaction with an environment. [Source](https://en.wikipedia.org/wiki/Reinforcement_learning) #### 2.5.1 Core Concepts - **Markov Decision Process (MDP):** The formal mathematical framework for sequential decision-making with states, actions, transitions, and rewards. [Source](https://en.wikipedia.org/wiki/Markov_decision_process) - **Partially Observable MDP (POMDP):** An MDP where the agent cannot directly observe the full state of the environment. 
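The contrastive objective above can be illustrated with a scalar InfoNCE-style loss — a simplified, single-anchor sketch of the batch-wise NT-Xent loss used by SimCLR (the temperature value is a common default, not from the source):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """-log softmax of the positive pair's similarity against all negatives."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    exps = [math.exp(s / temperature) for s in sims]
    return -math.log(exps[0] / sum(exps))
```

The loss is near zero when the positive is far more similar to the anchor than any negative, and large when a negative wins — exactly the pull-together / push-apart behavior described above.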
[Source](https://en.wikipedia.org/wiki/Partially_observable_Markov_decision_process) - **Policy:** A mapping from states to actions (or probability distributions over actions) that defines the agent's behavior. [Source](https://en.wikipedia.org/wiki/Reinforcement_learning) - **Value Function:** Estimates the expected cumulative reward from a state (V) or state-action pair (Q) under a given policy. [Source](https://en.wikipedia.org/wiki/Reinforcement_learning) - **Reward Shaping:** Modifying the reward function to provide more informative feedback and speed up learning. [Source](https://en.wikipedia.org/wiki/Reward_shaping) - **Discount Factor (Gamma):** A parameter between 0 and 1 that determines how much the agent values future rewards relative to immediate ones. [Source](https://en.wikipedia.org/wiki/Reinforcement_learning) - **Exploration vs. Exploitation:** The fundamental dilemma of choosing between trying new actions (exploration) and leveraging known rewarding actions (exploitation). [Source](https://en.wikipedia.org/wiki/Exploration-exploitation_dilemma) - **Epsilon-Greedy:** A simple exploration strategy that takes a random action with probability epsilon and the greedy action otherwise. [Source](https://en.wikipedia.org/wiki/Multi-armed_bandit#Semi-uniform_strategies) - **Upper Confidence Bound (UCB):** An exploration strategy that selects actions based on their estimated value plus an exploration bonus. [Source](https://en.wikipedia.org/wiki/Upper_confidence_bound) - **Bellman Equation:** The recursive equation relating the value of a state to the values of successor states, foundational to dynamic programming solutions. [Source](https://en.wikipedia.org/wiki/Bellman_equation) #### 2.5.2 Model-Free Methods - **Q-Learning:** A model-free, off-policy algorithm that learns the optimal action-value function using temporal difference updates. 
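The tabular Q-learning update, paired with epsilon-greedy action selection, can be sketched as follows (the two-state `Q` table is a hypothetical example):

```python
import random

def epsilon_greedy(q_row, epsilon, rng=random):
    """With probability epsilon explore a random action, else exploit the best one."""
    actions = list(q_row)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_row[a])

def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """TD update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
    return Q[s][a]

Q = {"s0": {"left": 0.0, "right": 0.0}, "s1": {"left": 1.0, "right": 0.0}}
q_update(Q, "s0", "right", 1.0, "s1")  # reward 1 for moving right from s0
```

Note the `max` over next-state actions: that is what makes Q-learning off-policy — it bootstraps from the greedy action even when the behavior policy explored.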
[Source](https://en.wikipedia.org/wiki/Q-learning) - **SARSA:** An on-policy temporal difference method that updates Q-values based on the action actually taken. [Source](https://en.wikipedia.org/wiki/State%E2%80%93action%E2%80%93reward%E2%80%93state%E2%80%93action) - **Deep Q-Network (DQN):** Combines Q-learning with deep neural networks, achieving human-level play on Atari games. [Source](https://en.wikipedia.org/wiki/Deep_reinforcement_learning) - **Double DQN:** Addresses overestimation bias in DQN by using separate networks for action selection and evaluation. [Source](https://arxiv.org/abs/1509.06461) - **Dueling DQN:** Separates the Q-network into state-value and advantage streams for better value estimation. [Source](https://arxiv.org/abs/1511.06581) - **Prioritized Experience Replay:** Replays transitions with higher temporal-difference error more frequently to improve sample efficiency. [Source](https://arxiv.org/abs/1511.05952) - **REINFORCE:** A Monte Carlo policy gradient method that updates the policy by following the gradient of expected return. [Source](https://en.wikipedia.org/wiki/REINFORCE) - **Actor-Critic Methods:** Combine a policy network (actor) with a value network (critic) to reduce variance of policy gradient estimates. [Source](https://en.wikipedia.org/wiki/Actor-critic_algorithm) - **Advantage Actor-Critic (A2C/A3C):** Uses the advantage function (Q − V) to reduce variance, with A3C adding asynchronous parallel training. [Source](https://arxiv.org/abs/1602.01783) - **Proximal Policy Optimization (PPO):** A stable policy gradient method that uses clipped surrogate objectives to limit the size of policy updates. [Source](https://arxiv.org/abs/1707.06347) - **Trust Region Policy Optimization (TRPO):** Constrains policy updates to a trust region to guarantee monotonic improvement. 
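PPO's clipped surrogate objective can be written directly for a single sample, where `ratio` is pi_new(a|s) / pi_old(a|s) and `advantage` is the estimated advantage — a sketch of the per-sample term, before averaging and gradient ascent:

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """min(r * A, clip(r, 1 - eps, 1 + eps) * A): large policy shifts earn no extra credit."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)
```

Taking the minimum means the objective is flat once the ratio leaves the clip range in the direction that would improve it, which is what limits the size of each policy update.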
[Source](https://arxiv.org/abs/1502.05477) - **Soft Actor-Critic (SAC):** A maximum entropy RL algorithm that optimizes both expected return and policy entropy for robustness and exploration. [Source](https://arxiv.org/abs/1801.01290) - **TD3 (Twin Delayed DDPG):** Improves DDPG by using twin Q-networks, delayed policy updates, and target policy smoothing. [Source](https://arxiv.org/abs/1802.09477) - **DDPG (Deep Deterministic Policy Gradient):** An off-policy actor-critic algorithm for continuous action spaces. [Source](https://arxiv.org/abs/1509.02971) #### 2.5.3 Model-Based Methods - **Model-Based Reinforcement Learning:** Methods that learn a model of the environment's dynamics and use it for planning. [Source](https://en.wikipedia.org/wiki/Model-based_reinforcement_learning) - **Dyna-Q:** Integrates model-free learning with model-based planning by generating simulated experience from a learned model. [Source](https://en.wikipedia.org/wiki/Dyna-Q) - **Monte Carlo Tree Search (MCTS):** A search algorithm that uses random simulations to build a search tree, famously used in AlphaGo. [Source](https://en.wikipedia.org/wiki/Monte_Carlo_tree_search) - **MuZero:** DeepMind's algorithm that learns an implicit model of the environment's dynamics and plans with it via tree search, mastering games without being given their rules. [Source](https://en.wikipedia.org/wiki/MuZero) - **World Models:** Learn a compressed representation of the environment that the agent can use for planning in imagination. [Source](https://arxiv.org/abs/1803.10122) - **Dreamer (DreamerV1/V2/V3):** A family of model-based RL agents that learn world models and plan by imagining trajectories in latent space. [Source](https://arxiv.org/abs/2301.04104) #### 2.5.4 Multi-Agent Reinforcement Learning (MARL) - **Multi-Agent RL:** Studies how multiple agents learn and interact in shared environments.
[Source](https://en.wikipedia.org/wiki/Multi-agent_reinforcement_learning) - **Independent Q-Learning:** Each agent independently learns its own Q-function, treating other agents as part of the environment. [Source](https://en.wikipedia.org/wiki/Multi-agent_reinforcement_learning) - **QMIX:** A value decomposition method that factors a joint action-value function into per-agent utilities for cooperative MARL. [Source](https://arxiv.org/abs/1803.11485) - **MAPPO (Multi-Agent PPO):** Applies PPO in multi-agent settings with centralized training and decentralized execution. [Source](https://arxiv.org/abs/2103.01955) - **Nash Equilibrium in MARL:** The concept from game theory where no agent can improve by unilaterally changing strategy, applied to multi-agent learning. [Source](https://en.wikipedia.org/wiki/Nash_equilibrium) - **Self-Play:** A training paradigm where an agent improves by playing against copies of itself, famously used in AlphaZero. [Source](https://en.wikipedia.org/wiki/Self-play) #### 2.5.5 Inverse Reinforcement Learning - **Inverse Reinforcement Learning (IRL):** Infers the reward function from observed expert behavior, rather than being given it explicitly. [Source](https://en.wikipedia.org/wiki/Inverse_reinforcement_learning) - **Maximum Entropy IRL:** Assumes the expert acts to maximize entropy subject to matching feature expectations, yielding a unique reward. [Source](https://www.aaai.org/Papers/AAAI/2008/AAAI08-227.pdf) #### 2.5.6 Offline Reinforcement Learning - **Offline (Batch) RL:** Learns a policy entirely from a fixed dataset of previously collected interactions, without further environment interaction. [Source](https://arxiv.org/abs/2005.01643) - **Conservative Q-Learning (CQL):** Regularizes Q-values to be conservative (lower for unseen actions) to prevent overestimation in offline settings. 
[Source](https://arxiv.org/abs/2006.04779) - **Decision Transformer:** Frames RL as sequence modeling, using a transformer to predict actions conditioned on desired returns. [Source](https://arxiv.org/abs/2106.01345) ### 2.6 Transfer Learning - **Transfer Learning:** Leveraging knowledge learned in one task or domain to improve performance on a different but related task. [Source](https://en.wikipedia.org/wiki/Transfer_learning) - **Fine-Tuning:** Adapting a pre-trained model to a new task by continuing training on task-specific data, typically with a lower learning rate. [Source](https://en.wikipedia.org/wiki/Fine-tuning_\(deep_learning\)) - **Domain Adaptation:** Adapting a model trained on a source domain to perform well on a different but related target domain. [Source](https://en.wikipedia.org/wiki/Domain_adaptation) - **Zero-Shot Learning:** The ability to recognize or classify instances of categories never seen during training. [Source](https://en.wikipedia.org/wiki/Zero-shot_learning) - **Few-Shot Learning:** Learning to generalize from only a handful of labeled examples per class. [Source](https://en.wikipedia.org/wiki/Few-shot_learning) - **Meta-Learning (Learning to Learn):** Algorithms that learn the learning process itself, enabling rapid adaptation to new tasks. [Source](https://en.wikipedia.org/wiki/Meta-learning_\(computer_science\)) - **MAML (Model-Agnostic Meta-Learning):** A meta-learning algorithm that finds model initializations that can be fine-tuned quickly to new tasks. [Source](https://arxiv.org/abs/1703.03400) - **Prototypical Networks:** A few-shot classification method that classifies based on distance to class prototype embeddings. [Source](https://arxiv.org/abs/1703.05175) - **Knowledge Distillation:** Training a smaller "student" model to mimic the outputs of a larger "teacher" model. 
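Knowledge distillation's soft-target loss can be sketched as the cross-entropy between temperature-softened teacher and student distributions (the temperature of 2.0 is an arbitrary illustrative choice):

```python
import math

def softmax_t(logits, temperature):
    """Softmax over logits divided by a temperature; higher T gives softer targets."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between the softened teacher and student distributions."""
    p = softmax_t(teacher_logits, temperature)
    q = softmax_t(student_logits, temperature)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

Raising the temperature spreads probability mass over the non-argmax classes, so the student also learns the teacher's relative rankings of wrong answers, not just the top prediction.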
[Source](https://en.wikipedia.org/wiki/Knowledge_distillation) - **Multi-Task Learning:** Training a single model on multiple related tasks simultaneously to improve generalization. [Source](https://en.wikipedia.org/wiki/Multi-task_learning) - **Curriculum Learning:** Training a model by presenting examples in a meaningful order, typically from easy to hard. [Source](https://en.wikipedia.org/wiki/Curriculum_learning) ### 2.7 Ensemble Methods - **Ensemble Learning:** Combining multiple models to produce a stronger overall prediction than any individual model. [Source](https://en.wikipedia.org/wiki/Ensemble_learning) - **Bagging (Bootstrap Aggregating):** Trains multiple models on random bootstrap samples and averages their predictions to reduce variance. [Source](https://en.wikipedia.org/wiki/Bootstrap_aggregating) - **Boosting:** Sequentially trains weak learners, with each focusing on previously misclassified examples. [Source](https://en.wikipedia.org/wiki/Boosting_\(machine_learning\)) - **Stacking:** Trains a meta-learner to combine the predictions of multiple base models. [Source](https://en.wikipedia.org/wiki/Ensemble_learning#Stacking) - **Voting Classifier:** Combines predictions from multiple models by majority vote (hard) or averaged probabilities (soft). [Source](https://en.wikipedia.org/wiki/Ensemble_learning#Voting) - **Mixture of Experts (MoE):** Routes different inputs to different specialized sub-networks, allowing efficient scaling of model capacity. [Source](https://en.wikipedia.org/wiki/Mixture_of_experts) ### 2.8 Feature Engineering & Selection - **Feature Engineering:** The process of creating informative input features from raw data to improve model performance. [Source](https://en.wikipedia.org/wiki/Feature_engineering) - **Feature Selection:** Choosing a subset of relevant features to reduce dimensionality and improve model generalization. 
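The hard-voting combination described above is a short reduction over model predictions; the lambda "models" below are stand-ins for trained classifiers:

```python
from collections import Counter

def hard_vote(predictions):
    """Return the most common prediction (majority vote)."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical classifiers for "is x positive?"
models = [lambda x: x > 0, lambda x: x > 1, lambda x: x > -1]
vote = hard_vote([m(0.5) for m in models])  # predictions: True, False, True
```

Soft voting would instead average each model's predicted class probabilities and take the argmax, which tends to work better when the models are well calibrated.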
[Source](https://en.wikipedia.org/wiki/Feature_selection) - **One-Hot Encoding:** Converts categorical variables into binary vectors with a single 1 indicating the category. [Source](https://en.wikipedia.org/wiki/One-hot) - **Feature Scaling (Normalization/Standardization):** Rescaling features to a common range or distribution to improve convergence and performance. [Source](https://en.wikipedia.org/wiki/Feature_scaling) - **Mutual Information:** Measures the statistical dependence between a feature and the target, used for feature selection. [Source](https://en.wikipedia.org/wiki/Mutual_information) - **Recursive Feature Elimination (RFE):** Iteratively removes the least important features based on model weights or importance scores. [Source](https://scikit-learn.org/stable/modules/feature_selection.html) - **Embedding Features:** Learning dense, low-dimensional representations of categorical variables (e.g., word embeddings, entity embeddings). [Source](https://en.wikipedia.org/wiki/Word_embedding) ### 2.9 Model Selection & Validation - **Cross-Validation:** Evaluating model performance by partitioning data into training and validation sets multiple times. [Source](https://en.wikipedia.org/wiki/Cross-validation_\(statistics\)) - **k-Fold Cross-Validation:** Divides data into k folds, training on k-1 and validating on 1, rotating through all folds. [Source](https://en.wikipedia.org/wiki/Cross-validation_\(statistics\)#k-fold_cross-validation) - **Hyperparameter Tuning:** The process of selecting the best configuration of model hyperparameters (learning rate, regularization, etc.). [Source](https://en.wikipedia.org/wiki/Hyperparameter_optimization) - **Grid Search:** Exhaustively evaluates all combinations of specified hyperparameter values. [Source](https://en.wikipedia.org/wiki/Hyperparameter_optimization#Grid_search) - **Random Search:** Samples hyperparameter combinations randomly, often more efficient than grid search in high dimensions. 
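Random search over a discrete hyperparameter space is a short loop; in practice `objective` would be a cross-validated validation score, while here a hypothetical quadratic stands in:

```python
import random

def random_search(objective, space, n_trials=100, rng=random):
    """Sample random hyperparameter combinations; keep the lowest objective value."""
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

space = {"lr": [0.001, 0.01, 0.1], "weight_decay": [0.0, 1e-4]}
best, score = random_search(lambda p: (p["lr"] - 0.01) ** 2, space)
```

Unlike grid search, the trial budget here does not grow with the number of hyperparameters — which is why random search often wins in high dimensions where only a few parameters matter.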
[Source](https://en.wikipedia.org/wiki/Hyperparameter_optimization#Random_search) - **Bayesian Optimization:** Uses a probabilistic surrogate model to intelligently select the next hyperparameters to evaluate. [Source](https://en.wikipedia.org/wiki/Bayesian_optimization) - **Neural Architecture Search (NAS):** Automates the design of neural network architectures using search algorithms. [Source](https://en.wikipedia.org/wiki/Neural_architecture_search) - **AutoML:** Automated machine learning systems that handle feature engineering, model selection, and hyperparameter tuning. [Source](https://en.wikipedia.org/wiki/Automated_machine_learning) - **Early Stopping:** Halting training when validation performance stops improving to prevent overfitting. [Source](https://en.wikipedia.org/wiki/Early_stopping) ### 2.10 Probabilistic & Bayesian Machine Learning - **Bayesian Machine Learning:** Applying Bayesian probability to machine learning, treating model parameters as distributions rather than point estimates. [Source](https://en.wikipedia.org/wiki/Bayesian_inference) - **Bayesian Neural Networks:** Neural networks with probability distributions over weights, providing uncertainty estimates. [Source](https://en.wikipedia.org/wiki/Bayesian_deep_learning) - **Variational Inference:** Approximates intractable posterior distributions by optimization, finding the closest distribution in a tractable family. [Source](https://en.wikipedia.org/wiki/Variational_Bayesian_methods) - **Markov Chain Monte Carlo (MCMC):** A class of algorithms for sampling from probability distributions, used for Bayesian posterior estimation. [Source](https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo) - **Bayesian Optimization (for ML):** A sequential design strategy for global optimization of black-box functions using Gaussian process surrogates. 
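Early stopping as described above reduces to tracking the best validation loss and a patience counter:

```python
def early_stopping(val_losses, patience=3):
    """Return the epoch at which to stop: when the best validation loss
    has not improved for `patience` consecutive epochs."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_losses) - 1
```

In practice one also restores the model weights saved at `best_epoch`, not the weights at the stopping epoch.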
[Source](https://en.wikipedia.org/wiki/Bayesian_optimization) - **Gaussian Processes:** A non-parametric Bayesian approach defining distributions over functions, used for regression and classification with uncertainty. [Source](https://en.wikipedia.org/wiki/Gaussian_process) - **Hidden Markov Models (HMM):** Probabilistic models with hidden states and observed emissions, used in speech recognition and bioinformatics. [Source](https://en.wikipedia.org/wiki/Hidden_Markov_model) - **Conditional Random Fields (CRF):** Discriminative graphical models for structured prediction, widely used for sequence labeling. [Source](https://en.wikipedia.org/wiki/Conditional_random_field) - **Probabilistic Graphical Models:** Frameworks (Bayesian networks, Markov random fields) for representing complex probability distributions using graphs. [Source](https://en.wikipedia.org/wiki/Graphical_model) - **Bayesian Networks:** Directed acyclic graphs encoding conditional dependencies between random variables. [Source](https://en.wikipedia.org/wiki/Bayesian_network) - **Markov Random Fields:** Undirected graphical models representing joint probability distributions with local interaction potentials. [Source](https://en.wikipedia.org/wiki/Markov_random_field) - **Expectation-Maximization (EM) Algorithm:** An iterative method for finding maximum likelihood estimates in models with latent variables. [Source](https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm) --- ## 3. Deep Learning ### 3.1 Neural Network Fundamentals - **Artificial Neuron (Perceptron):** A computational unit that computes a weighted sum of inputs, adds a bias, and applies an activation function. [Source](https://en.wikipedia.org/wiki/Artificial_neuron) - **Multilayer Perceptron (MLP):** A feedforward neural network with one or more hidden layers capable of approximating arbitrary functions. 
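The artificial neuron and MLP definitions above translate directly into code; sigmoid is chosen as the activation for illustration, and the weights are arbitrary:

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of inputs plus bias, passed through a sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, layers):
    """Feedforward pass; `layers` is a list of (weight_matrix, bias_vector) pairs."""
    for W, b in layers:
        x = [neuron(x, w_row, b_i) for w_row, b_i in zip(W, b)]
    return x

# One hidden layer of two units feeding a single output unit
out = mlp_forward([1.0, 2.0], [([[0.5, -0.5], [0.0, 1.0]], [0.0, 0.0]),
                               ([[1.0, 1.0]], [0.0])])
```

The universal approximation theorem says enough such hidden units suffice to approximate any continuous function on a compact set — it guarantees expressiveness, not that gradient descent will find the right weights.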
[Source](https://en.wikipedia.org/wiki/Multilayer_perceptron) - **Universal Approximation Theorem:** States that a feedforward network with a single hidden layer can approximate any continuous function on compact sets. [Source](https://en.wikipedia.org/wiki/Universal_approximation_theorem) - **Activation Functions:** Nonlinear functions applied to neuron outputs; common choices include ReLU, sigmoid, tanh, and GELU. [Source](https://en.wikipedia.org/wiki/Activation_function) - **ReLU (Rectified Linear Unit):** f(x) = max(0, x), the most widely used activation that avoids vanishing gradients for positive values. [Source](https://en.wikipedia.org/wiki/Rectifier_\(neural_networks\)) - **Sigmoid:** Squashes input to (0, 1), historically used for binary outputs but prone to vanishing gradients. [Source](https://en.wikipedia.org/wiki/Sigmoid_function) - **Tanh:** Squashes input to (-1, 1), zero-centered but still susceptible to vanishing gradients. [Source](https://en.wikipedia.org/wiki/Hyperbolic_functions) - **GELU (Gaussian Error Linear Unit):** A smooth approximation of ReLU that weights inputs by their probability under a Gaussian, widely used in transformers. [Source](https://arxiv.org/abs/1606.08415) - **Swish/SiLU:** f(x) = x · sigmoid(x), a self-gated activation found by NAS to outperform ReLU in deep networks. [Source](https://arxiv.org/abs/1710.05941) - **Leaky ReLU:** Allows a small gradient for negative inputs to avoid dying ReLU problems. [Source](https://en.wikipedia.org/wiki/Rectifier_\(neural_networks\)#Leaky_ReLU) - **Softmax:** Converts a vector of logits into a probability distribution, used for multi-class classification outputs. [Source](https://en.wikipedia.org/wiki/Softmax_function) - **Loss Functions:** Functions that measure the discrepancy between model predictions and true values, driving the optimization process. 
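The two standard losses above in minimal form; the `eps` term is a common numerical guard against log(0), not part of the mathematical definition:

```python
import math

def cross_entropy(p_true, q_pred, eps=1e-12):
    """H(p, q) = -sum_i p_i * log(q_i), for distributions over classes."""
    return -sum(p * math.log(q + eps) for p, q in zip(p_true, q_pred))

def mse(y_true, y_pred):
    """Mean squared error: average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

With a one-hot target, cross-entropy collapses to the negative log-probability assigned to the true class, which is why it is also called negative log-likelihood loss.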
[Source](https://en.wikipedia.org/wiki/Loss_function) - **Cross-Entropy Loss:** The standard loss for classification tasks, measuring the difference between predicted and true probability distributions. [Source](https://en.wikipedia.org/wiki/Cross-entropy) - **Mean Squared Error Loss:** The standard loss for regression tasks, computing the average squared difference between predictions and targets. [Source](https://en.wikipedia.org/wiki/Mean_squared_error) - **Hinge Loss:** Used in SVMs and some neural networks, penalizing predictions that are on the wrong side of the margin. [Source](https://en.wikipedia.org/wiki/Hinge_loss) - **Focal Loss:** Down-weights easy examples to focus training on hard, misclassified examples, useful for class-imbalanced data. [Source](https://arxiv.org/abs/1708.02002) - **Contrastive Loss:** Pulls together similar pairs and pushes apart dissimilar pairs in embedding space. [Source](https://en.wikipedia.org/wiki/Siamese_neural_network) - **Triplet Loss:** Ensures an anchor is closer to a positive example than a negative example by at least a margin. [Source](https://en.wikipedia.org/wiki/Triplet_loss) ### 3.2 Training Neural Networks - **Backpropagation:** The algorithm for efficiently computing gradients of the loss with respect to all network weights using the chain rule. [Source](https://en.wikipedia.org/wiki/Backpropagation) - **Stochastic Gradient Descent (SGD):** Updates parameters using gradients computed on random mini-batches rather than the full dataset. [Source](https://en.wikipedia.org/wiki/Stochastic_gradient_descent) - **Adam Optimizer:** An adaptive learning rate optimizer that combines momentum and RMSProp, the most popular optimizer in deep learning. [Source](https://arxiv.org/abs/1412.6980) - **AdamW:** A variant of Adam with decoupled weight decay regularization for better generalization. 
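A single Adam step for one scalar parameter shows the two moment estimates and the bias correction; hyperparameter defaults follow the paper:

```python
import math

def adam_step(theta, grad, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; `state` holds (m, v, t)."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad          # first moment (momentum)
    v = b2 * v + (1 - b2) * grad * grad   # second moment (RMSProp-style)
    m_hat = m / (1 - b1 ** t)             # bias correction for zero initialization
    v_hat = v / (1 - b2 ** t)
    theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, (m, v, t)
```

AdamW differs only in where weight decay enters: it subtracts `lr * wd * theta` directly rather than folding the decay into `grad`, decoupling regularization from the adaptive scaling.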
[Source](https://arxiv.org/abs/1711.05101) - **RMSProp:** Adapts learning rates by dividing by a running average of squared gradients. [Source](https://en.wikipedia.org/wiki/Stochastic_gradient_descent#RMSProp) - **Learning Rate Scheduling:** Strategies for adjusting the learning rate during training (cosine decay, step decay, warmup, etc.). [Source](https://en.wikipedia.org/wiki/Learning_rate) - **Batch Normalization:** Normalizes layer inputs within each mini-batch to stabilize and accelerate training. [Source](https://en.wikipedia.org/wiki/Batch_normalization) - **Layer Normalization:** Normalizes across the feature dimension for each individual sample, preferred in transformers. [Source](https://arxiv.org/abs/1607.06450) - **Group Normalization:** Divides channels into groups and normalizes within each group, robust to small batch sizes. [Source](https://arxiv.org/abs/1803.08494) - **Dropout:** Randomly sets neuron activations to zero during training to prevent co-adaptation and reduce overfitting. [Source](https://en.wikipedia.org/wiki/Dropout_\(neural_networks\)) - **Weight Decay (L2 Regularization):** Penalizes large weights by adding a fraction of the weight magnitudes to the loss. [Source](https://en.wikipedia.org/wiki/Regularization_\(mathematics\)) - **Gradient Clipping:** Caps gradient magnitudes to prevent exploding gradients during training. [Source](https://en.wikipedia.org/wiki/Gradient_clipping) - **Vanishing Gradient Problem:** The difficulty of training deep networks when gradients become exponentially small in early layers. [Source](https://en.wikipedia.org/wiki/Vanishing_gradient_problem) - **Exploding Gradient Problem:** When gradients grow exponentially during backpropagation, causing unstable training. [Source](https://en.wikipedia.org/wiki/Vanishing_gradient_problem) - **Residual Connections (Skip Connections):** Shortcut connections that allow gradients to flow directly through layers, enabling training of very deep networks. 
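A residual connection is just element-wise addition of the input to a layer's output; the `layer` callable below is a stand-in for any sub-network with matching dimensions:

```python
def residual_block(x, layer):
    """y = layer(x) + x: the skip connection adds the input back to the output."""
    return [f_i + x_i for f_i, x_i in zip(layer(x), x)]
```

Because of the identity term, the gradient always has a direct additive path back through `x`, which is what lets very deep stacks of such blocks train without vanishing gradients.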
[Source](https://en.wikipedia.org/wiki/Residual_neural_network) - **Data Augmentation:** Applying transformations (rotations, flips, crops, color jitter) to training data to increase diversity and reduce overfitting. [Source](https://en.wikipedia.org/wiki/Data_augmentation) - **MixUp:** A regularization technique that trains on convex combinations of pairs of training examples and their labels. [Source](https://arxiv.org/abs/1710.09412) - **CutMix:** Replaces a patch of one image with a patch from another and mixes their labels proportionally. [Source](https://arxiv.org/abs/1905.04899) - **Label Smoothing:** Replaces hard one-hot labels with soft labels to prevent the model from becoming overconfident. [Source](https://arxiv.org/abs/1906.02629) - **Mixed Precision Training:** Uses lower-precision (FP16/BF16) arithmetic to speed up training and reduce memory with minimal accuracy loss. [Source](https://arxiv.org/abs/1710.03740) - **Gradient Accumulation:** Simulates larger batch sizes by accumulating gradients over multiple mini-batches before updating. [Source](https://en.wikipedia.org/wiki/Stochastic_gradient_descent) - **Weight Initialization (Xavier, He, etc.):** Strategies for setting initial weights to ensure signals and gradients flow properly through the network. [Source](https://en.wikipedia.org/wiki/Xavier_initialization) ### 3.3 Convolutional Neural Networks (CNNs) - **Convolutional Neural Network:** A neural network using convolutional layers that apply learnable filters to detect local patterns in grid-like data. [Source](https://en.wikipedia.org/wiki/Convolutional_neural_network) - **Convolutional Layer:** Applies a set of learnable filters across the input, producing feature maps that capture local patterns. [Source](https://en.wikipedia.org/wiki/Convolutional_neural_network) - **Pooling Layer:** Reduces spatial dimensions by downsampling (max pooling, average pooling), providing translation invariance. 
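Convolution and pooling layers reduce to small nested loops in the single-channel 2-D case. This sketch uses valid padding and stride 1, and computes cross-correlation — which is what deep learning frameworks actually implement under the name "convolution":

```python
def conv2d(image, kernel):
    """Valid-mode 2-D cross-correlation of a single-channel image with one filter."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [sum(image[i + a][j + b] * kernel[a][b]
             for a in range(kh) for b in range(kw))
         for j in range(ow)]
        for i in range(oh)
    ]

def max_pool2d(image, size=2):
    """Non-overlapping max pooling: keep the max of each size-by-size window."""
    oh, ow = len(image) // size, len(image[0]) // size
    return [
        [max(image[i * size + a][j * size + b]
             for a in range(size) for b in range(size))
         for j in range(ow)]
        for i in range(oh)
    ]
```

Real convolutional layers add input/output channels, stride, padding, and learned kernels, but the sliding-window sum above is the core operation.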
[Source](https://en.wikipedia.org/wiki/Convolutional_neural_network#Pooling_layers) - **Stride and Padding:** Stride controls the step size of filter movement; padding adds borders to preserve spatial dimensions. [Source](https://en.wikipedia.org/wiki/Convolutional_neural_network) - **Dilated (Atrous) Convolutions:** Convolutions with gaps in the kernel to increase the receptive field without increasing parameters. [Source](https://en.wikipedia.org/wiki/Dilated_convolution) - **Depthwise Separable Convolutions:** Factorize standard convolutions into depthwise and pointwise steps, drastically reducing computation. [Source](https://en.wikipedia.org/wiki/Depthwise_separable_convolution) - **Deformable Convolutions:** Learn offsets for sampling locations, allowing the network to adapt its receptive field to object shapes. [Source](https://arxiv.org/abs/1703.06211) - **LeNet-5 (1998):** Yann LeCun's pioneering CNN for handwritten digit recognition, establishing the CNN architecture pattern. [Source](https://en.wikipedia.org/wiki/LeNet) - **AlexNet (2012):** The deep CNN that won ImageNet 2012, launching the deep learning revolution with GPU training and ReLU. [Source](https://en.wikipedia.org/wiki/AlexNet) - **VGGNet (2014):** Demonstrated that deeper networks with small 3×3 filters significantly improve performance. [Source](https://en.wikipedia.org/wiki/VGGNet) - **GoogLeNet/Inception (2014):** Introduced inception modules with parallel convolutions at multiple scales for efficient feature extraction. [Source](https://en.wikipedia.org/wiki/Inception_\(deep_learning_architecture\)) - **ResNet (2015):** Introduced residual connections enabling training of networks over 100 layers deep, winning ImageNet 2015. [Source](https://en.wikipedia.org/wiki/Residual_neural_network) - **DenseNet (2017):** Connects each layer to every other layer in a feed-forward fashion, encouraging feature reuse. 
[Source](https://arxiv.org/abs/1608.06993) - **EfficientNet (2019):** Uses compound scaling to uniformly scale depth, width, and resolution for optimal efficiency. [Source](https://arxiv.org/abs/1905.11946) - **MobileNet:** A family of lightweight CNNs using depthwise separable convolutions for efficient inference on mobile devices. [Source](https://arxiv.org/abs/1704.04861) - **ConvNeXt (2022):** Modernizes pure CNN architectures with transformer-inspired design choices, achieving competitive performance. [Source](https://arxiv.org/abs/2201.03545) ### 3.4 Recurrent Neural Networks (RNNs) - **Recurrent Neural Network:** A neural network with loops that allow information to persist across time steps, processing sequential data. [Source](https://en.wikipedia.org/wiki/Recurrent_neural_network) - **Vanilla RNN:** The simplest recurrent architecture, which suffers from vanishing/exploding gradients for long sequences. [Source](https://en.wikipedia.org/wiki/Recurrent_neural_network) - **Long Short-Term Memory (LSTM):** A gated RNN architecture with cell state, forget gate, input gate, and output gate to handle long-range dependencies. [Source](https://en.wikipedia.org/wiki/Long_short-term_memory) - **Gated Recurrent Unit (GRU):** A simplified gated RNN with reset and update gates, often performing comparably to LSTM with fewer parameters. [Source](https://en.wikipedia.org/wiki/Gated_recurrent_unit) - **Bidirectional RNN:** Processes sequences in both forward and backward directions, capturing context from both sides. [Source](https://en.wikipedia.org/wiki/Bidirectional_recurrent_neural_networks) - **Sequence-to-Sequence (Seq2Seq):** An encoder-decoder architecture that maps input sequences to output sequences of potentially different lengths. [Source](https://en.wikipedia.org/wiki/Seq2seq) - **Attention Mechanism (for RNNs):** Allows the decoder to focus on different parts of the input sequence at each output step, dramatically improving performance. 
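Dot-product attention — the operation shared by the RNN attention mechanism above and transformer self-attention — can be sketched in pure Python, with the 1/sqrt(d_k) scaling of the transformer formulation:

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d_k)) V, computed one query row at a time."""
    d_k = len(keys[0])
    output = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        output.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return output
```

In self-attention, `queries`, `keys`, and `values` are all linear projections of the same sequence; multi-head attention runs several such maps in parallel on lower-dimensional projections and concatenates the results.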
[Source](https://arxiv.org/abs/1409.0473) - **Teacher Forcing:** Training recurrent models by feeding ground-truth outputs from the previous time step rather than model predictions. [Source](https://en.wikipedia.org/wiki/Teacher_forcing) ### 3.5 Transformer Architecture - **Transformer:** An architecture based entirely on self-attention mechanisms, replacing recurrence and convolutions for sequence modeling. [Source](https://arxiv.org/abs/1706.03762) - **Self-Attention (Scaled Dot-Product):** Computes attention weights between all pairs of positions in a sequence, enabling global context capture. [Source](https://arxiv.org/abs/1706.03762) - **Multi-Head Attention:** Runs multiple attention functions in parallel, allowing the model to attend to different types of relationships simultaneously. [Source](https://arxiv.org/abs/1706.03762) - **Positional Encoding:** Injects information about token position into the model since transformers have no inherent notion of order. [Source](https://arxiv.org/abs/1706.03762) - **Rotary Position Embedding (RoPE):** Encodes position by rotating the query and key vectors, enabling better length generalization. [Source](https://arxiv.org/abs/2104.09864) - **ALiBi (Attention with Linear Biases):** Adds linear position-dependent biases to attention scores, allowing extrapolation to longer sequences. [Source](https://arxiv.org/abs/2108.12409) - **Feed-Forward Network (in Transformer):** A position-wise two-layer MLP applied independently to each position after the attention sub-layer. [Source](https://arxiv.org/abs/1706.03762) - **Encoder-Decoder Transformer:** The original transformer architecture with an encoder for input and decoder for output, used in machine translation. [Source](https://arxiv.org/abs/1706.03762) - **Decoder-Only Transformer:** Uses only the transformer decoder with causal masking, the architecture behind GPT-series models. 
[Source](https://en.wikipedia.org/wiki/Generative_pre-trained_transformer) - **Encoder-Only Transformer:** Uses only the encoder with bidirectional attention, the architecture behind BERT-style models. [Source](https://arxiv.org/abs/1810.04805) - **Key-Value (KV) Cache:** Stores previously computed key and value tensors during autoregressive generation to avoid redundant computation. [Source](https://en.wikipedia.org/wiki/Transformer_\(deep_learning_architecture\)) - **Flash Attention:** An IO-aware exact attention algorithm that reduces memory access and speeds up attention computation significantly. [Source](https://arxiv.org/abs/2205.14135) - **Multi-Query Attention (MQA):** Shares key and value heads across attention heads to reduce memory bandwidth during inference. [Source](https://arxiv.org/abs/1911.02150) - **Grouped-Query Attention (GQA):** A compromise between multi-head and multi-query attention, grouping heads to share key-value pairs. [Source](https://arxiv.org/abs/2305.13245) - **Sparse Attention:** Restricts attention to a subset of positions (local, strided, etc.) to reduce quadratic complexity. [Source](https://arxiv.org/abs/1904.10509) - **Linear Attention:** Approximates softmax attention with linear complexity using kernel functions or feature maps. [Source](https://arxiv.org/abs/2006.16236) - **Mixture of Experts in Transformers:** Replaces the FFN layer with multiple expert networks and a routing mechanism, enabling massive scale with sparse computation. [Source](https://arxiv.org/abs/2101.03961) - **Mamba / State Space Models:** Architectures based on structured state space models as an alternative to transformers for sequence modeling with linear complexity. [Source](https://arxiv.org/abs/2312.00752) - **RWKV:** A linear-complexity architecture combining advantages of RNNs and transformers for efficient sequence processing. 
[Source](https://arxiv.org/abs/2305.13048) ### 3.6 Generative Models - **Generative Model:** A model that learns the joint probability distribution of data and can generate new samples. [Source](https://en.wikipedia.org/wiki/Generative_model) #### 3.6.1 Generative Adversarial Networks (GANs) - **GAN (Generative Adversarial Network):** A framework where a generator and discriminator are trained adversarially, with the generator learning to produce realistic samples. [Source](https://en.wikipedia.org/wiki/Generative_adversarial_network) - **DCGAN:** Applied deep convolutional architectures to GANs, establishing architectural best practices for stable GAN training. [Source](https://arxiv.org/abs/1511.06434) - **Wasserstein GAN (WGAN):** Uses the Wasserstein distance as the training objective, providing more stable gradients and training. [Source](https://arxiv.org/abs/1701.07875) - **StyleGAN:** NVIDIA's architecture that controls image synthesis at each resolution level via learned style vectors, producing high-fidelity faces. [Source](https://en.wikipedia.org/wiki/StyleGAN) - **Conditional GAN (cGAN):** Conditions both generator and discriminator on additional information (class labels, text) to control generation. [Source](https://arxiv.org/abs/1411.1784) - **CycleGAN:** Learns unpaired image-to-image translation using cycle consistency loss (e.g., horse-to-zebra). [Source](https://arxiv.org/abs/1703.10593) - **Pix2Pix:** A conditional GAN for paired image-to-image translation tasks. [Source](https://arxiv.org/abs/1611.07004) - **Progressive GAN:** Grows both generator and discriminator progressively from low to high resolution during training. [Source](https://arxiv.org/abs/1710.10196) - **Mode Collapse:** A common GAN training failure where the generator produces a limited variety of outputs.
[Source](https://en.wikipedia.org/wiki/Generative_adversarial_network#Mode_collapse) #### 3.6.2 Variational Autoencoders (VAEs) - **Variational Autoencoder (VAE):** A generative model that learns a latent representation by optimizing a variational lower bound on the data likelihood. [Source](https://en.wikipedia.org/wiki/Variational_autoencoder) - **Evidence Lower Bound (ELBO):** The objective function for VAEs, combining reconstruction quality and KL divergence of the latent distribution from a prior. [Source](https://en.wikipedia.org/wiki/Variational_autoencoder) - **β-VAE:** A VAE variant that upweights the KL divergence term to encourage disentangled latent representations. [Source](https://arxiv.org/abs/1804.03599) - **VQ-VAE (Vector Quantized VAE):** Uses discrete latent codes via vector quantization, enabling high-quality generation when paired with autoregressive priors. [Source](https://arxiv.org/abs/1711.00937) #### 3.6.3 Diffusion Models - **Diffusion Models (Denoising Diffusion Probabilistic Models):** Generative models that learn to reverse a gradual noising process, achieving state-of-the-art image generation. [Source](https://en.wikipedia.org/wiki/Diffusion_model) - **DDPM (Denoising Diffusion Probabilistic Model):** The foundational diffusion model that demonstrated competitive image generation via iterative denoising. [Source](https://arxiv.org/abs/2006.11239) - **Score Matching:** Trains a model to estimate the gradient of the log data density (score function), used in score-based generative models. [Source](https://arxiv.org/abs/2011.13456) - **Latent Diffusion Models (LDM):** Performs the diffusion process in a compressed latent space for efficiency, the basis of Stable Diffusion. [Source](https://arxiv.org/abs/2112.10752) - **Classifier-Free Guidance:** Improves conditional generation quality by interpolating between conditional and unconditional model predictions. 
[Source](https://arxiv.org/abs/2207.12598) - **DDIM (Denoising Diffusion Implicit Models):** Accelerates diffusion sampling by using non-Markovian deterministic steps. [Source](https://arxiv.org/abs/2010.02502) - **Consistency Models:** Directly map noisy inputs to clean outputs in a single step, enabling fast generation. [Source](https://arxiv.org/abs/2303.01469) - **Flow Matching:** An alternative to diffusion that learns vector fields to transform noise to data, offering simpler training and faster sampling. [Source](https://arxiv.org/abs/2210.02747) - **Rectified Flow:** Straightens the flow trajectories between noise and data distributions for more efficient generation. [Source](https://arxiv.org/abs/2209.03003) #### 3.6.4 Normalizing Flows - **Normalizing Flows:** Generative models that transform a simple distribution to a complex one through a sequence of invertible transformations with tractable Jacobians. [Source](https://en.wikipedia.org/wiki/Normalizing_flow) - **RealNVP:** A normalizing flow using affine coupling layers for efficient and exact density estimation and sampling. [Source](https://arxiv.org/abs/1605.08803) - **Glow:** Extends RealNVP with invertible 1×1 convolutions for improved generative modeling of images. [Source](https://arxiv.org/abs/1807.03039) #### 3.6.5 Autoregressive Models - **Autoregressive Models:** Generate data one element at a time, conditioning each element on all previously generated ones. [Source](https://en.wikipedia.org/wiki/Autoregressive_model) - **PixelCNN/PixelRNN:** Autoregressive models that generate images pixel by pixel using masked convolutions or recurrence. [Source](https://arxiv.org/abs/1601.06759) - **WaveNet:** An autoregressive model for raw audio generation using dilated causal convolutions. 
[Source](https://en.wikipedia.org/wiki/WaveNet) #### 3.6.6 Energy-Based Models - **Energy-Based Models (EBMs):** Learn an energy function that assigns low energy to data-like configurations, with generation via MCMC sampling. [Source](https://en.wikipedia.org/wiki/Energy-based_model) - **Boltzmann Machine:** A stochastic generative model with symmetric connections between visible and hidden units. [Source](https://en.wikipedia.org/wiki/Boltzmann_machine) - **Restricted Boltzmann Machine (RBM):** A Boltzmann machine with no intra-layer connections, enabling efficient training via contrastive divergence. [Source](https://en.wikipedia.org/wiki/Restricted_Boltzmann_machine) ### 3.7 Graph Neural Networks - **Graph Neural Network (GNN):** Neural networks that operate on graph-structured data, learning representations of nodes, edges, and graphs. [Source](https://en.wikipedia.org/wiki/Graph_neural_network) - **Message Passing Neural Network (MPNN):** A general framework where nodes aggregate messages from neighbors to update their representations. [Source](https://arxiv.org/abs/1704.01212) - **Graph Convolutional Network (GCN):** Extends convolutions to graphs by aggregating features from neighboring nodes via the graph Laplacian. [Source](https://arxiv.org/abs/1609.02907) - **GraphSAGE:** Learns node representations by sampling and aggregating features from a fixed-size neighborhood. [Source](https://arxiv.org/abs/1706.02216) - **Graph Attention Network (GAT):** Uses attention mechanisms to learn the importance of each neighbor's features during aggregation. [Source](https://arxiv.org/abs/1710.10903) - **Graph Isomorphism Network (GIN):** A maximally powerful GNN for distinguishing graph structures, based on the Weisfeiler-Leman test. [Source](https://arxiv.org/abs/1810.00826) - **Graph Transformers:** Adapt transformer architectures to graph data, using attention over graph nodes with structural encodings. 
[Source](https://arxiv.org/abs/2012.09699) - **Over-Smoothing Problem:** The phenomenon in deep GNNs where node representations converge and become indistinguishable. [Source](https://arxiv.org/abs/1801.07606) ### 3.8 Neural Network Interpretability & Analysis - **Neural Network Interpretability:** The study of understanding what neural networks learn and why they make specific predictions. [Source](https://en.wikipedia.org/wiki/Explainable_artificial_intelligence) - **Saliency Maps:** Highlight which input features most influence a model's output, often computed via gradients. [Source](https://en.wikipedia.org/wiki/Saliency_map) - **Grad-CAM:** Produces visual explanations of CNN decisions by weighting feature maps with gradients of the target class. [Source](https://arxiv.org/abs/1610.02391) - **SHAP (SHapley Additive exPlanations):** Uses game-theoretic Shapley values to assign feature importance to model predictions. [Source](https://en.wikipedia.org/wiki/SHAP) - **LIME (Local Interpretable Model-agnostic Explanations):** Explains individual predictions by fitting a simple interpretable model locally around the input. [Source](https://en.wikipedia.org/wiki/Local_interpretable_model-agnostic_explanations) - **Attention Visualization:** Visualizing attention weights in transformers to understand which tokens the model focuses on. [Source](https://arxiv.org/abs/1706.03762) - **Probing Classifiers:** Train simple classifiers on intermediate representations to test what linguistic or conceptual information is encoded. [Source](https://arxiv.org/abs/1909.03368) - **Mechanistic Interpretability:** The research program of reverse-engineering neural network computations into human-understandable algorithms and circuits. [Source](https://en.wikipedia.org/wiki/Mechanistic_interpretability) - **Superposition:** The phenomenon where neural networks represent more features than they have dimensions by encoding features in overlapping directions. 
[Source](https://transformer-circuits.pub/2022/toy_model/index.html) - **Feature Visualization:** Generating synthetic inputs that maximally activate specific neurons or features to understand what the network detects. [Source](https://distill.pub/2017/feature-visualization/) - **Circuits Analysis:** Identifying small subnetworks (circuits) within neural networks responsible for specific behaviors. [Source](https://distill.pub/2020/circuits/) - **Sparse Autoencoders (for Interpretability):** Used to decompose model activations into interpretable features by enforcing sparsity. [Source](https://transformer-circuits.pub/2023/monosemantic-features/index.html) - **Polysemanticity:** When a single neuron responds to multiple unrelated concepts, complicating interpretation. [Source](https://transformer-circuits.pub/2022/toy_model/index.html) - **Monosemanticity:** When a neuron or feature corresponds to a single interpretable concept, the ideal for interpretability. [Source](https://transformer-circuits.pub/2023/monosemantic-features/index.html) --- ## 4. Natural Language Processing (NLP) ### 4.1 Text Preprocessing & Representation - **Tokenization:** Splitting text into discrete units (words, subwords, characters) for processing by NLP models. [Source](https://en.wikipedia.org/wiki/Lexical_analysis#Tokenization) - **Byte Pair Encoding (BPE):** A subword tokenization algorithm that iteratively merges the most frequent character pairs. [Source](https://en.wikipedia.org/wiki/Byte_pair_encoding) - **WordPiece:** A subword tokenization method used in BERT that greedily selects the longest matching subword. [Source](https://en.wikipedia.org/wiki/WordPiece) - **SentencePiece:** A language-independent tokenizer that treats the input as a raw byte stream and learns a subword vocabulary. [Source](https://github.com/google/sentencepiece) - **Bag of Words (BoW):** Represents text as a vector of word counts, ignoring order and grammar. 
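The BPE entry above describes an iterative merge procedure, which fits in a short toy implementation. This sketch works on a tiny word-count dictionary rather than raw bytes, so it omits the byte-level details real tokenizers handle:

```python
from collections import Counter

def bpe_train(words, num_merges):
    """Toy byte-pair encoding: repeatedly merge the most frequent
    adjacent symbol pair across the corpus. `words` maps a word
    (a tuple of symbols) to its frequency."""
    vocab = dict(words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = {}
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])  # merge the pair
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges, vocab
```

Encoding new text then replays the learned merge list in order, which is why frequent subwords like common prefixes and suffixes end up as single tokens.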
[Source](https://en.wikipedia.org/wiki/Bag-of-words_model) - **TF-IDF (Term Frequency–Inverse Document Frequency):** Weights word importance by frequency in a document relative to the corpus. [Source](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) - **Word Embeddings:** Dense vector representations of words where semantically similar words are nearby in the vector space. [Source](https://en.wikipedia.org/wiki/Word_embedding) - **Word2Vec:** Learns word embeddings using shallow neural networks trained on predicting context words (Skip-gram) or target words (CBOW). [Source](https://en.wikipedia.org/wiki/Word2vec) - **GloVe:** Learns word vectors by factorizing the log of the word co-occurrence matrix. [Source](https://en.wikipedia.org/wiki/GloVe) - **FastText:** Extends Word2Vec by representing words as bags of character n-grams, handling rare and out-of-vocabulary words. [Source](https://en.wikipedia.org/wiki/FastText) - **Contextual Embeddings:** Token representations that vary based on the surrounding context, produced by models like BERT and ELMo. [Source](https://en.wikipedia.org/wiki/ELMo) - **Stemming:** Reducing words to their root form by removing suffixes using heuristic rules (e.g., Porter Stemmer). [Source](https://en.wikipedia.org/wiki/Stemming) - **Lemmatization:** Reducing words to their dictionary form (lemma) using morphological analysis. [Source](https://en.wikipedia.org/wiki/Lemmatisation) - **Stop Word Removal:** Filtering out common function words (the, is, at) that carry little semantic meaning. [Source](https://en.wikipedia.org/wiki/Stop_word) - **Named Entity Recognition (NER):** Identifying and classifying named entities (persons, organizations, locations) in text. [Source](https://en.wikipedia.org/wiki/Named-entity_recognition) - **Part-of-Speech Tagging:** Assigning grammatical categories (noun, verb, adjective) to each word in a sentence. 
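The TF-IDF entry above is a one-formula method, sketched here in plain Python using the basic tf(t, d) · log(N / df(t)) weighting (many variants add smoothing or different normalizations):

```python
import math
from collections import Counter

def tfidf(docs):
    """TF-IDF vectors for a list of tokenized documents.
    Returns one {term: weight} dict per document, where weight is
    relative term frequency times log inverse document frequency."""
    N = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # count each term once per document
    out = []
    for doc in docs:
        tf = Counter(doc)
        out.append({t: (c / len(doc)) * math.log(N / df[t])
                    for t, c in tf.items()})
    return out
```

A term that appears in every document gets weight zero (log 1 = 0), which is how TF-IDF downweights uninformative words without an explicit stop-word list.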
[Source](https://en.wikipedia.org/wiki/Part-of-speech_tagging) - **Dependency Parsing:** Analyzing the grammatical structure of a sentence by identifying relationships between words. [Source](https://en.wikipedia.org/wiki/Dependency_grammar) - **Constituency Parsing:** Parsing a sentence into its hierarchical phrase structure (noun phrases, verb phrases, etc.). [Source](https://en.wikipedia.org/wiki/Parse_tree) - **Coreference Resolution:** Determining which expressions in a text refer to the same entity. [Source](https://en.wikipedia.org/wiki/Coreference) ### 4.2 Language Models - **Language Model:** A probabilistic model that assigns probabilities to sequences of words, fundamental to modern NLP. [Source](https://en.wikipedia.org/wiki/Language_model) - **n-gram Language Model:** Estimates word probabilities based on the preceding n-1 words using count-based statistics. [Source](https://en.wikipedia.org/wiki/N-gram) - **Neural Language Model:** Uses neural networks (RNNs, transformers) to model the probability distribution over word sequences. [Source](https://en.wikipedia.org/wiki/Language_model#Neural_language_models) - **Perplexity:** A metric for evaluating language models, measuring how well the model predicts a held-out test set (lower is better). [Source](https://en.wikipedia.org/wiki/Perplexity) ### 4.3 Pre-trained Language Models - **BERT (Bidirectional Encoder Representations from Transformers):** A pre-trained encoder model using masked language modeling and next sentence prediction for bidirectional contextualization. [Source](https://en.wikipedia.org/wiki/BERT_\(language_model\)) - **RoBERTa:** An optimized BERT training recipe with more data, longer training, and no next sentence prediction. [Source](https://arxiv.org/abs/1907.11692) - **ALBERT:** A lightweight BERT variant using parameter sharing and factorized embeddings to reduce model size. 
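The n-gram language model and perplexity entries in §4.2 above can be made concrete with a count-based bigram model. A toy sketch with add-one (Laplace) smoothing; real n-gram systems use more careful smoothing such as Kneser-Ney:

```python
import math
from collections import Counter

def bigram_lm(corpus, vocab_size):
    """Fit an add-one-smoothed bigram model on tokenized sentences:
    P(w | prev) = (count(prev, w) + 1) / (count(prev) + V)."""
    bigrams, unigrams = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent
        unigrams.update(toks[:-1])
        bigrams.update(zip(toks, toks[1:]))
    def prob(prev, w):
        return (bigrams[(prev, w)] + 1) / (unigrams[prev] + vocab_size)
    return prob

def perplexity(prob, sent):
    """exp of the average negative log-probability of the sentence;
    lower means the model predicts the text better."""
    toks = ["<s>"] + sent
    logp = sum(math.log(prob(a, b)) for a, b in zip(toks, toks[1:]))
    return math.exp(-logp / (len(toks) - 1))
```

The same perplexity definition is used to evaluate neural language models, just with the model's predicted token probabilities in place of smoothed counts.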
[Source](https://arxiv.org/abs/1909.11942) - **DeBERTa:** Improves BERT with disentangled attention and an enhanced mask decoder for better pretraining. [Source](https://arxiv.org/abs/2006.03654) - **ELECTRA:** A pre-training method that uses replaced token detection instead of masked language modeling for better sample efficiency. [Source](https://arxiv.org/abs/2003.10555) - **XLNet:** Combines autoregressive and autoencoding by permuting the factorization order of the input. [Source](https://arxiv.org/abs/1906.08237) - **T5 (Text-to-Text Transfer Transformer):** Frames all NLP tasks as text-to-text problems, using an encoder-decoder transformer. [Source](https://arxiv.org/abs/1910.10683) - **GPT Series (GPT-1/2/3/4):** A family of decoder-only transformers pre-trained with autoregressive language modeling, scaling to hundreds of billions of parameters. [Source](https://en.wikipedia.org/wiki/Generative_pre-trained_transformer) - **Claude:** Anthropic's family of large language models trained with a focus on safety, helpfulness, and honesty using constitutional AI. [Source](https://en.wikipedia.org/wiki/Claude_\(language_model\)) - **LLaMA:** Meta's family of open-weight large language models designed to be efficient and accessible for research. [Source](https://en.wikipedia.org/wiki/LLaMA) - **Gemini:** Google DeepMind's multimodal model family trained on diverse data including text, code, images, audio, and video. [Source](https://en.wikipedia.org/wiki/Gemini_\(language_model\)) - **Mistral:** A family of efficient open-weight language models using techniques like sliding window attention and mixture of experts. [Source](https://en.wikipedia.org/wiki/Mistral_AI) - **PaLM:** Google's Pathways Language Model, a 540B-parameter model demonstrating breakthrough reasoning capabilities. [Source](https://arxiv.org/abs/2204.02311) - **ELMo:** Produced contextualized word embeddings from a bidirectional LSTM, one of the first contextualized representation methods. 
[Source](https://en.wikipedia.org/wiki/ELMo) ### 4.4 NLP Tasks & Applications - **Machine Translation:** Automatically converting text from one natural language to another. [Source](https://en.wikipedia.org/wiki/Machine_translation) - **Text Summarization:** Producing a concise version of a longer text while preserving key information. [Source](https://en.wikipedia.org/wiki/Automatic_summarization) - **Extractive Summarization:** Selects and concatenates the most important sentences from the original text. [Source](https://en.wikipedia.org/wiki/Automatic_summarization#Extraction-based_summarization) - **Abstractive Summarization:** Generates new sentences that capture the essence of the source text. [Source](https://en.wikipedia.org/wiki/Automatic_summarization#Abstraction-based_summarization) - **Sentiment Analysis:** Determining the emotional tone or opinion expressed in text (positive, negative, neutral). [Source](https://en.wikipedia.org/wiki/Sentiment_analysis) - **Text Classification:** Assigning predefined categories or labels to text documents. [Source](https://en.wikipedia.org/wiki/Document_classification) - **Question Answering:** Systems that automatically answer questions posed in natural language. [Source](https://en.wikipedia.org/wiki/Question_answering) - **Extractive QA:** Identifies a span of text in a given passage that answers the question. [Source](https://en.wikipedia.org/wiki/Question_answering) - **Generative QA:** Generates an answer in natural language, potentially synthesizing information from multiple sources. [Source](https://en.wikipedia.org/wiki/Question_answering) - **Open-Domain QA:** Answers questions without a pre-specified context, often using retrieval from large corpora. [Source](https://en.wikipedia.org/wiki/Question_answering) - **Natural Language Inference (NLI):** Determining whether a hypothesis is entailed, contradicted, or neutral with respect to a premise. 
[Source](https://en.wikipedia.org/wiki/Natural_language_inference) - **Text Generation:** Producing coherent and contextually relevant text given a prompt or context. [Source](https://en.wikipedia.org/wiki/Natural-language_generation) - **Dialogue Systems / Chatbots:** Systems that converse with humans in natural language for task completion or open-ended conversation. [Source](https://en.wikipedia.org/wiki/Dialogue_system) - **Information Extraction:** Automatically extracting structured information (entities, relations, events) from unstructured text. [Source](https://en.wikipedia.org/wiki/Information_extraction) - **Relation Extraction:** Identifying semantic relationships between entities mentioned in text. [Source](https://en.wikipedia.org/wiki/Relationship_extraction) - **Topic Modeling:** Discovering abstract topics in a collection of documents using statistical models like LDA. [Source](https://en.wikipedia.org/wiki/Topic_model) - **Latent Dirichlet Allocation (LDA):** A generative probabilistic model that discovers latent topic distributions in document collections. [Source](https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation) - **Semantic Similarity:** Measuring how similar in meaning two text passages are, often using embedding cosine similarity. [Source](https://en.wikipedia.org/wiki/Semantic_similarity) - **Textual Entailment:** Determining whether the truth of one text fragment follows from another. [Source](https://en.wikipedia.org/wiki/Textual_entailment) - **Paraphrase Detection:** Identifying whether two sentences express the same meaning in different words. [Source](https://en.wikipedia.org/wiki/Paraphrase) - **Word Sense Disambiguation:** Determining which sense of an ambiguous word is used in a given context. [Source](https://en.wikipedia.org/wiki/Word-sense_disambiguation) - **Semantic Role Labeling:** Identifying the semantic roles (agent, patient, instrument) played by words in a sentence. 
[Source](https://en.wikipedia.org/wiki/Semantic_role_labeling) - **Optical Character Recognition (OCR):** Converting images of text into machine-readable text. [Source](https://en.wikipedia.org/wiki/Optical_character_recognition) ### 4.5 Text Generation & Decoding - **Autoregressive Decoding:** Generating text one token at a time, each conditioned on all previously generated tokens. [Source](https://en.wikipedia.org/wiki/Autoregressive_model) - **Greedy Decoding:** Selects the highest-probability token at each step, simple but can produce suboptimal sequences. [Source](https://en.wikipedia.org/wiki/Greedy_algorithm) - **Beam Search:** Maintains multiple candidate sequences (beams) at each step, selecting the top-k most probable partial sequences. [Source](https://en.wikipedia.org/wiki/Beam_search) - **Top-k Sampling:** Randomly samples from the top k most probable tokens at each step, adding diversity to generation. [Source](https://arxiv.org/abs/1904.09751) - **Top-p (Nucleus) Sampling:** Samples from the smallest set of tokens whose cumulative probability exceeds p, adapting the candidate pool dynamically. [Source](https://arxiv.org/abs/1904.09751) - **Temperature Scaling:** Divides logits by a temperature parameter to control the sharpness of the probability distribution. [Source](https://en.wikipedia.org/wiki/Softmax_function#Softmax_with_temperature) - **Repetition Penalty:** Reduces the probability of tokens that have already appeared to prevent repetitive text generation. [Source](https://arxiv.org/abs/1909.05858) - **Speculative Decoding:** Uses a smaller draft model to propose tokens that a larger model then verifies, speeding up generation. [Source](https://arxiv.org/abs/2211.17192) ### 4.6 Information Retrieval - **Information Retrieval:** The science of searching for information in documents, databases, or the web. 
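The sampling-based decoding strategies in §4.5 above combine naturally: temperature reshapes the distribution, then nucleus (top-p) sampling truncates it. A self-contained sketch for a single decoding step over raw logits (toy inputs, no model attached):

```python
import math
import random

def sample_top_p(logits, p=0.9, temperature=1.0, rng=random):
    """Temperature-scaled nucleus sampling: keep the smallest set of
    tokens whose cumulative probability reaches p, renormalize, and
    sample one token index from that set."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # stable softmax
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    ranked = sorted(((e / total, i) for i, e in enumerate(exps)),
                    reverse=True)
    nucleus, cum = [], 0.0
    for prob, idx in ranked:
        nucleus.append((prob, idx))
        cum += prob
        if cum >= p:
            break
    z = sum(pr for pr, _ in nucleus)     # renormalize the kept mass
    r = rng.random() * z
    for pr, idx in nucleus:
        r -= pr
        if r <= 0:
            return idx
    return nucleus[-1][1]
```

Greedy decoding is the temperature → 0 limit (always the argmax), while top-k sampling would truncate by count instead of by cumulative probability.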
[Source](https://en.wikipedia.org/wiki/Information_retrieval) - **BM25:** A probabilistic ranking function based on term frequency and inverse document frequency, widely used in search. [Source](https://en.wikipedia.org/wiki/Okapi_BM25) - **Dense Retrieval:** Uses learned dense vector representations to match queries and documents via approximate nearest neighbor search. [Source](https://arxiv.org/abs/2004.04906) - **Retrieval-Augmented Generation (RAG):** Enhances language model generation by first retrieving relevant documents and conditioning on them. [Source](https://arxiv.org/abs/2005.11401) - **Vector Databases:** Databases optimized for storing and retrieving high-dimensional vectors using approximate nearest neighbor algorithms. [Source](https://en.wikipedia.org/wiki/Vector_database) - **Inverted Index:** A data structure mapping terms to the documents containing them, the backbone of traditional search engines. [Source](https://en.wikipedia.org/wiki/Inverted_index) - **Cross-Encoder Reranking:** Uses a cross-attention model to jointly encode query-document pairs for more accurate relevance scoring. [Source](https://arxiv.org/abs/1901.04085) - **ColBERT:** A late-interaction retrieval model that computes token-level embeddings and uses MaxSim for efficient yet expressive matching. [Source](https://arxiv.org/abs/2004.12832) --- ## 5. Computer Vision ### 5.1 Image Classification - **Image Classification:** Assigning a label from a fixed set of categories to an input image. [Source](https://en.wikipedia.org/wiki/Computer_vision#Recognition) - **ImageNet:** A large-scale image database with over 14 million images organized according to WordNet hierarchy, the benchmark that drove deep learning progress. [Source](https://en.wikipedia.org/wiki/ImageNet) - **Vision Transformer (ViT):** Applies a pure transformer to sequences of image patches, achieving strong image classification results. 
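The BM25 entry in §4.6 above is a closed-form ranking function, sketched here in plain Python with the commonly used parameter defaults k1 = 1.5 and b = 0.75 and one of the standard IDF variants (implementations differ slightly in the IDF formula):

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every tokenized document against a tokenized query with
    Okapi BM25: sum over query terms of
    idf(t) * tf * (k1 + 1) / (tf + k1 * (1 - b + b * |d| / avgdl))."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if df[t] == 0:
                continue                 # term unseen in the corpus
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            denom = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / denom
        scores.append(s)
    return scores
```

The tf saturation (k1) and length normalization (b) are what distinguish BM25 from plain TF-IDF ranking; dense retrieval replaces this lexical matching with learned embedding similarity.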
[Source](https://arxiv.org/abs/2010.11929) - **Transfer Learning in Vision:** Pre-training on ImageNet and fine-tuning on smaller datasets, the dominant paradigm for practical vision applications. [Source](https://en.wikipedia.org/wiki/Transfer_learning) - **Data Augmentation for Vision:** Techniques like random cropping, flipping, color jittering, and RandAugment to increase training data diversity. [Source](https://en.wikipedia.org/wiki/Data_augmentation) ### 5.2 Object Detection - **Object Detection:** Localizing and classifying multiple objects within an image with bounding boxes. [Source](https://en.wikipedia.org/wiki/Object_detection) - **R-CNN (Region-based CNN):** Proposes regions, extracts CNN features, and classifies each region. [Source](https://en.wikipedia.org/wiki/Region-based_convolutional_neural_network) - **Fast R-CNN:** Processes the entire image with a CNN once and extracts features for each region proposal. [Source](https://arxiv.org/abs/1504.08083) - **Faster R-CNN:** Introduces a Region Proposal Network (RPN) to generate proposals within the network, enabling end-to-end training. [Source](https://arxiv.org/abs/1506.01497) - **YOLO (You Only Look Once):** A single-pass detector that predicts bounding boxes and class probabilities directly from the full image in real time. [Source](https://en.wikipedia.org/wiki/You_Only_Look_Once) - **SSD (Single Shot MultiBox Detector):** Detects objects at multiple scales from different layers of the feature map in a single forward pass. [Source](https://arxiv.org/abs/1512.02325) - **DETR (Detection Transformer):** Uses a transformer encoder-decoder with bipartite matching loss, eliminating hand-designed components like NMS and anchor boxes. [Source](https://arxiv.org/abs/2005.12872) - **Feature Pyramid Network (FPN):** Builds a multi-scale feature pyramid from a single-scale input for detecting objects at different sizes. 
[Source](https://arxiv.org/abs/1612.03144) - **Anchor Boxes:** Predefined bounding box shapes at various scales and aspect ratios used as reference templates in detection models. [Source](https://en.wikipedia.org/wiki/Object_detection) - **Non-Maximum Suppression (NMS):** A post-processing step that removes duplicate overlapping detections by keeping only the highest-confidence box. [Source](https://en.wikipedia.org/wiki/Object_detection) - **Intersection over Union (IoU):** The ratio of overlap to union between predicted and ground truth bounding boxes, the standard detection metric. [Source](https://en.wikipedia.org/wiki/Jaccard_index) ### 5.3 Image Segmentation - **Semantic Segmentation:** Classifying every pixel in an image into a predefined category. [Source](https://en.wikipedia.org/wiki/Image_segmentation) - **Instance Segmentation:** Detecting and segmenting each individual object instance in an image. [Source](https://en.wikipedia.org/wiki/Image_segmentation) - **Panoptic Segmentation:** Unifies semantic and instance segmentation, labeling every pixel with both a class and an instance ID. [Source](https://arxiv.org/abs/1801.00868) - **U-Net:** An encoder-decoder architecture with skip connections designed for biomedical image segmentation. [Source](https://en.wikipedia.org/wiki/U-Net) - **Mask R-CNN:** Extends Faster R-CNN by adding a branch for predicting segmentation masks for each detected object. [Source](https://arxiv.org/abs/1703.06870) - **DeepLab:** A family of models using atrous (dilated) convolutions and conditional random fields for dense semantic segmentation. [Source](https://arxiv.org/abs/1606.00915) - **Segment Anything Model (SAM):** A foundation model for promptable image segmentation trained on over 1 billion masks. [Source](https://arxiv.org/abs/2304.02643) - **FCN (Fully Convolutional Network):** The first model to use only convolutional layers for end-to-end pixel-wise prediction. 
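The IoU and NMS entries in §5.2 above are simple enough to implement directly. A toy sketch over axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    discard remaining boxes that overlap it above the threshold,
    then repeat. Returns the indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thresh]
    return keep
```

Detectors such as DETR are notable precisely because their set-based matching loss removes the need for this post-processing step.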
[Source](https://arxiv.org/abs/1411.4038) ### 5.4 Image Generation & Synthesis - **Stable Diffusion:** An open-source latent diffusion model for high-quality text-to-image generation. [Source](https://en.wikipedia.org/wiki/Stable_Diffusion) - **DALL-E:** OpenAI's model series that generates images from text descriptions using diffusion or autoregressive methods. [Source](https://en.wikipedia.org/wiki/DALL-E) - **Midjourney:** An AI image generation service known for producing highly aesthetic and artistic outputs. [Source](https://en.wikipedia.org/wiki/Midjourney) - **Neural Style Transfer:** Applies the artistic style of one image to the content of another using CNN feature representations. [Source](https://en.wikipedia.org/wiki/Neural_style_transfer) - **Image Super-Resolution:** Reconstructing high-resolution images from low-resolution inputs using deep learning. [Source](https://en.wikipedia.org/wiki/Super-resolution_imaging) - **Image Inpainting:** Filling in missing or damaged regions of an image with plausible content. [Source](https://en.wikipedia.org/wiki/Inpainting) - **Image-to-Image Translation:** Converting an image from one visual domain to another (e.g., sketch to photo, day to night). [Source](https://en.wikipedia.org/wiki/Image-to-image_translation) - **ControlNet:** Adds spatial conditioning controls (edges, depth, pose) to diffusion models for guided image generation. [Source](https://arxiv.org/abs/2302.05543) ### 5.5 Video Understanding - **Video Classification:** Assigning activity or event labels to video clips. [Source](https://en.wikipedia.org/wiki/Activity_recognition) - **Action Recognition:** Identifying human actions or activities in video sequences. [Source](https://en.wikipedia.org/wiki/Activity_recognition) - **Video Object Tracking:** Following specific objects across video frames over time. 
[Source](https://en.wikipedia.org/wiki/Video_tracking) - **Optical Flow:** Estimating the apparent motion of objects between consecutive video frames. [Source](https://en.wikipedia.org/wiki/Optical_flow) - **Video Generation:** Synthesizing realistic video sequences from text, images, or other conditioning signals. [Source](https://en.wikipedia.org/wiki/Video_generation) - **3D Convolutions (C3D, I3D):** Extend 2D convolutions to the temporal dimension for spatiotemporal feature learning from video. [Source](https://arxiv.org/abs/1705.07750) - **Video Transformers (TimeSformer, ViViT):** Apply transformer architectures to video by processing spatiotemporal token sequences. [Source](https://arxiv.org/abs/2103.15691) ### 5.6 3D Vision - **Depth Estimation:** Predicting the distance of each pixel from the camera, from monocular images or stereo pairs. [Source](https://en.wikipedia.org/wiki/Depth_perception#Computer_vision) - **Point Cloud Processing:** Analyzing 3D point cloud data from LiDAR or depth sensors using networks like PointNet. [Source](https://en.wikipedia.org/wiki/Point_cloud) - **PointNet:** A pioneering architecture that directly processes unordered 3D point sets for classification and segmentation. [Source](https://arxiv.org/abs/1612.00593) - **Neural Radiance Fields (NeRF):** Represents 3D scenes as continuous functions that map coordinates to color and density for novel view synthesis. [Source](https://en.wikipedia.org/wiki/Neural_radiance_field) - **3D Gaussian Splatting:** Represents scenes as collections of 3D Gaussians for real-time high-quality novel view synthesis. [Source](https://arxiv.org/abs/2308.04079) - **Simultaneous Localization and Mapping (SLAM):** Algorithms that build a map of an unknown environment while tracking the agent's location. [Source](https://en.wikipedia.org/wiki/Simultaneous_localization_and_mapping) - **Stereo Vision:** Estimating depth by comparing images from two cameras separated by a known baseline.
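Stereo depth above follows from pinhole triangulation: depth = f·B/d. A minimal sketch (units assumed: focal length in pixels, baseline in meters, disparity in pixels):

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    # Pinhole triangulation: depth = f * B / d.
    # A larger disparity (pixel shift of the same point between the two
    # views) means the point is closer to the cameras.
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px
```

With a 700 px focal length, a 12 cm baseline, and 28 px of disparity, the point is 3 m away.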
[Source](https://en.wikipedia.org/wiki/Computer_stereo_vision) - **3D Reconstruction:** Building 3D models of objects or scenes from 2D images or depth data. [Source](https://en.wikipedia.org/wiki/3D_reconstruction) ### 5.7 Face Analysis - **Face Detection:** Locating human faces in images or video. [Source](https://en.wikipedia.org/wiki/Face_detection) - **Face Recognition:** Identifying or verifying individuals from their facial features. [Source](https://en.wikipedia.org/wiki/Facial_recognition_system) - **Facial Landmark Detection:** Locating key points on a face (eyes, nose, mouth corners) for alignment and analysis. [Source](https://en.wikipedia.org/wiki/Facial_recognition_system) - **Face Generation/Deepfakes:** AI-generated synthetic face images or face-swapped videos, raising ethical and misinformation concerns. [Source](https://en.wikipedia.org/wiki/Deepfake) - **Facial Expression Recognition:** Classifying the emotional expression displayed on a face (happy, sad, angry, etc.). [Source](https://en.wikipedia.org/wiki/Facial_expression#Recognition) ### 5.8 Pose Estimation - **Human Pose Estimation:** Detecting the positions of body joints (keypoints) in images or video. [Source](https://en.wikipedia.org/wiki/Pose_\(computer_vision\)) - **OpenPose:** A real-time multi-person system for detecting body, face, hand, and foot keypoints. [Source](https://github.com/CMU-Perceptual-Computing-Lab/openpose) - **Hand Pose Estimation:** Detecting the positions of hand joints and finger keypoints for gesture recognition. [Source](https://en.wikipedia.org/wiki/Gesture_recognition) --- ## 6. Speech & Audio Processing ### 6.1 Automatic Speech Recognition (ASR) - **Automatic Speech Recognition:** Converting spoken language into text. [Source](https://en.wikipedia.org/wiki/Speech_recognition) - **Whisper:** OpenAI's robust ASR model trained on 680,000 hours of multilingual web audio, achieving strong zero-shot performance. 
[Source](https://arxiv.org/abs/2212.04356) - **CTC (Connectionist Temporal Classification):** A loss function for sequence-to-sequence tasks that handles alignment between input and output without requiring pre-alignment. [Source](https://en.wikipedia.org/wiki/Connectionist_temporal_classification) - **End-to-End ASR:** Models that directly map audio to text without intermediate pipeline stages. [Source](https://en.wikipedia.org/wiki/Speech_recognition) - **Speaker Diarization:** Determining "who spoke when" in a multi-speaker audio recording. [Source](https://en.wikipedia.org/wiki/Speaker_diarisation) - **Speaker Verification/Identification:** Confirming or determining a speaker's identity from their voice. [Source](https://en.wikipedia.org/wiki/Speaker_recognition) - **Mel-Frequency Cepstral Coefficients (MFCCs):** A compact representation of the short-term power spectrum of sound, widely used as audio features. [Source](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum) - **Spectrogram:** A visual representation of the spectrum of frequencies in a signal as they vary with time. [Source](https://en.wikipedia.org/wiki/Spectrogram) ### 6.2 Text-to-Speech (TTS) - **Text-to-Speech Synthesis:** Generating natural-sounding speech audio from input text. [Source](https://en.wikipedia.org/wiki/Speech_synthesis) - **Tacotron:** A sequence-to-sequence model that generates mel spectrograms from text, producing natural-sounding speech. [Source](https://arxiv.org/abs/1703.10135) - **WaveNet:** A deep generative model for raw audio waveforms that produces highly natural speech. [Source](https://en.wikipedia.org/wiki/WaveNet) - **VITS:** An end-to-end TTS model combining variational inference, normalizing flows, and adversarial training for high-quality speech. [Source](https://arxiv.org/abs/2106.06103) - **Voice Cloning:** Synthesizing speech in a specific person's voice using only a few samples of their speech. 
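A magnitude spectrogram as described above can be sketched with a short-time FFT in NumPy (Hann window; the frame and hop sizes are illustrative defaults, not from the entry):

```python
import numpy as np

def spectrogram(x, n_fft=256, hop=128):
    # Magnitude spectrogram: slide a window over the signal,
    # FFT each frame, keep the non-negative frequency bins.
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T  # (freq, time)
```

MFCCs go further: they map such a spectrogram onto the mel scale and take a discrete cosine transform of the log band energies. For a 440 Hz sine at an 8 kHz sample rate, the energy peaks near bin 440·256/8000 ≈ 14.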
[Source](https://en.wikipedia.org/wiki/Speech_synthesis#Voice_cloning) - **Neural Vocoders:** Neural networks (HiFi-GAN, WaveGlow) that convert spectrograms to high-fidelity audio waveforms. [Source](https://arxiv.org/abs/2010.05646) ### 6.3 Music & Audio Generation - **Music Generation:** AI systems that compose original music, using architectures like transformers and diffusion models. [Source](https://en.wikipedia.org/wiki/Artificial_intelligence_and_music) - **Audio Classification:** Identifying sounds, environmental events, or musical instruments from audio signals. [Source](https://en.wikipedia.org/wiki/Audio_signal_processing) - **Music Information Retrieval:** Extracting meaningful information from music (genre, tempo, key, mood, instruments). [Source](https://en.wikipedia.org/wiki/Music_information_retrieval) - **Source Separation:** Isolating individual sound sources (vocals, drums, bass) from a mixed audio signal. [Source](https://en.wikipedia.org/wiki/Source_separation) --- ## 7. Multimodal AI ### 7.1 Vision-Language Models - **Vision-Language Model:** Models that jointly understand and reason about both images and text. [Source](https://en.wikipedia.org/wiki/Vision%E2%80%93language_model) - **CLIP (Contrastive Language–Image Pre-training):** Learns to associate images with text descriptions via contrastive learning on 400M image-text pairs. [Source](https://arxiv.org/abs/2103.00020) - **BLIP-2:** A vision-language model that bridges frozen image encoders and language models via a lightweight querying transformer. [Source](https://arxiv.org/abs/2301.12597) - **LLaVA (Large Language and Vision Assistant):** Connects a vision encoder to a language model for visual instruction following. [Source](https://arxiv.org/abs/2304.08485) - **Flamingo:** DeepMind's few-shot multimodal model that interleaves visual and textual inputs for in-context learning. 
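At inference time, the contrastive idea behind CLIP above reduces to cosine similarity plus a softmax over candidate captions; a minimal zero-shot scoring sketch (the 0.07 temperature mirrors CLIP's reported value; the embeddings themselves are placeholders for encoder outputs):

```python
import numpy as np

def zero_shot_scores(image_emb, text_embs, temperature=0.07):
    # CLIP-style zero-shot classification: cosine similarity between one
    # image embedding and several text embeddings, softmaxed over texts.
    def norm(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    sims = norm(np.asarray(text_embs)) @ norm(np.asarray(image_emb))
    logits = sims / temperature
    e = np.exp(logits - logits.max())  # stable softmax
    return e / e.sum()
```

An image embedding aligned with the first text embedding gets nearly all the probability mass.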
[Source](https://arxiv.org/abs/2204.14198) - **Image Captioning:** Generating natural language descriptions of image content. [Source](https://en.wikipedia.org/wiki/Automatic_image_annotation) - **Visual Question Answering (VQA):** Answering natural language questions about the content of an image. [Source](https://en.wikipedia.org/wiki/Visual_question_answering) - **Visual Grounding:** Localizing the region in an image that corresponds to a natural language expression. [Source](https://en.wikipedia.org/wiki/Visual_grounding) - **Text-to-Image Generation:** Generating images from text descriptions using models like DALL-E, Stable Diffusion, and Imagen. [Source](https://en.wikipedia.org/wiki/Text-to-image_model) - **Multimodal Embeddings:** Shared embedding spaces where images, text, audio, and other modalities can be compared directly. [Source](https://en.wikipedia.org/wiki/Multimodal_learning) ### 7.2 Video-Language Models - **Video Captioning:** Generating natural language descriptions of video content. [Source](https://en.wikipedia.org/wiki/Video_captioning) - **Text-to-Video Generation:** Synthesizing video from text descriptions, an active frontier of generative AI. [Source](https://en.wikipedia.org/wiki/Artificial_intelligence_art#Video) - **Video Question Answering:** Answering questions about the content and events in a video. [Source](https://en.wikipedia.org/wiki/Visual_question_answering) ### 7.3 Audio-Visual Models - **Audio-Visual Speech Recognition:** Combining lip movements and audio for more robust speech recognition. [Source](https://en.wikipedia.org/wiki/Audio-visual_speech_recognition) - **Audio-Visual Source Separation:** Using visual cues (e.g., seeing who is speaking) to separate audio sources. [Source](https://en.wikipedia.org/wiki/Source_separation) ### 7.4 Document Understanding - **Document Understanding:** AI systems that extract and reason about information from visually rich documents (forms, receipts, tables). 
[Source](https://en.wikipedia.org/wiki/Document_processing) - **Table Extraction:** Detecting and parsing tabular data from documents and images. [Source](https://en.wikipedia.org/wiki/Table_\(information\)) - **Layout Analysis:** Understanding the spatial structure of document pages (headings, paragraphs, figures, tables). [Source](https://en.wikipedia.org/wiki/Document_layout_analysis) --- ## 8. Large Language Models & Foundation Models ### 8.1 Scaling Laws & Emergent Abilities - **Scaling Laws:** Empirical power-law relationships between model performance and compute, data, and parameter count. [Source](https://arxiv.org/abs/2001.08361) - **Chinchilla Scaling Laws:** Showed that models should be trained on roughly 20 tokens per parameter for compute-optimal training. [Source](https://arxiv.org/abs/2203.15556) - **Emergent Abilities:** Capabilities (e.g., chain-of-thought reasoning) that appear only when models reach sufficient scale. [Source](https://arxiv.org/abs/2206.07682) - **Phase Transitions in LLMs:** Sudden performance jumps on specific tasks as models cross scale thresholds. [Source](https://arxiv.org/abs/2206.07682) - **Foundation Models:** Large models pre-trained on broad data that can be adapted to many downstream tasks. [Source](https://en.wikipedia.org/wiki/Foundation_model) - **Compute-Optimal Training:** Determining the best allocation of a fixed compute budget between model size and training data. [Source](https://arxiv.org/abs/2203.15556) ### 8.2 LLM Training & Fine-Tuning - **Pre-training:** Training a model on a large corpus of text (or other data) using self-supervised objectives before task-specific adaptation. [Source](https://en.wikipedia.org/wiki/Pre-training) - **Supervised Fine-Tuning (SFT):** Adapting a pre-trained model on curated instruction-response pairs to follow instructions. 
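The Chinchilla rule above, combined with the common approximation that training costs C ≈ 6·N·D FLOPs (an assumption not stated in the entry), pins down a compute-optimal split: C ≈ 120·N², so N = √(C/120). A rough sketch:

```python
def chinchilla_optimal(compute_flops, tokens_per_param=20.0):
    # C ≈ 6 * N * D training FLOPs, with the Chinchilla rule D ≈ 20 * N,
    # gives C ≈ 120 * N^2, hence N = sqrt(C / 120) and D = 20 * N.
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    return n_params, tokens_per_param * n_params
```

For a 1e21 FLOP budget this suggests roughly a 2.9B-parameter model trained on about 58B tokens; real choices also weigh inference cost, which pushes toward smaller models trained longer.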
[Source](https://en.wikipedia.org/wiki/Fine-tuning_\(deep_learning\)) - **Reinforcement Learning from Human Feedback (RLHF):** Training models using a reward model derived from human preference rankings to align outputs with human values. [Source](https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback) - **Constitutional AI (CAI):** Anthropic's approach where the model critiques and revises its own outputs according to a set of principles. [Source](https://arxiv.org/abs/2212.08073) - **Direct Preference Optimization (DPO):** Aligns models directly from preference pairs without training a separate reward model. [Source](https://arxiv.org/abs/2305.18290) - **Parameter-Efficient Fine-Tuning (PEFT):** Methods that update only a small fraction of model parameters for efficient adaptation. [Source](https://en.wikipedia.org/wiki/Fine-tuning_\(deep_learning\)) - **LoRA (Low-Rank Adaptation):** Inserts trainable low-rank matrices into frozen model layers, enabling efficient fine-tuning with minimal parameters. [Source](https://arxiv.org/abs/2106.09685) - **QLoRA:** Combines quantization with LoRA to fine-tune large models on consumer hardware with minimal memory. [Source](https://arxiv.org/abs/2305.14314) - **Prefix Tuning:** Prepends learnable continuous vectors to the input, adapting the model without changing its weights. [Source](https://arxiv.org/abs/2101.00190) - **Adapter Layers:** Small trainable modules inserted between frozen transformer layers for efficient multi-task adaptation. [Source](https://arxiv.org/abs/1902.00751) - **Instruction Tuning:** Fine-tuning on a diverse set of tasks framed as instructions to improve the model's ability to follow user directions. [Source](https://arxiv.org/abs/2109.01652) ### 8.3 LLM Inference Optimization - **Quantization:** Reducing the numerical precision of model weights (e.g., FP32→INT8→INT4) to decrease memory and speed up inference. 
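The LoRA entry above amounts to a small change to a layer's forward pass; a NumPy sketch (the alpha/r scaling follows the paper, while the matrix sizes here are illustrative):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    # Frozen weight W (d_out x d_in) plus a low-rank update B @ A,
    # scaled by alpha / r as in the LoRA paper. Only A (r x d_in) and
    # B (d_out x r) would receive gradients; B starts at zero, so the
    # adapted layer initially behaves exactly like the frozen one.
    r = A.shape[0]
    return W @ x + (alpha / r) * (B @ (A @ x))
```

With B initialized to zeros the output equals `W @ x`, which is why LoRA training starts from the base model's behavior.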
[Source](https://en.wikipedia.org/wiki/Quantization_\(signal_processing\)) - **GPTQ:** A post-training quantization method that uses approximate second-order information to quantize LLMs to 3-4 bits. [Source](https://arxiv.org/abs/2210.17323) - **AWQ (Activation-Aware Weight Quantization):** Quantizes weights based on which are most important for activations, preserving critical channels. [Source](https://arxiv.org/abs/2306.00978) - **Pruning:** Removing redundant weights or structures from a model to reduce size and computation. [Source](https://en.wikipedia.org/wiki/Pruning_\(artificial_neural_network\)) - **Knowledge Distillation for LLMs:** Training a smaller model to mimic a larger model's behavior, compressing capabilities into a more efficient form. [Source](https://en.wikipedia.org/wiki/Knowledge_distillation) - **KV Cache Optimization:** Techniques to reduce memory consumption of the key-value cache during autoregressive generation. [Source](https://en.wikipedia.org/wiki/Transformer_\(deep_learning_architecture\)) - **Continuous Batching:** Dynamically batching requests to maximize GPU utilization during inference serving. [Source](https://www.anyscale.com/blog/continuous-batching-llm-inference) - **Tensor Parallelism:** Splitting individual tensors across multiple GPUs to handle models too large for a single device. [Source](https://en.wikipedia.org/wiki/Model_parallelism) - **Pipeline Parallelism:** Distributing different model layers across multiple GPUs in a pipeline. [Source](https://en.wikipedia.org/wiki/Model_parallelism) - **vLLM:** An open-source inference engine using PagedAttention for efficient memory management during LLM serving. [Source](https://github.com/vllm-project/vllm) ### 8.4 Prompting & In-Context Learning - **In-Context Learning (ICL):** The ability of LLMs to learn tasks from examples provided in the prompt without weight updates. 
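Quantization at its simplest (symmetric, per-tensor, round-to-nearest) can be sketched in a few lines; methods like GPTQ and AWQ refine how scales are chosen and which weights matter, not this basic mapping:

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor quantization: map [-max|w|, max|w|] to [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale
```

The round-trip error is bounded by half the scale step, which is the memory/accuracy trade-off the 8-bit and 4-bit schemes above are navigating.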
[Source](https://arxiv.org/abs/2005.14165) - **Zero-Shot Prompting:** Instructing a model to perform a task with no examples, relying entirely on the instruction. [Source](https://en.wikipedia.org/wiki/Zero-shot_learning) - **Few-Shot Prompting:** Providing a small number of input-output examples in the prompt to guide the model's behavior. [Source](https://arxiv.org/abs/2005.14165) - **Chain-of-Thought (CoT) Prompting:** Encouraging models to reason step-by-step before arriving at an answer, improving accuracy on complex tasks. [Source](https://arxiv.org/abs/2201.11903) - **Tree of Thought:** Explores multiple reasoning paths in a tree structure, allowing backtracking and evaluation of alternatives. [Source](https://arxiv.org/abs/2305.10601) - **Self-Consistency:** Samples multiple chain-of-thought paths and takes a majority vote for more robust reasoning. [Source](https://arxiv.org/abs/2203.11171) - **Prompt Engineering:** The practice of designing and optimizing prompts to elicit desired behaviors from language models. [Source](https://en.wikipedia.org/wiki/Prompt_engineering) - **System Prompts:** Instructions provided to the model that set its role, behavior, and constraints for a conversation. [Source](https://en.wikipedia.org/wiki/Prompt_engineering) - **Retrieval-Augmented Generation (RAG):** Augmenting LLM generation by retrieving and conditioning on relevant external documents. [Source](https://arxiv.org/abs/2005.11401) - **Prompt Injection:** An adversarial technique that inserts malicious instructions into prompts to override the model's intended behavior. [Source](https://en.wikipedia.org/wiki/Prompt_injection) - **Jailbreaking:** Attempts to circumvent safety guardrails of AI models through carefully crafted prompts. [Source](https://en.wikipedia.org/wiki/Jailbreaking_\(artificial_intelligence\)) ### 8.5 LLM Capabilities & Reasoning - **Reasoning in LLMs:** The ability of language models to perform logical, mathematical, and commonsense reasoning. 
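Self-consistency above is, at its core, a majority vote; a minimal sketch (it assumes the final answers have already been extracted from the sampled chains of thought):

```python
from collections import Counter

def self_consistency(answers):
    # Majority vote over final answers from several sampled
    # chain-of-thought completions; ties go to the earliest answer.
    counts = Counter(answers)
    return max(answers, key=lambda a: (counts[a], -answers.index(a)))
```

Sampling several reasoning paths and voting trades extra inference compute for robustness to any single faulty chain.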
[Source](https://arxiv.org/abs/2212.10403) - **Mathematical Reasoning:** LLM ability to solve math problems, improved by chain-of-thought and verification techniques. [Source](https://arxiv.org/abs/2110.14168) - **Code Generation:** LLM ability to write, debug, and explain code in multiple programming languages. [Source](https://en.wikipedia.org/wiki/Vibe_coding) - **Tool Use / Function Calling:** The ability of LLMs to invoke external tools, APIs, or functions to accomplish tasks beyond text generation. [Source](https://arxiv.org/abs/2302.04761) - **Hallucination:** When LLMs generate plausible-sounding but factually incorrect or fabricated information. [Source](https://en.wikipedia.org/wiki/Hallucination_\(artificial_intelligence\)) - **Grounding:** Connecting language model outputs to verifiable external sources to reduce hallucination. [Source](https://en.wikipedia.org/wiki/Grounding_\(linguistics\)) - **Context Window:** The maximum number of tokens a model can process in a single forward pass, determining how much text it can consider. [Source](https://en.wikipedia.org/wiki/Transformer_\(deep_learning_architecture\)) - **Long-Context Models:** Models with extended context windows (100K+ tokens) for processing entire books or codebases. [Source](https://arxiv.org/abs/2307.03172) - **Instruction Following:** The degree to which a model can faithfully execute complex, multi-part user instructions. [Source](https://arxiv.org/abs/2109.01652) - **Multilingual Capabilities:** The ability of LLMs to understand and generate text across many languages. [Source](https://en.wikipedia.org/wiki/Multilingualism) ### 8.6 LLM Evaluation & Benchmarks - **MMLU (Massive Multitask Language Understanding):** A benchmark testing knowledge across 57 academic subjects from STEM to humanities. [Source](https://arxiv.org/abs/2009.03300) - **HumanEval:** A benchmark for evaluating code generation ability by testing whether generated code passes unit tests. 
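Tool use / function calling above hinges on a runtime that validates and executes model-emitted structured calls; a toy dispatcher sketch (the JSON shape and the `TOOLS` registry are illustrative assumptions, not any particular vendor's API):

```python
import json

# Hypothetical tool registry the model is allowed to call.
TOOLS = {"add": lambda a, b: a + b, "upper": lambda s: s.upper()}

def dispatch(tool_call_json):
    # The model emits a structured call; the runtime validates the name,
    # executes the tool, and returns the result for the next model turn.
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return {"error": f"unknown tool {call['name']!r}"}
    return {"result": fn(**call["arguments"])}
```

Keeping execution behind an explicit registry is also the safety boundary: the model can only request tools, never run arbitrary code directly.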
[Source](https://arxiv.org/abs/2107.03374) - **GSM8K:** A benchmark of 8,500 grade-school math word problems testing multi-step mathematical reasoning. [Source](https://arxiv.org/abs/2110.14168) - **HellaSwag:** A benchmark for commonsense reasoning about everyday physical situations. [Source](https://arxiv.org/abs/1905.07830) - **TruthfulQA:** A benchmark measuring whether a language model generates truthful answers to questions designed to elicit falsehoods. [Source](https://arxiv.org/abs/2109.07958) - **GPQA:** Graduate-level science questions that test deep domain expertise and reasoning. [Source](https://arxiv.org/abs/2311.12022) - **ARC (AI2 Reasoning Challenge):** Science exam questions requiring reasoning beyond retrieval. [Source](https://arxiv.org/abs/1803.05457) - **Chatbot Arena / ELO Rankings:** A crowdsourced platform where users compare LLM outputs pairwise to create ELO-based rankings. [Source](https://chat.lmsys.org/) - **SWE-bench:** A benchmark evaluating LLMs on real-world software engineering tasks from GitHub issues. [Source](https://arxiv.org/abs/2310.06770) - **MATH Benchmark:** A collection of 12,500 challenging competition mathematics problems for evaluating mathematical reasoning. [Source](https://arxiv.org/abs/2103.03874) --- ## 9. AI Agents & Autonomous Systems ### 9.1 LLM-Based Agents - **AI Agent:** An autonomous system that perceives its environment, reasons about it, and takes actions to achieve goals. [Source](https://en.wikipedia.org/wiki/Intelligent_agent) - **ReAct (Reasoning + Acting):** A framework where LLMs interleave reasoning traces with action execution for task solving. [Source](https://arxiv.org/abs/2210.03629) - **Tool-Augmented LLMs:** Language models enhanced with the ability to call external tools (search, calculators, code execution). [Source](https://arxiv.org/abs/2302.04761) - **AutoGPT:** An autonomous agent that uses GPT-4 to chain together thoughts and actions to accomplish user-defined goals. 
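The ReAct pattern above can be sketched as a loop that alternates model steps with tool observations; here `model` is a stand-in for an LLM call plus output parsing, returning either an action or a final answer (all names and the step format are illustrative):

```python
def react_loop(model, tools, question, max_steps=5):
    # ReAct: interleave model "Thought/Action" steps with tool
    # "Observation" feedback until the model emits a final answer.
    # `model(transcript)` returns ("act", tool_name, arg) or
    # ("finish", answer) -- a stand-in for parsing an LLM completion.
    transcript = [("question", question)]
    for _ in range(max_steps):
        step = model(transcript)
        if step[0] == "finish":
            return step[1]
        _, name, arg = step
        observation = tools[name](arg)
        transcript += [("action", name, arg), ("observation", observation)]
    return None  # give up after max_steps rather than loop forever
```

The step cap matters in practice: agents that never terminate are a common failure mode.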
[Source](https://en.wikipedia.org/wiki/AutoGPT) - **Planning in LLM Agents:** The ability of AI agents to decompose complex goals into sequences of executable steps. [Source](https://arxiv.org/abs/2305.04091) - **Memory in AI Agents:** Mechanisms for agents to store and retrieve information across interactions (short-term and long-term memory). [Source](https://arxiv.org/abs/2304.03442) - **Multi-Agent Systems:** Architectures where multiple AI agents collaborate, debate, or specialize to solve complex tasks. [Source](https://en.wikipedia.org/wiki/Multi-agent_system) - **Model Context Protocol (MCP):** A standardized protocol for connecting AI models to external data sources and tools. [Source](https://en.wikipedia.org/wiki/Model_Context_Protocol) - **Computer Use Agents:** AI systems that can observe and interact with computer interfaces (clicking, typing, navigating). [Source](https://en.wikipedia.org/wiki/Computer_agent) - **Web Agents:** AI systems that can browse the web, fill forms, and complete online tasks autonomously. [Source](https://arxiv.org/abs/2307.12856) - **Coding Agents:** AI agents that can write, test, debug, and deploy code in development environments. [Source](https://en.wikipedia.org/wiki/Vibe_coding) ### 9.2 Robotics & Embodied AI - **Robotics AI:** The integration of AI perception, planning, and control for physical robots. [Source](https://en.wikipedia.org/wiki/Robotics) - **Robot Perception:** Using sensors (cameras, LiDAR, tactile) and AI to understand the robot's environment. [Source](https://en.wikipedia.org/wiki/Robotic_sensing) - **Motion Planning:** Computing collision-free paths for a robot to move from start to goal configurations. [Source](https://en.wikipedia.org/wiki/Motion_planning) - **Manipulation:** Robotic grasping, pushing, and dexterous handling of objects using learned or planned control. 
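Motion planning above, in its simplest discrete form, is graph search over free space; a breadth-first-search sketch on a 4-connected occupancy grid (real planners work in continuous configuration spaces with sampling methods such as RRT):

```python
from collections import deque

def shortest_path(grid, start, goal):
    # BFS over a 4-connected occupancy grid (1 = obstacle, 0 = free):
    # the simplest collision-free path planner. Returns a list of cells
    # from start to goal, or None if the goal is unreachable.
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and not grid[nr][nc] and (nr, nc) not in prev:
                prev[(nr, nc)] = cell
                queue.append((nr, nc))
    return None
```

BFS guarantees a shortest path in steps; weighted costs would call for Dijkstra or A* instead.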
[Source](https://en.wikipedia.org/wiki/Robot_manipulation) - **Locomotion:** Controlling legged, wheeled, or flying robots to navigate through environments. [Source](https://en.wikipedia.org/wiki/Robot_locomotion) - **Sim-to-Real Transfer:** Training policies in simulation and deploying them on real robots, bridging the reality gap. [Source](https://en.wikipedia.org/wiki/Sim-to-real) - **Imitation Learning for Robotics:** Robots learning behaviors by observing demonstrations from humans or other robots. [Source](https://en.wikipedia.org/wiki/Robot_learning) - **Foundation Models for Robotics:** Large pre-trained models adapted for robotic perception, planning, and control. [Source](https://arxiv.org/abs/2312.07843) - **Human-Robot Interaction (HRI):** The study of how humans and robots communicate and collaborate. [Source](https://en.wikipedia.org/wiki/Human%E2%80%93robot_interaction) ### 9.3 Autonomous Vehicles - **Self-Driving Cars:** Vehicles that use AI for perception, planning, and control to navigate without human intervention. [Source](https://en.wikipedia.org/wiki/Self-driving_car) - **Levels of Autonomy (SAE 0–5):** A classification system ranging from no automation (Level 0) to full autonomy (Level 5). [Source](https://en.wikipedia.org/wiki/Self-driving_car#Levels_of_driving_automation) - **Perception Stack (AV):** The sensory processing pipeline (cameras, LiDAR, radar) that builds an understanding of the driving environment. [Source](https://en.wikipedia.org/wiki/Self-driving_car) - **Path Planning (AV):** Computing safe and efficient routes and trajectories for autonomous vehicles. [Source](https://en.wikipedia.org/wiki/Motion_planning) - **Sensor Fusion:** Combining data from multiple sensors to produce more accurate and reliable environmental perception. [Source](https://en.wikipedia.org/wiki/Sensor_fusion) - **End-to-End Driving:** Learning to drive directly from sensor inputs to control outputs using a single neural network. 
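For independent noisy readings of a single quantity, sensor fusion above has a classic closed form: inverse-variance weighting, the static scalar special case of a Kalman update. A minimal sketch:

```python
def fuse(measurements):
    # Inverse-variance weighting: combine independent sensor readings of
    # the same quantity; more certain sensors (smaller variance) get more
    # weight. `measurements` is a list of (value, variance) pairs.
    weights = [1.0 / var for _, var in measurements]
    total = sum(weights)
    value = sum(w * v for (v, _), w in zip(measurements, weights)) / total
    return value, 1.0 / total  # fused estimate and its variance
```

Note the fused variance is smaller than any single sensor's, which is the whole point of combining camera, LiDAR, and radar.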
[Source](https://arxiv.org/abs/2003.06404) ### 9.4 Game AI - **Game-Playing AI:** AI systems designed to play games at superhuman levels. [Source](https://en.wikipedia.org/wiki/Artificial_intelligence_in_video_games) - **AlphaZero:** Mastered chess, shogi, and Go from self-play alone, without human knowledge beyond the rules. [Source](https://en.wikipedia.org/wiki/AlphaZero) - **OpenAI Five:** An AI system that defeated professional teams in the complex multiplayer game Dota 2. [Source](https://en.wikipedia.org/wiki/OpenAI_Five) - **AlphaStar:** DeepMind's AI that reached Grandmaster level in StarCraft II, a complex real-time strategy game. [Source](https://en.wikipedia.org/wiki/AlphaStar) - **Procedural Content Generation:** Using AI to automatically create game levels, maps, narratives, and other content. [Source](https://en.wikipedia.org/wiki/Procedural_generation) - **NPC AI:** Artificial intelligence controlling non-player characters to create believable and engaging game experiences. [Source](https://en.wikipedia.org/wiki/Artificial_intelligence_in_video_games) --- ## 10. AI for Science & Domain Applications ### 10.1 AI for Biology & Medicine - **AlphaFold:** DeepMind's system that predicts protein 3D structures from amino acid sequences with atomic accuracy. [Source](https://en.wikipedia.org/wiki/AlphaFold) - **Drug Discovery with AI:** Using ML to identify drug candidates, predict molecular properties, and optimize drug design. [Source](https://en.wikipedia.org/wiki/Artificial_intelligence_in_drug_discovery) - **Medical Image Analysis:** Applying deep learning to interpret X-rays, CT scans, MRIs, and pathology slides for diagnosis. [Source](https://en.wikipedia.org/wiki/Medical_image_computing) - **Genomics & AI:** Using ML to analyze DNA/RNA sequences for gene prediction, variant calling, and functional annotation. 
[Source](https://en.wikipedia.org/wiki/Genomics) - **Clinical NLP:** Extracting information from clinical notes, electronic health records, and medical literature using NLP. [Source](https://en.wikipedia.org/wiki/Clinical_natural_language_processing) - **Protein Design:** Using generative AI to design novel proteins with desired structures and functions. [Source](https://en.wikipedia.org/wiki/Protein_design) - **AlphaFold 3:** Extended structure prediction to complexes of proteins with DNA, RNA, ligands, and other molecules. [Source](https://en.wikipedia.org/wiki/AlphaFold) - **Molecular Generation:** Using generative models (VAEs, diffusion) to design novel molecules with desired chemical properties. [Source](https://en.wikipedia.org/wiki/De_novo_molecular_design) ### 10.2 AI for Physical Sciences - **AI for Weather Forecasting:** ML models (GraphCast, Pangu-Weather) that rival or surpass numerical weather prediction methods. [Source](https://en.wikipedia.org/wiki/Numerical_weather_prediction) - **AI for Physics Simulation:** Using neural networks to accelerate or replace traditional physics simulations (fluid dynamics, molecular dynamics). [Source](https://en.wikipedia.org/wiki/Physics-informed_neural_networks) - **Physics-Informed Neural Networks (PINNs):** Neural networks that incorporate physical laws (PDEs) as constraints during training. [Source](https://en.wikipedia.org/wiki/Physics-informed_neural_networks) - **Neural Operators:** Learn mappings between function spaces for solving PDEs, generalizing across different initial/boundary conditions. [Source](https://arxiv.org/abs/2108.08481) - **AI for Materials Science:** Using ML to predict material properties, discover new materials, and accelerate materials design. [Source](https://en.wikipedia.org/wiki/Materials_informatics) - **AI for Astronomy:** ML applications in galaxy classification, exoplanet detection, gravitational wave analysis, and cosmological simulations. 
[Source](https://en.wikipedia.org/wiki/Astroinformatics)

### 10.3 AI for Mathematics

- **AI for Theorem Proving:** Using ML to assist or automate mathematical proof discovery and verification. [Source](https://en.wikipedia.org/wiki/Automated_theorem_proving)
- **AI for Conjecture Generation:** Using AI to identify mathematical patterns and propose new conjectures. [Source](https://en.wikipedia.org/wiki/Automated_theorem_proving)
- **Formal Verification with AI:** Using ML to guide formal proof assistants (Lean, Coq, Isabelle) in constructing correct proofs. [Source](https://en.wikipedia.org/wiki/Proof_assistant)
- **AlphaGeometry:** DeepMind's system that solves Olympiad-level geometry problems using a neural-symbolic approach. [Source](https://en.wikipedia.org/wiki/AlphaGeometry)

### 10.4 AI for Finance

- **Algorithmic Trading:** Using AI and ML models to make automated trading decisions at speed. [Source](https://en.wikipedia.org/wiki/Algorithmic_trading)
- **Credit Scoring:** Using ML to assess the creditworthiness of loan applicants. [Source](https://en.wikipedia.org/wiki/Credit_score)
- **Fraud Detection:** Identifying fraudulent transactions using anomaly detection and classification models. [Source](https://en.wikipedia.org/wiki/Data_analysis_techniques_for_fraud_detection)
- **Financial NLP:** Analyzing financial news, earnings calls, and SEC filings for sentiment and information extraction. [Source](https://en.wikipedia.org/wiki/Natural_language_processing)
- **Risk Modeling:** Using ML to estimate and manage financial risks (market, credit, operational). [Source](https://en.wikipedia.org/wiki/Financial_risk_modeling)
- **Portfolio Optimization:** Using AI to allocate assets across investments to maximize return for a given risk level. [Source](https://en.wikipedia.org/wiki/Portfolio_optimization)

### 10.5 AI for Education

- **Intelligent Tutoring Systems:** AI systems that provide personalized instruction adapted to each student's knowledge and learning pace. [Source](https://en.wikipedia.org/wiki/Intelligent_tutoring_system)
- **Automated Essay Scoring:** Using NLP to automatically grade written essays and provide feedback. [Source](https://en.wikipedia.org/wiki/Automated_essay_scoring)
- **Adaptive Learning Platforms:** Educational systems that adjust content difficulty and sequencing based on student performance. [Source](https://en.wikipedia.org/wiki/Adaptive_learning)
- **AI-Assisted Question Generation:** Automatically creating quiz and exam questions from educational content. [Source](https://en.wikipedia.org/wiki/Question_generation)

### 10.6 AI for Law

- **Legal AI / LegalTech:** AI systems for contract analysis, legal research, case prediction, and document review. [Source](https://en.wikipedia.org/wiki/Legal_technology)
- **Contract Analysis:** Using NLP to extract key clauses, obligations, and risks from legal contracts. [Source](https://en.wikipedia.org/wiki/Contract_management)
- **Predictive Justice:** ML models that predict case outcomes based on historical judicial data (with significant ethical debate). [Source](https://en.wikipedia.org/wiki/Predictive_analytics)
- **E-Discovery:** Using AI to search, identify, and produce relevant electronic documents in legal proceedings. [Source](https://en.wikipedia.org/wiki/Electronic_discovery)

### 10.7 AI for Climate & Environment

- **Climate Modeling with AI:** Using ML to improve climate projections, downscale models, and emulate expensive simulations. [Source](https://en.wikipedia.org/wiki/Climate_model)
- **AI for Energy Optimization:** Optimizing energy grids, building efficiency, and renewable energy forecasting with ML. [Source](https://en.wikipedia.org/wiki/Smart_grid)
- **AI for Biodiversity Monitoring:** Using computer vision and audio AI to identify and track species from camera traps and acoustic sensors. [Source](https://en.wikipedia.org/wiki/Biodiversity)
- **Satellite Image Analysis:** Using deep learning to analyze remote sensing data for land use, deforestation, and disaster response. [Source](https://en.wikipedia.org/wiki/Remote_sensing)

---

## 11. AI Safety, Alignment & Ethics

### 11.1 AI Alignment

- **AI Alignment:** The challenge of ensuring AI systems pursue goals that are beneficial to humans and consistent with human values. [Source](https://en.wikipedia.org/wiki/AI_alignment)
- **Outer Alignment:** Ensuring the specified training objective accurately captures what humans actually want. [Source](https://en.wikipedia.org/wiki/AI_alignment)
- **Inner Alignment:** Ensuring the model's learned objective (mesa-objective) matches the training objective. [Source](https://arxiv.org/abs/1906.01820)
- **Mesa-Optimization:** When a model internally develops its own optimization process that may diverge from the training objective. [Source](https://arxiv.org/abs/1906.01820)
- **Reward Hacking:** When an agent finds unintended ways to maximize its reward signal without achieving the intended goal. [Source](https://en.wikipedia.org/wiki/Reward_hacking)
- **Goodhart's Law in AI:** When a measure becomes a target, it ceases to be a good measure—agents optimize proxies in unintended ways. [Source](https://en.wikipedia.org/wiki/Goodhart%27s_law)
- **Scalable Oversight:** Developing methods for humans to supervise AI systems that become increasingly capable and complex. [Source](https://en.wikipedia.org/wiki/AI_alignment)
- **Debate (Alignment Method):** Two AI systems argue opposing sides while a human judge evaluates, theoretically scaling oversight. [Source](https://arxiv.org/abs/1805.00899)
- **Iterated Amplification:** Recursively bootstraps human oversight by decomposing tasks into simpler sub-tasks that humans can supervise. [Source](https://arxiv.org/abs/1810.08575)
- **Constitutional AI:** Training AI with a set of principles (a "constitution") that guides self-critique and revision of outputs. [Source](https://arxiv.org/abs/2212.08073)
- **Corrigibility:** The property of an AI system that allows humans to correct, modify, or shut it down without resistance. [Source](https://en.wikipedia.org/wiki/AI_alignment#Corrigibility)
- **Value Learning:** Approaches where AI systems infer human values from behavior, preferences, or other signals. [Source](https://en.wikipedia.org/wiki/AI_alignment)
- **Cooperative Inverse Reinforcement Learning (CIRL):** A formalism where the human and AI are on the same team, and the AI must infer the human's reward function. [Source](https://arxiv.org/abs/1606.03137)

### 11.2 AI Safety

- **AI Safety:** The field focused on preventing AI systems from causing unintended harm, encompassing technical and governance approaches. [Source](https://en.wikipedia.org/wiki/AI_safety)
- **Robustness:** The ability of AI systems to maintain performance under distribution shift, adversarial inputs, or novel conditions. [Source](https://en.wikipedia.org/wiki/Robustness_\(computer_science\))
- **Adversarial Examples:** Small, carefully crafted perturbations to inputs that cause models to make incorrect predictions. [Source](https://en.wikipedia.org/wiki/Adversarial_machine_learning)
- **Adversarial Training:** Training on adversarial examples to improve model robustness against such attacks. [Source](https://en.wikipedia.org/wiki/Adversarial_machine_learning)
- **Distribution Shift / Out-of-Distribution Detection:** Identifying when inputs differ significantly from the training distribution.
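As a concrete illustration of the adversarial-examples entry above, here is a minimal fast gradient sign method (FGSM) sketch against a hand-rolled logistic classifier. The weights, data point, and epsilon are purely illustrative assumptions, not from any cited source:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """Fast gradient sign method: nudge x by eps in the direction that
    increases the log-loss, x' = x + eps * sign(dL/dx)."""
    p = sigmoid(np.dot(w, x) + b)   # predicted P(y=1 | x)
    grad_x = (p - y) * w            # gradient of log-loss w.r.t. the input
    return x + eps * np.sign(grad_x)

# Toy logistic model that classifies x correctly before the attack
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([1.0, 0.5])
y = 1.0

p_clean = sigmoid(np.dot(w, x) + b)          # confident, correct
x_adv = fgsm_perturb(x, y, w, b, eps=0.9)
p_adv = sigmoid(np.dot(w, x_adv) + b)        # small input shift flips the label
```

Even this linear toy shows the core phenomenon: a bounded, targeted perturbation moves the prediction across the decision boundary.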
[Source](https://en.wikipedia.org/wiki/Dataset_shift)
- **Uncertainty Quantification:** Methods for models to express confidence levels, distinguishing what they know from what they don't. [Source](https://en.wikipedia.org/wiki/Uncertainty_quantification)
- **Calibration:** Ensuring a model's predicted probabilities accurately reflect the true likelihood of outcomes. [Source](https://en.wikipedia.org/wiki/Calibration_\(statistics\))
- **Red Teaming:** Systematically probing AI systems for vulnerabilities, failure modes, and harmful outputs. [Source](https://en.wikipedia.org/wiki/Red_team)
- **AI Sandboxing:** Running AI systems in contained environments where their actions cannot cause real-world harm. [Source](https://en.wikipedia.org/wiki/Sandbox_\(computer_security\))
- **Catastrophic Forgetting:** When a neural network loses previously learned knowledge upon learning new information. [Source](https://en.wikipedia.org/wiki/Catastrophic_interference)
- **Specification Gaming:** When AI systems exploit loopholes in their objective specification to achieve high reward without intended behavior. [Source](https://en.wikipedia.org/wiki/Specification_gaming)

### 11.3 Fairness & Bias

- **Algorithmic Fairness:** Ensuring AI systems make decisions that do not discriminate against protected groups. [Source](https://en.wikipedia.org/wiki/Fairness_\(machine_learning\))
- **Bias in AI:** Systematic errors in AI predictions that disadvantage certain groups, arising from data, design, or deployment. [Source](https://en.wikipedia.org/wiki/Algorithmic_bias)
- **Demographic Parity:** A fairness criterion requiring equal positive prediction rates across demographic groups. [Source](https://en.wikipedia.org/wiki/Fairness_\(machine_learning\))
- **Equalized Odds:** A fairness criterion requiring equal true positive and false positive rates across groups.
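The fairness criteria above reduce to simple rate comparisons. A minimal sketch, with entirely synthetic labels and predictions, computing the per-group quantities behind demographic parity (positive prediction rate) and equalized odds (TPR/FPR):

```python
import numpy as np

def group_rates(y_true, y_pred, group):
    """Per-group positive prediction rate, true positive rate, and
    false positive rate for binary labels/predictions."""
    out = {}
    for g in np.unique(group):
        m = group == g
        yt, yp = y_true[m], y_pred[m]
        out[g] = {
            "positive_rate": yp.mean(),  # compared across groups for demographic parity
            "tpr": yp[yt == 1].mean() if (yt == 1).any() else np.nan,
            "fpr": yp[yt == 0].mean() if (yt == 0).any() else np.nan,
        }
    return out

# Synthetic data for two groups "a" and "b" (illustrative only)
y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])
group  = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

rates = group_rates(y_true, y_pred, group)
dp_gap = abs(rates["a"]["positive_rate"] - rates["b"]["positive_rate"])
```

Demographic parity asks `dp_gap` to be near zero; equalized odds asks the `tpr` and `fpr` entries to match across groups.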
[Source](https://en.wikipedia.org/wiki/Equalized_odds)
- **Counterfactual Fairness:** A decision is fair if it would remain the same had the individual belonged to a different demographic group. [Source](https://en.wikipedia.org/wiki/Fairness_\(machine_learning\))
- **Disparate Impact:** When a seemingly neutral policy disproportionately affects a protected group, even without explicit discrimination. [Source](https://en.wikipedia.org/wiki/Disparate_impact)
- **Debiasing Techniques:** Methods for reducing bias in training data, word embeddings, or model outputs. [Source](https://en.wikipedia.org/wiki/Debiasing)
- **Representational Harm:** When AI systems reinforce stereotypes or erase certain groups in their outputs. [Source](https://en.wikipedia.org/wiki/Algorithmic_bias)

### 11.4 Privacy & Security

- **Differential Privacy:** A mathematical framework ensuring that individual data points cannot be identified from model outputs. [Source](https://en.wikipedia.org/wiki/Differential_privacy)
- **Federated Learning:** Training models across decentralized devices without sharing raw data, preserving privacy. [Source](https://en.wikipedia.org/wiki/Federated_learning)
- **Data Poisoning:** Adversarial attacks that corrupt training data to manipulate model behavior. [Source](https://en.wikipedia.org/wiki/Data_poisoning)
- **Backdoor Attacks:** Inserting hidden triggers during training that cause targeted misclassification when activated at inference. [Source](https://en.wikipedia.org/wiki/Backdoor_\(computing\))
- **Model Extraction Attacks:** Stealing a model's functionality by querying it and training a surrogate on its outputs. [Source](https://en.wikipedia.org/wiki/Machine_learning#Security)
- **Membership Inference Attacks:** Determining whether a specific data point was used in the model's training set.
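The differential-privacy entry above is most easily seen through its simplest instantiation, the Laplace mechanism for a counting query. A minimal sketch (the count 42 and epsilon are illustrative assumptions):

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release true_value plus Laplace noise with scale sensitivity/epsilon.
    For a counting query the sensitivity is 1: adding or removing any one
    person's record changes the count by at most 1."""
    scale = sensitivity / epsilon
    return true_value + rng.laplace(loc=0.0, scale=scale)

rng = np.random.default_rng(0)
true_count = 42   # e.g. "how many records satisfy predicate P?"
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5, rng=rng)
```

Smaller epsilon means stronger privacy and more noise; any single release is unbiased but individually uninformative about who is in the dataset.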
[Source](https://en.wikipedia.org/wiki/Machine_learning#Security)
- **Homomorphic Encryption for ML:** Performing computations on encrypted data so the model never sees plaintext inputs. [Source](https://en.wikipedia.org/wiki/Homomorphic_encryption)
- **Secure Multi-Party Computation:** Protocols allowing multiple parties to jointly compute a function over their inputs without revealing them. [Source](https://en.wikipedia.org/wiki/Secure_multi-party_computation)

### 11.5 Explainability & Transparency

- **Explainable AI (XAI):** Making AI decision-making processes understandable and interpretable to humans. [Source](https://en.wikipedia.org/wiki/Explainable_artificial_intelligence)
- **Model Cards:** Standardized documentation describing a model's intended use, performance characteristics, and limitations. [Source](https://arxiv.org/abs/1810.03993)
- **Datasheets for Datasets:** Standardized documentation for datasets detailing provenance, composition, intended uses, and biases. [Source](https://arxiv.org/abs/1803.09010)
- **Right to Explanation:** The legal principle that individuals affected by automated decisions have the right to understand how the decision was made. [Source](https://en.wikipedia.org/wiki/Right_to_explanation)
- **Interpretable Models:** Models designed to be inherently understandable (decision trees, linear models, rule lists). [Source](https://en.wikipedia.org/wiki/Explainable_artificial_intelligence)
- **Post-hoc Explanations:** Methods applied after training to explain black-box model predictions (LIME, SHAP, attention maps). [Source](https://en.wikipedia.org/wiki/Explainable_artificial_intelligence)

### 11.6 AI Governance & Regulation

- **AI Governance:** Frameworks, policies, and institutions for managing the development and deployment of AI systems.
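To make the post-hoc-explanation idea concrete without reaching for LIME or SHAP, here is a sketch of permutation importance, one of the simplest model-agnostic explanation methods: shuffle one feature and see how much the error grows. The model and data are illustrative assumptions:

```python
import numpy as np

def permutation_importance(predict, X, y, rng):
    """Model-agnostic feature importance: increase in mean squared error
    when a single feature column is randomly shuffled."""
    base = np.mean((predict(X) - y) ** 2)
    scores = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])   # destroy feature j's signal
        scores.append(np.mean((predict(Xp) - y) ** 2) - base)
    return np.array(scores)

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.1 * X[:, 2]                    # feature 1 is irrelevant
predict = lambda X: 3.0 * X[:, 0] + 0.1 * X[:, 2]    # a model that fits y exactly

imp = permutation_importance(predict, X, y, rng)
```

The dominant feature gets the largest score; an unused feature scores exactly zero, since shuffling it leaves predictions unchanged.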
[Source](https://en.wikipedia.org/wiki/Regulation_of_artificial_intelligence)
- **EU AI Act:** The European Union's comprehensive AI regulation classifying AI systems by risk level and imposing requirements accordingly. [Source](https://en.wikipedia.org/wiki/Artificial_Intelligence_Act)
- **NIST AI Risk Management Framework:** A U.S. framework providing guidelines for identifying, assessing, and mitigating AI risks. [Source](https://en.wikipedia.org/wiki/NIST_AI_Risk_Management_Framework)
- **Responsible AI:** An umbrella term for principles and practices ensuring AI is developed and used ethically and beneficially. [Source](https://en.wikipedia.org/wiki/Responsible_artificial_intelligence)
- **AI Auditing:** Systematically evaluating AI systems for compliance with ethical, legal, and technical standards. [Source](https://en.wikipedia.org/wiki/Algorithmic_auditing)
- **Frontier Model Safety:** Safety practices specific to the most capable AI models, including evaluations and deployment guardrails. [Source](https://en.wikipedia.org/wiki/Frontier_model)
- **Open Source vs. Closed Source AI:** The debate over whether releasing model weights publicly benefits or threatens AI safety. [Source](https://en.wikipedia.org/wiki/Open-source_artificial_intelligence)

### 11.7 Societal Impact

- **AI and Employment:** The impact of automation and AI on labor markets, job displacement, and the creation of new roles. [Source](https://en.wikipedia.org/wiki/Technological_unemployment)
- **AI and Misinformation:** The use of AI-generated text, images, and video to create and spread false information. [Source](https://en.wikipedia.org/wiki/Misinformation)
- **Deepfakes and Trust:** Synthetic media that can undermine trust in authentic visual and audio evidence. [Source](https://en.wikipedia.org/wiki/Deepfake)
- **Digital Divide:** The risk that AI benefits accrue primarily to those with access to technology, widening inequality. [Source](https://en.wikipedia.org/wiki/Digital_divide)
- **Environmental Impact of AI:** The significant energy consumption and carbon footprint of training and running large AI models. [Source](https://en.wikipedia.org/wiki/Environmental_effects_of_artificial_intelligence)
- **AI and Intellectual Property:** Legal questions about copyright, attribution, and ownership of AI-generated content and training data. [Source](https://en.wikipedia.org/wiki/Artificial_intelligence_and_copyright)
- **Surveillance and AI:** The use of AI (facial recognition, behavior analysis) for mass surveillance and its privacy implications. [Source](https://en.wikipedia.org/wiki/Mass_surveillance)
- **Autonomous Weapons:** AI-powered lethal autonomous weapon systems and the ethical debate over their development. [Source](https://en.wikipedia.org/wiki/Lethal_autonomous_weapon)

---

## 12. Classical AI & Symbolic Methods

### 12.1 Search Algorithms

- **Search Problem:** Finding a sequence of actions from an initial state to a goal state in a state space. [Source](https://en.wikipedia.org/wiki/Search_algorithm)
- **Breadth-First Search (BFS):** Explores all nodes at the current depth before moving to the next depth level, guaranteeing shortest path. [Source](https://en.wikipedia.org/wiki/Breadth-first_search)
- **Depth-First Search (DFS):** Explores as far as possible along each branch before backtracking, using less memory than BFS. [Source](https://en.wikipedia.org/wiki/Depth-first_search)
- **A\* Search:** Uses a heuristic to guide search toward the goal, finding optimal paths when the heuristic is admissible. [Source](https://en.wikipedia.org/wiki/A*_search_algorithm)
- **Dijkstra's Algorithm:** Finds shortest paths from a source to all other nodes in a weighted graph with non-negative weights. [Source](https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm)
- **Iterative Deepening:** Combines the completeness of BFS with the memory efficiency of DFS by repeatedly deepening the search limit.
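The BFS entry above can be sketched in a few lines: explore level by level, so the first path that reaches the goal has the fewest edges. The toy graph is an illustrative assumption:

```python
from collections import deque

def bfs_shortest_path(graph, start, goal):
    """Breadth-first search over an adjacency-list graph; returns a
    fewest-edges path from start to goal, or None if unreachable."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()   # FIFO queue => level-by-level expansion
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None

# A small illustrative state graph
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
}
path = bfs_shortest_path(graph, "A", "F")
```

Swapping the `deque` for a stack gives DFS; adding a priority queue ordered by cost-plus-heuristic gives A\*.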
[Source](https://en.wikipedia.org/wiki/Iterative_deepening_depth-first_search)
- **Minimax Algorithm:** A decision-making algorithm for two-player zero-sum games that minimizes the maximum possible loss. [Source](https://en.wikipedia.org/wiki/Minimax)
- **Alpha-Beta Pruning:** An optimization of minimax that eliminates branches that cannot affect the final decision. [Source](https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning)
- **Beam Search (as search):** A heuristic search that expands only the top-k most promising nodes at each level. [Source](https://en.wikipedia.org/wiki/Beam_search)
- **Simulated Annealing:** A probabilistic optimization method inspired by metallurgical annealing that escapes local optima. [Source](https://en.wikipedia.org/wiki/Simulated_annealing)
- **Genetic Algorithms:** Evolutionary optimization using selection, crossover, and mutation operators on a population of candidate solutions. [Source](https://en.wikipedia.org/wiki/Genetic_algorithm)
- **Particle Swarm Optimization:** An optimization algorithm inspired by the social behavior of bird flocking or fish schooling. [Source](https://en.wikipedia.org/wiki/Particle_swarm_optimization)
- **Ant Colony Optimization:** A probabilistic optimization technique inspired by the foraging behavior of ants using pheromone trails. [Source](https://en.wikipedia.org/wiki/Ant_colony_optimization_algorithms)

### 12.2 Knowledge Representation & Reasoning

- **Knowledge Representation:** The study of how to encode information about the world in a form that AI systems can use for reasoning. [Source](https://en.wikipedia.org/wiki/Knowledge_representation_and_reasoning)
- **Ontology (AI):** A formal specification of concepts, categories, and relationships within a domain. [Source](https://en.wikipedia.org/wiki/Ontology_\(information_science\))
- **Knowledge Graph:** A structured representation of facts as entities and relationships, used for reasoning and question answering.
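The minimax and alpha-beta entries above combine naturally into one function: search the game tree, but stop expanding a branch once alpha meets beta, since its value can no longer affect the decision. The two-ply toy tree is an illustrative assumption:

```python
import math

def alphabeta(node, depth, alpha, beta, maximizing, children, value):
    """Minimax with alpha-beta pruning over an explicit game tree.
    Branches where alpha >= beta cannot change the result and are cut."""
    kids = children(node)
    if depth == 0 or not kids:
        return value(node)
    if maximizing:
        best = -math.inf
        for child in kids:
            best = max(best, alphabeta(child, depth - 1, alpha, beta,
                                       False, children, value))
            alpha = max(alpha, best)
            if alpha >= beta:
                break   # beta cut-off
        return best
    else:
        best = math.inf
        for child in kids:
            best = min(best, alphabeta(child, depth - 1, alpha, beta,
                                       True, children, value))
            beta = min(beta, best)
            if alpha >= beta:
                break   # alpha cut-off
        return best

# Toy 2-ply game: leaves hold payoffs for the maximizing player
tree = {"root": ["L", "R"], "L": ["L1", "L2"], "R": ["R1", "R2"]}
leaf = {"L1": 3, "L2": 5, "R1": 2, "R2": 9}

best = alphabeta("root", 2, -math.inf, math.inf, True,
                 children=lambda n: tree.get(n, []),
                 value=lambda n: leaf[n])
```

Here the minimizer's branch `R` is cut after seeing `R1 = 2`, because the maximizer already has 3 guaranteed from `L`; leaf `R2` is never evaluated.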
[Source](https://en.wikipedia.org/wiki/Knowledge_graph)
- **Semantic Web:** A vision for extending the web with machine-readable semantics using standards like RDF and OWL. [Source](https://en.wikipedia.org/wiki/Semantic_Web)
- **First-Order Logic:** A formal logical system using quantifiers and predicates to represent and reason about the world. [Source](https://en.wikipedia.org/wiki/First-order_logic)
- **Description Logic:** A family of formal knowledge representation languages used for defining ontologies and reasoning about them. [Source](https://en.wikipedia.org/wiki/Description_logic)
- **Rule-Based Systems:** AI systems that apply a set of if-then rules to a knowledge base for inference and decision-making. [Source](https://en.wikipedia.org/wiki/Rule-based_system)
- **Expert Systems:** AI programs that use rule-based knowledge from domain experts to solve specific problems. [Source](https://en.wikipedia.org/wiki/Expert_system)
- **Frames (Knowledge Representation):** Data structures for representing stereotypical situations, introduced by Marvin Minsky. [Source](https://en.wikipedia.org/wiki/Frame_\(artificial_intelligence\))
- **Commonsense Reasoning:** The ability to make inferences about everyday situations using background knowledge most humans possess. [Source](https://en.wikipedia.org/wiki/Commonsense_reasoning)
- **Non-Monotonic Reasoning:** Logical reasoning where conclusions can be retracted when new information is added. [Source](https://en.wikipedia.org/wiki/Non-monotonic_logic)
- **Abductive Reasoning:** Inference to the best explanation, reasoning from observations to the most likely hypothesis. [Source](https://en.wikipedia.org/wiki/Abductive_reasoning)

### 12.3 Constraint Satisfaction & Planning

- **Constraint Satisfaction Problem (CSP):** Finding values for variables that satisfy a set of constraints, used in scheduling, puzzles, and configuration.
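The rule-based and expert-system entries above rest on one core loop, forward chaining: fire every if-then rule whose premises hold, adding conclusions as new facts, until nothing new can be derived. A minimal sketch with an invented animal-classification knowledge base:

```python
def forward_chain(facts, rules):
    """Naive forward chaining: repeatedly fire any (premises -> conclusion)
    rule whose premises are all known, until a fixed point is reached."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in facts and set(premises) <= facts:
                facts.add(conclusion)
                changed = True
    return facts

# Illustrative mini knowledge base (premises, conclusion)
rules = [
    (["has_fur", "gives_milk"], "mammal"),
    (["mammal", "eats_meat"], "carnivore"),
    (["carnivore", "has_stripes"], "tiger"),
]
derived = forward_chain(
    ["has_fur", "gives_milk", "eats_meat", "has_stripes"], rules)
```

Chained firing is what gives such systems their power: "tiger" follows only after "mammal" and "carnivore" have themselves been derived.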
[Source](https://en.wikipedia.org/wiki/Constraint_satisfaction_problem)
- **Backtracking Search (for CSP):** A systematic search that assigns values to variables and backtracks when a constraint is violated. [Source](https://en.wikipedia.org/wiki/Backtracking)
- **Arc Consistency:** A constraint propagation technique that reduces variable domains by enforcing pairwise consistency. [Source](https://en.wikipedia.org/wiki/Arc_consistency)
- **Boolean Satisfiability (SAT):** The problem of determining whether a Boolean formula can be satisfied, the canonical NP-complete problem. [Source](https://en.wikipedia.org/wiki/Boolean_satisfiability_problem)
- **Automated Planning:** Computing a sequence of actions to achieve a goal from an initial state in a defined environment. [Source](https://en.wikipedia.org/wiki/Automated_planning_and_scheduling)
- **STRIPS:** An early planning representation language defining states and actions with preconditions and effects. [Source](https://en.wikipedia.org/wiki/Stanford_Research_Institute_Problem_Solver)
- **PDDL (Planning Domain Definition Language):** A standardized language for defining planning domains and problems. [Source](https://en.wikipedia.org/wiki/Planning_Domain_Definition_Language)
- **Hierarchical Task Networks (HTN):** A planning approach that decomposes high-level tasks into subtasks recursively. [Source](https://en.wikipedia.org/wiki/Hierarchical_task_network)

### 12.4 Logic Programming & Theorem Proving

- **Logic Programming:** A programming paradigm based on formal logic, where programs are sets of logical statements. [Source](https://en.wikipedia.org/wiki/Logic_programming)
- **Prolog:** A logic programming language widely used in AI for rule-based reasoning and symbolic computation. [Source](https://en.wikipedia.org/wiki/Prolog)
- **Automated Theorem Proving:** Using computers to prove mathematical or logical theorems automatically.
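Backtracking search for a CSP, as described above, fits in one recursive function. A minimal sketch on the classic map-coloring problem (the four regions and their adjacencies are an illustrative toy):

```python
def backtrack(assignment, variables, domains, neighbors):
    """Backtracking search for a CSP: assign one variable at a time and
    undo any choice that cannot be extended to a full solution. The
    constraint here: adjacent regions must receive different colors."""
    if len(assignment) == len(variables):
        return assignment
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        if all(assignment.get(n) != value for n in neighbors[var]):
            assignment[var] = value
            result = backtrack(assignment, variables, domains, neighbors)
            if result is not None:
                return result
            del assignment[var]   # dead end downstream: backtrack
    return None

# Four regions, three colors, adjacency constraints
variables = ["WA", "NT", "SA", "Q"]
domains = {v: ["red", "green", "blue"] for v in variables}
neighbors = {"WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
             "SA": ["WA", "NT", "Q"], "Q": ["NT", "SA"]}

solution = backtrack({}, variables, domains, neighbors)
```

Arc consistency would prune the `domains` dict before and during this search; the backtracking skeleton stays the same.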
[Source](https://en.wikipedia.org/wiki/Automated_theorem_proving)
- **Resolution (Logic):** A rule of inference that forms the basis of many automated theorem provers. [Source](https://en.wikipedia.org/wiki/Resolution_\(logic\))
- **Unification:** The process of finding a substitution that makes two logical expressions identical, fundamental to logic programming. [Source](https://en.wikipedia.org/wiki/Unification_\(computer_science\))
- **Inductive Logic Programming (ILP):** Learning logical rules from examples, combining machine learning with logic programming. [Source](https://en.wikipedia.org/wiki/Inductive_logic_programming)

### 12.5 Neuro-Symbolic AI

- **Neuro-Symbolic AI:** An approach combining neural networks' learning ability with symbolic AI's reasoning capabilities. [Source](https://en.wikipedia.org/wiki/Neuro-symbolic_AI)
- **Differentiable Programming:** Making discrete symbolic operations differentiable so they can be trained end-to-end with gradient descent. [Source](https://en.wikipedia.org/wiki/Differentiable_programming)
- **Neural Theorem Provers:** Systems that use neural networks to guide symbolic theorem-proving search. [Source](https://arxiv.org/abs/2009.03393)
- **Concept Bottleneck Models:** Neural models that predict human-interpretable concepts as intermediate steps before making final predictions. [Source](https://arxiv.org/abs/2007.04612)

---

## 13. Optimization & Evolutionary Methods

### 13.1 Gradient-Based Optimization

- **Gradient Descent:** An iterative optimization algorithm that moves in the direction of steepest descent of the objective function. [Source](https://en.wikipedia.org/wiki/Gradient_descent)
- **Mini-Batch Gradient Descent:** Computes gradients on small random subsets of data, balancing convergence speed and stability. [Source](https://en.wikipedia.org/wiki/Stochastic_gradient_descent)
- **Momentum:** Accumulates a moving average of past gradients to accelerate optimization and dampen oscillations.
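The gradient-descent and momentum entries above amount to a two-line update rule: accumulate a velocity from past gradients, then step along it. A minimal sketch on an ill-conditioned quadratic (the learning rate, momentum coefficient, and objective are illustrative assumptions):

```python
import numpy as np

def gd_momentum(grad, x0, lr=0.1, beta=0.9, steps=200):
    """Gradient descent with (heavy-ball) momentum:
        v <- beta * v + grad(x)
        x <- x - lr * v
    The velocity v smooths oscillations along steep directions while
    accelerating progress along shallow ones."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        v = beta * v + grad(x)
        x = x - lr * v
    return x

# Minimize f(x) = 0.5 * x^T diag(1, 10) x, whose minimum is at the origin
grad = lambda x: np.array([1.0, 10.0]) * x
x_min = gd_momentum(grad, x0=[5.0, 5.0])
```

Setting `beta=0.0` recovers plain gradient descent; on this objective the momentum run reaches the origin noticeably faster.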
[Source](https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Momentum)
- **Nesterov Accelerated Gradient:** A variant of momentum that looks ahead before computing the gradient for improved convergence. [Source](https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Nesterov_momentum)
- **AdaGrad:** Adapts learning rates per parameter based on historical gradient magnitudes, well-suited for sparse data. [Source](https://en.wikipedia.org/wiki/Stochastic_gradient_descent#AdaGrad)
- **Second-Order Methods (L-BFGS, Newton's):** Use curvature information (Hessian or approximations) for potentially faster convergence on smooth problems. [Source](https://en.wikipedia.org/wiki/Limited-memory_BFGS)
- **Natural Gradient Descent:** Adjusts parameter updates based on the Fisher information matrix to account for the geometry of the parameter space. [Source](https://en.wikipedia.org/wiki/Natural_gradient_descent)

### 13.2 Evolutionary & Population-Based Methods

- **Evolutionary Computation:** A family of optimization algorithms inspired by biological evolution (selection, mutation, crossover). [Source](https://en.wikipedia.org/wiki/Evolutionary_computation)
- **Evolution Strategies:** Optimization by evolving a population of parameter vectors using Gaussian perturbations and selection. [Source](https://en.wikipedia.org/wiki/Evolution_strategy)
- **Neuroevolution:** Evolving the weights, architectures, or hyperparameters of neural networks using evolutionary algorithms. [Source](https://en.wikipedia.org/wiki/Neuroevolution)
- **NEAT (NeuroEvolution of Augmenting Topologies):** An algorithm that evolves both the topology and weights of neural networks simultaneously. [Source](https://en.wikipedia.org/wiki/Neuroevolution_of_augmenting_topologies)
- **Genetic Programming:** Evolves computer programs (often represented as syntax trees) to solve problems.
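A minimal gradient-free sketch in the spirit of the evolution-strategies entry above: sample Gaussian perturbations around a mean, keep the top-scoring half, and move the mean toward them. The population size, step scale, and toy fitness function are illustrative assumptions (real ES variants like CMA-ES also adapt the search distribution):

```python
import numpy as np

def evolution_strategy(fitness, dim, rng, pop=50, sigma=0.3, steps=300):
    """Simple truncation-selection ES: perturb the current mean with
    Gaussian noise, score the samples, and recenter on the best half."""
    mean = rng.normal(size=dim)
    for _ in range(steps):
        samples = mean + sigma * rng.normal(size=(pop, dim))
        scores = np.array([fitness(x) for x in samples])
        elite = samples[np.argsort(scores)[-pop // 2:]]   # top half
        mean = elite.mean(axis=0)
    return mean

# Maximize a concave toy fitness peaked at (1, 2, 3)
target = np.array([1.0, 2.0, 3.0])
fitness = lambda x: -np.sum((x - target) ** 2)
rng = np.random.default_rng(0)
best = evolution_strategy(fitness, dim=3, rng=rng)
```

No gradients are ever computed, which is why the same loop works for non-differentiable objectives such as neural-network topologies in neuroevolution.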
[Source](https://en.wikipedia.org/wiki/Genetic_programming)
- **Differential Evolution:** A population-based optimizer that uses differences between solution vectors for mutation. [Source](https://en.wikipedia.org/wiki/Differential_evolution)
- **CMA-ES (Covariance Matrix Adaptation Evolution Strategy):** A sophisticated evolution strategy that adapts the covariance matrix of the search distribution. [Source](https://en.wikipedia.org/wiki/CMA-ES)
- **Multi-Objective Optimization:** Finding trade-off solutions (Pareto front) when optimizing multiple conflicting objectives simultaneously. [Source](https://en.wikipedia.org/wiki/Multi-objective_optimization)

---

## 14. Data & Infrastructure

### 14.1 Data Management

- **Data Collection:** Gathering data from various sources (web scraping, sensors, surveys, APIs) for training AI models. [Source](https://en.wikipedia.org/wiki/Data_collection)
- **Data Labeling / Annotation:** The process of adding labels to data (bounding boxes, text categories, etc.) for supervised learning. [Source](https://en.wikipedia.org/wiki/Data_labeling)
- **Active Learning:** A machine learning approach where the model queries the most informative examples for labeling to reduce annotation cost. [Source](https://en.wikipedia.org/wiki/Active_learning_\(machine_learning\))
- **Data Cleaning:** Identifying and correcting errors, inconsistencies, and noise in datasets. [Source](https://en.wikipedia.org/wiki/Data_cleansing)
- **Data Versioning:** Tracking changes to datasets over time for reproducibility (tools like DVC). [Source](https://en.wikipedia.org/wiki/Version_control)
- **Synthetic Data Generation:** Creating artificial data that mimics real data properties, useful when real data is scarce or sensitive. [Source](https://en.wikipedia.org/wiki/Synthetic_data)
- **Data Imbalance:** When classes in a dataset are disproportionately represented, requiring techniques like oversampling or class weighting.
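The data-imbalance entry above mentions oversampling; the simplest version resamples the minority class with replacement until the classes are equal in size. A minimal sketch on an invented 8-vs-2 dataset:

```python
import numpy as np

def random_oversample(X, y, rng):
    """Balance a binary dataset by resampling the minority class
    (with replacement) until both classes have equal counts."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    need = counts.max() - counts.min()
    idx = np.flatnonzero(y == minority)
    extra = rng.choice(idx, size=need, replace=True)   # duplicated minority rows
    return np.concatenate([X, X[extra]]), np.concatenate([y, y[extra]])

rng = np.random.default_rng(0)
X = np.arange(10).reshape(-1, 1).astype(float)
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])   # 8 vs 2: imbalanced

X_bal, y_bal = random_oversample(X, y, rng)
```

One caveat worth remembering: oversample only the training split, never before the train/test split, or duplicated rows leak into evaluation.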
[Source](https://en.wikipedia.org/wiki/Oversampling_and_undersampling_in_data_analysis) - **Web Scraping for AI:** Programmatically extracting data from websites to build training datasets. [Source](https://en.wikipedia.org/wiki/Web_scraping) - **Crowdsourcing for AI:** Using platforms like Amazon Mechanical Turk to distribute data labeling tasks across many workers. [Source](https://en.wikipedia.org/wiki/Crowdsourcing) ### 14.2 Compute Infrastructure - **GPU Computing:** Using graphics processing units for parallel computation, essential for deep learning training and inference. [Source](https://en.wikipedia.org/wiki/General-purpose_computing_on_graphics_processing_units) - **TPU (Tensor Processing Unit):** Google's custom ASIC designed specifically for accelerating machine learning workloads. [Source](https://en.wikipedia.org/wiki/Tensor_Processing_Unit) - **CUDA:** NVIDIA's parallel computing platform and programming model for running code on GPUs. [Source](https://en.wikipedia.org/wiki/CUDA) - **Distributed Training:** Spreading model training across multiple GPUs or machines to handle large models and datasets. [Source](https://en.wikipedia.org/wiki/Distributed_computing) - **Data Parallelism:** Distributing different batches of data across multiple devices, each holding a copy of the model. [Source](https://en.wikipedia.org/wiki/Data_parallelism) - **Model Parallelism:** Splitting a model across multiple devices when it's too large to fit on a single one. [Source](https://en.wikipedia.org/wiki/Model_parallelism) - **Cloud Computing for AI:** Using cloud services (AWS, GCP, Azure) to access scalable compute resources for training and deployment. [Source](https://en.wikipedia.org/wiki/Cloud_computing) - **AI Accelerators:** Specialized hardware (GPUs, TPUs, FPGAs, custom ASICs) designed to speed up AI computations. 
[Source](https://en.wikipedia.org/wiki/AI_accelerator) - **Edge AI:** Running AI models on edge devices (phones, IoT) rather than in the cloud for low latency and privacy. [Source](https://en.wikipedia.org/wiki/Edge_computing) - **Neuromorphic Computing:** Computing architectures inspired by the brain's neural structure, using spiking neural networks and event-driven processing. [Source](https://en.wikipedia.org/wiki/Neuromorphic_engineering) - **Quantum Machine Learning:** Exploring quantum computing to speed up certain ML algorithms or enable new computational paradigms. [Source](https://en.wikipedia.org/wiki/Quantum_machine_learning) ### 14.3 MLOps & Deployment - **MLOps:** Practices for deploying and maintaining ML models in production reliably and efficiently. [Source](https://en.wikipedia.org/wiki/MLOps) - **Model Serving:** Infrastructure for running trained models and handling inference requests at scale. [Source](https://en.wikipedia.org/wiki/MLOps) - **Model Monitoring:** Tracking model performance in production to detect degradation, data drift, or concept drift. [Source](https://en.wikipedia.org/wiki/MLOps) - **CI/CD for ML:** Continuous integration and deployment pipelines adapted for ML model development and testing. [Source](https://en.wikipedia.org/wiki/CI/CD) - **Feature Stores:** Centralized repositories for storing, managing, and serving ML features across teams and models. [Source](https://en.wikipedia.org/wiki/Feature_store) - **Model Registry:** A centralized store for tracking model versions, metadata, and lineage throughout the ML lifecycle. [Source](https://en.wikipedia.org/wiki/MLOps) - **A/B Testing for ML:** Comparing model variants in production using randomized experiments to measure real-world impact. [Source](https://en.wikipedia.org/wiki/A/B_testing) - **Containerization (Docker):** Packaging ML models with their dependencies into containers for reproducible, portable deployment. 
[Source](https://en.wikipedia.org/wiki/Docker_\(software\)) - **ONNX (Open Neural Network Exchange):** An open standard for representing ML models, enabling interoperability between different frameworks. [Source](https://en.wikipedia.org/wiki/Open_Neural_Network_Exchange) ### 14.4 Frameworks & Libraries - **PyTorch:** An open-source deep learning framework known for its dynamic computation graph and research-friendly design. [Source](https://en.wikipedia.org/wiki/PyTorch) - **TensorFlow:** Google's open-source ML framework with a comprehensive ecosystem for research and production. [Source](https://en.wikipedia.org/wiki/TensorFlow) - **JAX:** Google's library for high-performance numerical computing with automatic differentiation and GPU/TPU acceleration. [Source](https://en.wikipedia.org/wiki/Google_JAX) - **scikit-learn:** A Python library providing simple and efficient tools for classical machine learning algorithms. [Source](https://en.wikipedia.org/wiki/Scikit-learn) - **Hugging Face Transformers:** An open-source library providing pre-trained transformer models and tools for NLP and beyond. [Source](https://en.wikipedia.org/wiki/Hugging_Face) - **Keras:** A high-level neural network API running on top of TensorFlow, designed for fast experimentation. [Source](https://en.wikipedia.org/wiki/Keras) - **LangChain:** A framework for building applications powered by language models, including chains, agents, and retrieval. [Source](https://en.wikipedia.org/wiki/LangChain) - **Weights & Biases (W&B):** A platform for experiment tracking, model visualization, and collaboration in ML projects. [Source](https://wandb.ai/) - **MLflow:** An open-source platform for managing the end-to-end machine learning lifecycle. [Source](https://en.wikipedia.org/wiki/MLflow) - **Ray:** A distributed computing framework for scaling ML workloads including training, tuning, and serving. 
[Source](https://en.wikipedia.org/wiki/Ray_\(software\)) - **DeepSpeed:** Microsoft's library for efficient distributed training and inference of large models. [Source](https://github.com/microsoft/DeepSpeed) - **NVIDIA Triton Inference Server:** A platform for deploying and serving ML models at scale across various frameworks. [Source](https://developer.nvidia.com/triton-inference-server) --- ## 15. Emerging Frontiers & Advanced Topics ### 15.1 Artificial General Intelligence (AGI) - **AGI:** A hypothetical AI system with human-level cognitive abilities across all intellectual domains. [Source](https://en.wikipedia.org/wiki/Artificial_general_intelligence) - **Paths to AGI:** Different proposed approaches including scaling current methods, hybrid architectures, whole brain emulation, and novel paradigms. [Source](https://en.wikipedia.org/wiki/Artificial_general_intelligence) - **Superintelligence:** A hypothetical intellect far surpassing the best human minds in every domain. [Source](https://en.wikipedia.org/wiki/Superintelligence) - **Intelligence Explosion:** The hypothesis that an AI improving its own intelligence could trigger a rapid, runaway cycle of self-improvement. [Source](https://en.wikipedia.org/wiki/Technological_singularity) - **Whole Brain Emulation:** The hypothetical process of scanning and simulating a biological brain at sufficient detail to reproduce its function. [Source](https://en.wikipedia.org/wiki/Mind_uploading) ### 15.2 Test-Time Compute & Reasoning - **Test-Time Compute Scaling:** Allocating more computation during inference (e.g., longer reasoning chains) to improve answer quality. [Source](https://arxiv.org/abs/2408.03314) - **Chain-of-Thought at Scale:** Extended reasoning traces during inference that allow models to solve harder problems. [Source](https://arxiv.org/abs/2201.11903) - **Process Reward Models:** Training reward models that evaluate each step of a reasoning chain, not just the final answer. 
[Source](https://arxiv.org/abs/2305.20050)
- **Outcome Reward Models:** Reward models that evaluate only the final answer, providing sparser but more objective feedback. [Source](https://arxiv.org/abs/2305.20050)
- **Verification and Self-Correction:** Methods for LLMs to check and correct their own outputs during generation. [Source](https://arxiv.org/abs/2303.17651)

### 15.3 Continual & Lifelong Learning

- **Continual Learning:** Training models to learn new tasks sequentially without forgetting previously learned ones. [Source](https://en.wikipedia.org/wiki/Continual_learning)
- **Elastic Weight Consolidation (EWC):** Prevents catastrophic forgetting by penalizing changes to weights important for previous tasks. [Source](https://arxiv.org/abs/1612.00796)
- **Progressive Neural Networks:** Add new network columns for each new task while freezing the parameters of previous columns. [Source](https://arxiv.org/abs/1606.04671)
- **Replay-Based Methods:** Store and rehearse examples from previous tasks to prevent forgetting. [Source](https://en.wikipedia.org/wiki/Continual_learning)

### 15.4 Causal Inference & AI

- **Causal Inference:** Moving beyond correlations to understand cause-and-effect relationships from data. [Source](https://en.wikipedia.org/wiki/Causal_inference)
- **Structural Causal Models (SCMs):** Judea Pearl's framework for representing causal relationships using directed acyclic graphs and structural equations. [Source](https://en.wikipedia.org/wiki/Causal_model)
- **Do-Calculus:** A set of rules for computing interventional probabilities from observational data given a causal graph. [Source](https://en.wikipedia.org/wiki/Do-calculus)
- **Causal Discovery:** Algorithms that infer causal structure from observational data. [Source](https://en.wikipedia.org/wiki/Causal_inference)
- **Counterfactual Reasoning:** Reasoning about what would have happened under different conditions, enabled by causal models.
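The SCM and do-operator entries can be made concrete with a toy simulation (all variable names and probabilities here are illustrative, not from any dataset). In the confounded graph Z → X, Z → Y, X → Y, conditioning on X = 1 is not the same as intervening with do(X = 1):

```python
import random

def scm(rng, do_x=None):
    """Toy structural causal model: Z -> X, Z -> Y, X -> Y."""
    z = rng.random() < 0.5                          # confounder
    x = do_x if do_x is not None else (rng.random() < (0.8 if z else 0.2))
    y = rng.random() < (0.3 + 0.4 * x + 0.2 * z)    # Y depends on both X and Z
    return z, x, y

rng = random.Random(0)
n = 100_000

# Observational: P(Y=1 | X=1) -- biased upward, because X=1 favors Z=1.
obs = [scm(rng) for _ in range(n)]
p_obs = sum(y for _, x, y in obs if x) / sum(1 for _, x, _ in obs if x)

# Interventional: P(Y=1 | do(X=1)) -- X is forced, Z keeps its natural distribution.
intv = [scm(rng, do_x=True) for _ in range(n)]
p_do = sum(y for _, _, y in intv) / n

print(f"P(Y=1 | X=1)     ~ {p_obs:.2f}")   # ~0.86: conditioning inherits Z's bias
print(f"P(Y=1 | do(X=1)) ~ {p_do:.2f}")    # ~0.80 = 0.3 + 0.4 + 0.2 * 0.5
```

The gap between the two estimates is exactly the confounding bias that do-calculus (here, backdoor adjustment over Z) is designed to remove from observational data.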
[Source](https://en.wikipedia.org/wiki/Counterfactual_thinking)
- **Treatment Effect Estimation:** Using ML to estimate causal effects of interventions from observational data. [Source](https://en.wikipedia.org/wiki/Average_treatment_effect)

### 15.5 AI for Code

- **Code LLMs:** Large language models specifically trained or fine-tuned for code generation and understanding (Codex, StarCoder, Code Llama). [Source](https://en.wikipedia.org/wiki/GitHub_Copilot)
- **GitHub Copilot:** An AI pair programmer, originally powered by OpenAI's Codex, that suggests code completions in the editor. [Source](https://en.wikipedia.org/wiki/GitHub_Copilot)
- **Code Search & Understanding:** Using AI to search, navigate, and understand large codebases semantically. [Source](https://en.wikipedia.org/wiki/Code_search)
- **Automated Bug Detection:** Using ML to identify bugs, vulnerabilities, and code quality issues. [Source](https://en.wikipedia.org/wiki/Static_program_analysis)
- **Program Synthesis:** Automatically generating programs from specifications, examples, or natural language descriptions. [Source](https://en.wikipedia.org/wiki/Program_synthesis)
- **Formal Methods & AI:** Combining AI with formal verification to prove the correctness of software and hardware. [Source](https://en.wikipedia.org/wiki/Formal_methods)

### 15.6 Geometric Deep Learning

- **Geometric Deep Learning:** A unifying framework extending deep learning to non-Euclidean domains (graphs, manifolds, groups). [Source](https://en.wikipedia.org/wiki/Geometric_deep_learning)
- **Equivariant Neural Networks:** Networks whose outputs transform predictably under symmetry transformations of the input (rotation, translation). [Source](https://en.wikipedia.org/wiki/Equivariant_neural_network)
- **E(3)-Equivariant Networks:** Neural networks equivariant to 3D rotations, translations, and reflections, important for molecular and physical systems.
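Equivariance has a crisp, checkable meaning: applying a symmetry transformation before the map gives the same result as applying it after. A minimal sketch using circular 1-D cross-correlation, the textbook translation-equivariant operation (all values below are illustrative):

```python
def circ_conv(signal, kernel):
    """Circular 1-D cross-correlation: exactly equivariant to cyclic shifts."""
    n, k = len(signal), len(kernel)
    return [sum(signal[(i + j) % n] * kernel[j] for j in range(k))
            for i in range(n)]

def shift(xs, s):
    """Cyclic shift by s positions."""
    return xs[s:] + xs[:s]

signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
kernel = [0.5, -1.0, 0.5]

# Equivariance: shift-then-convolve equals convolve-then-shift.
a = circ_conv(shift(signal, 1), kernel)
b = shift(circ_conv(signal, kernel), 1)
print(a == b)  # True
```

E(3)-equivariant networks generalize this property from cyclic shifts to 3D rotations, translations, and reflections, so that, e.g., rotating a molecule rotates the predicted force vectors accordingly.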
[Source](https://arxiv.org/abs/2102.09844)
- **Hyperbolic Neural Networks:** Networks operating in hyperbolic space, naturally suited for representing hierarchical data. [Source](https://arxiv.org/abs/1805.09112)
- **Mesh Neural Networks:** Processing 3D mesh data (vertices, edges, faces) with specialized neural network architectures. [Source](https://arxiv.org/abs/1809.05910)

### 15.7 Embodied Intelligence & Simulation

- **Embodied AI:** The view that intelligence requires a body situated in an environment, not just disembodied computation. [Source](https://en.wikipedia.org/wiki/Embodied_cognition)
- **Simulation Environments (Gym, Habitat, Isaac):** Standardized platforms for training and evaluating AI agents in simulated worlds. [Source](https://en.wikipedia.org/wiki/OpenAI_Gym)
- **Digital Twins:** Virtual replicas of physical systems used for simulation, monitoring, and optimization with AI. [Source](https://en.wikipedia.org/wiki/Digital_twin)
- **World Simulators:** AI models that predict future states of complex environments, potentially useful for planning and training. [Source](https://en.wikipedia.org/wiki/Simulation)

### 15.8 Compression & Efficiency Research

- **Neural Network Compression:** Reducing model size while maintaining performance through pruning, quantization, distillation, and architecture design. [Source](https://en.wikipedia.org/wiki/Pruning_\(artificial_neural_network\))
- **Lottery Ticket Hypothesis:** The conjecture that dense networks contain sparse subnetworks that can train to full accuracy from initialization. [Source](https://arxiv.org/abs/1803.03635)
- **Structured Pruning:** Removing entire channels, layers, or attention heads rather than individual weights, for hardware-friendly compression. [Source](https://en.wikipedia.org/wiki/Pruning_\(artificial_neural_network\))
- **Neural Architecture Efficiency:** Designing architectures that achieve high accuracy with minimal parameters and FLOPs (e.g., EfficientNet, MobileNet). [Source](https://arxiv.org/abs/1905.11946)
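The pruning ideas above reduce to a few lines: magnitude pruning keeps only the largest-magnitude weights and records a binary mask, which is the same kind of sparse mask the lottery-ticket procedure then retrains from the original initialization. A plain-Python illustration (not the paper's training loop):

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of weights.
    Returns (pruned_weights, binary_mask). Ties at the threshold are
    all removed, so duplicates can prune slightly more than requested."""
    n_prune = int(len(weights) * sparsity)
    # Threshold = magnitude of the largest weight we still remove.
    threshold = sorted(abs(w) for w in weights)[n_prune - 1] if n_prune else -1.0
    mask = [1 if abs(w) > threshold else 0 for w in weights]
    pruned = [w if m else 0.0 for w, m in zip(weights, mask)]
    return pruned, mask

weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.12]
pruned, mask = magnitude_prune(weights, sparsity=0.5)
print(pruned)  # [0.8, 0.0, 0.3, 0.0, -0.6, 0.0]
print(mask)    # [1, 0, 1, 0, 1, 0]
```

In a real pipeline the surviving weights (or, for lottery tickets, their initial values) are then fine-tuned with the mask held fixed; structured pruning applies the same idea to whole channels or heads instead of individual weights.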
- **Once-for-All Networks:** Training a single network that supports many different sub-networks for deployment at various resource levels. [Source](https://arxiv.org/abs/1908.09791)

### 15.9 Tokenization & Representation Frontiers

- **Multimodal Tokenization:** Developing unified tokenization schemes that handle text, images, audio, and video in a single vocabulary. [Source](https://arxiv.org/abs/2301.12597)
- **Byte-Level Models:** Models that operate directly on raw bytes, eliminating the need for tokenization entirely. [Source](https://arxiv.org/abs/2305.07185)
- **Sparse Representations:** Representing data with only a few active elements, enabling efficient storage and computation. [Source](https://en.wikipedia.org/wiki/Sparse_matrix)
- **Disentangled Representations:** Learning latent variables that each capture a single generative factor, enabling controllable generation. [Source](https://en.wikipedia.org/wiki/Disentangled_representation_learning)

---

_This map represents the landscape of artificial intelligence as of 2025. The field continues to evolve rapidly, with new subfields, techniques, and applications emerging regularly._

More: [[AI-written artificial intelligence]]