[OpenAI O1 AI: Examples of step change in math, physics, programming! Architecture benchmarks, future - YouTube](https://www.youtube.com/watch?v=upVNFGt1ZrQ)
New OpenAI O1 AI model: Examples of step change in math, physics, programming! New benchmarks, technical details, current state and future of AI!
Concrete examples of step change in math, physics, programming:
1. Terence Tao (top mathematician) says O1 can, for the first time, identify suitable theorems (e.g., Cramér's theorem) for vaguely worded mathematical queries, and performs at the level of a mediocre but not completely incompetent graduate student.
2. George Hotz called it the "first model capable of programming at all."
3. A user reported O1 fixed a complex Bluetooth protocol issue that previous models failed to understand.
4. An astrophysics PhD student reported that O1 accomplished in minutes what took a year of their PhD work, producing smaller code.
5. O1 came up with a novel approach for a proof of a theorem, which impressed a mathematician.
6. O1 was able to correctly compute the fundamental group of the circle on its first attempt, while other models struggled.
7. A law professional reported being happy with O1's performance in legal tasks.
8. Many other mathematicians are reporting a step change in performance.
Concrete benchmarks:
1. OpenAI benchmarks show O1 crushing benchmarks in coding, math, and PhD-level questions.
2. On LiveBench, O1 Preview outperforms other models, including Claude, in reasoning, mathematics, data analysis, and language tasks. Interestingly, O1 Mini performs better than O1 Preview on some reasoning tasks. In coding tasks, the results are mixed, with different models excelling in different aspects (e.g., code generation vs. completion).
3. OlympicArena reasoning benchmark: O1 significantly outperforms previous models.
4. O1 crushes internal Devin software engineering benchmarks.
5. SimpleBench: O1 Preview showed a 50% improvement, though under different testing conditions.
6. AidanBench: big jump for O1.
7. Mensa IQ tests: Jump from 90 IQ to 120 IQ.
8. ARC benchmark: O1 Preview performs much better than GPT-4o, but at about the same level as Claude 3.5 Sonnet.
9. Some medical benchmarks got crushed.
10. Some privately held benchmarks show a 30% increase in performance.
Concrete technical details:
1. O1 is trained with reinforcement learning, chain of thought, self-correction, breaking down of problems, and search and sampling with a scoring function (a minimal sketch follows this list).
2. New scaling laws for test time compute.
3. o1 is a single model; there are no multiple models, no hidden "monologue" model, etc.
4. o1 can generate very long responses while staying coherent, unlike all previous models.
5. No amount of prompt engineering on GPT-4o can match o1’s performance
6. The o1-mini training data contained a much higher proportion of STEM data than other data.
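A minimal sketch of the "search and sampling with a scoring function" idea from point 1 (best-of-n with a verifier), assuming hypothetical `generate_cot` and `score` stand-ins for the model's sampler and a reward/verifier model; this is just the generic technique, not OpenAI's actual implementation:

```python
import random
from typing import Callable, Optional, Tuple

def best_of_n(
    prompt: str,
    generate_cot: Callable[[str], Tuple[str, str]],  # returns (chain_of_thought, final_answer)
    score: Callable[[str, str, str], float],         # verifier / reward-model stand-in
    n: int = 16,
) -> Tuple[str, str, float]:
    """Sample n chain-of-thought candidates and keep the highest-scoring one.

    Plain best-of-n (rejection) sampling: the scoring function plays the role of
    a learned verifier or reward model. More samples = more test-time compute.
    """
    best: Optional[Tuple[str, str, float]] = None
    for _ in range(n):
        cot, answer = generate_cot(prompt)  # one sampled reasoning trace
        s = score(prompt, cot, answer)      # scalar quality estimate
        if best is None or s > best[2]:
            best = (cot, answer, s)
    return best

# Toy stand-ins so the sketch runs end to end (purely illustrative).
def generate_cot(prompt: str) -> Tuple[str, str]:
    guess = random.randint(1, 20)
    return f"I think the answer is {guess} because ...", str(guess)

def score(prompt: str, cot: str, answer: str) -> float:
    return -abs(int(answer) - 17)  # pretend the verifier prefers answers near 17

if __name__ == "__main__":
    cot, answer, s = best_of_n("toy question", generate_cot, score, n=32)
    print(answer, s)
```

Spending more samples (larger n) is one simple way test-time compute buys accuracy, which is the point of the new scaling laws above.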
Current state and future of AI:
1. Reception of O1 is mixed: some praise it as a significant step change, others call it a minor incremental step, and many point out its limitations.
2. Confusion exists about O1's coding abilities, with conflicting reports on its performance compared to other models.
3. O1 still struggles with some out-of-distribution reasoning tasks and simple logic problems.
4. The full O1 model that OpenAI has internally (not just the preview) is expected to significantly outperform the current preview version on most of these benchmarks and tasks.
5. Upcoming AI releases (o1 (full), Orion/GPT-5, Claude 3.5 Opus, Gemini 2 (maybe with AlphaProof and AlphaCode integrated), Grok 3, possibly Llama 4, etc.) are anticipated to push capabilities further.
6. More AGI labs are already integrating more reinforcement learning, chain of thought, self-correction, problem decomposition, search and sampling with a scoring function, planning, agents, and so on.
7. We should also accelerate reverse-engineering research such as mechanistic interpretability, for proper steerability in the contexts where it matters most.
8. We may be on the brink of an intelligence explosion, with models already showing signs of recursive self-improvement, such as o1 contributing to frontier AGI research and development inside OpenAI.
9. Competition between major AI labs (OpenAI, Google, Anthropic, xAI, SSI) is intensifying, with each having potential advantages in compute, data, or algorithms.
10. May AI advancements benefit everyone and lead to collective superflourishing, superintelligence, superwellbeing, superlongevity, superunderstanding!
Cycle repeats
[[Images/a2c228f2ddad3da557ce5eddc12bb230_MD5.jpeg|Open: Pasted image 20240917184323.png]]
![[Images/a2c228f2ddad3da557ce5eddc12bb230_MD5.jpeg]]
New OpenAI o1 model
"We've developed a new series of AI models designed to spend more time thinking before they respond. They can reason through complex tasks and solve harder problems than previous models in science, coding, and math."
[Introducing OpenAI o1 | OpenAI](https://openai.com/index/introducing-openai-o1-preview/)
[Learning to Reason with LLMs | OpenAI](https://openai.com/index/learning-to-reason-with-llms/)
[o1 System Card | OpenAI](https://openai.com/index/openai-o1-system-card/)
[OpenAI Strawberry Livestream - Metaprompting, Cognitive Architecture, Multi-Agent, Finetuning - YouTube](https://www.youtube.com/live/AO7mXa8BUWk)
[ChatGPT o1 - In-Depth Analysis and Reaction (o1-preview) - YouTube](https://www.youtube.com/watch?v=7J44j6Fw8NM)
[o1 - What is Going On? Why o1 is a 3rd Paradigm of Model + 10 Things You Might Not Know - YouTube](https://www.youtube.com/watch?v=KKF7kL0pGc4)
[GPT-o1 - by Zvi Mowshowitz - Don't Worry About the Vase](https://thezvi.substack.com/p/gpt-4o1)
[Reddit - Dive into anything](https://www.reddit.com/r/singularity/comments/1ff7mod/openai_announces_o1/)
trained with reinforcement learning, chain of thought, self-correction, sampling with a scoring function, etc.
https://openai.com/index/openai-o1-system-card/
my thoughts [OpenAI o1 Strawberry Q* AI reasoning LLM model destroys Claude 3.5 Sonnet on reasoning, mathematics! - YouTube](https://www.youtube.com/watch?v=MBxcKY6he1c) https://x.com/burny_tech/status/1834650814419550368 https://x.com/burny_tech/status/1834651772637384712
"
OpenAI o1 Strawberry Q* AI reasoning LLM model destroys Anthropic Claude 3.5 Sonnet, Google, and Meta on reasoning, mathematics, data analysis, and language! Deep dive into benchmarks and future of AI models!
1. OpenAI released new models called O1 Preview and O1 Mini, along with benchmarks showing their performance.
2. The full O1 model appears to be significantly stronger than O1 Preview, particularly in areas like competition math, coding and PhD level questions.
3. O1 Preview outperforms other models, including Claude, in reasoning, mathematics, data analysis, and language tasks according to the LiveBench benchmark.
4. Interestingly, O1 Mini performs better than O1 Preview in some reasoning tasks.
5. In coding tasks, the results are mixed, with different models excelling in different aspects (e.g., code generation vs. completion).
6. The new models seem to be specialized for certain tasks, potentially sacrificing performance in others (e.g., instruction following).
7. The models appear to use reasoning techniques like chain-of-thought reasoning and sampling with scoring function, possibly integrated into the model architecture.
8. Other major AI labs (Google, Meta, Anthropic) are likely working on similar approaches, and new model releases are expected soon.
9. There's a new paradigm with its own scaling law: test-time compute.
10. I'm concerned about AI labs potentially not releasing their best models for everyone, citing risks of power concentration, but I understand the other risks too.
11. Relatively big progress in AI capabilities, with comparisons to models from just a few years ago.
12. There are some conflicting benchmark results; we will need to see how these models perform in practice over the long term.
"
benchmarks
https://x.com/burny_tech/status/1834283752346005926
Dominating basically every benchmark, like LiveBench [LiveBench](https://livebench.ai/) https://x.com/polynoamial/status/1834280155730043108
IQ https://x.com/DaveShapi/status/1835117569432224005 https://x.com/maximlott/status/1835043371339202639?t=tucltiRS3VVMw6r3cdeTnA&s=19
[ChatGPT o1 - In-Depth Analysis and Reaction (o1-preview) - YouTube](https://www.youtube.com/watch?v=7J44j6Fw8NM) AI Explained has great out-of-distribution benchmarks
https://x.com/kimmonismus/status/1834296216009552341
https://x.com/aidan_mclau/status/1835023308238340460 AidanBench
https://x.com/burny_tech/status/1835091020276437138 OlympicArena reasoning benchmark for o1-preview goes hard
https://x.com/sytelus/status/1834352532585676859
[x.com/alexandr\_wang/status/1838637233169211838](https://x.com/alexandr_wang/status/1838637233169211838) SEAL leaderboard
https://x.com/polynoamial/status/1835086680266883205 The AI field desperately needs harder evals that take into consideration continued fast progress.
https://x.com/burny_tech/status/1834716200586084485 Dominating basically every benchmark, like LiveBench [LiveBench](https://livebench.ai/), but ARC has not fallen yet; I wonder how much o1 would score on ARC with AlphaZero-like RL and self-correcting CoT fine-tuning on ARC
And I wonder how valid their internal Devin coding benchmark is; the curve looks exponential https://fxtwitter.com/cognition_labs/status/1834292718174077014
Planning got a jump [X](https://x.com/polynoamial/status/1838251987009183775)
The new OpenAI o1 model (just the preview for now) still struggling to reason out of distribution (ARC benchmark, SimpleBench, still problems with some famous puzzles, etc.) makes me think that we will get much better AI models once we figure out much more robust (hardcoded or emergent) first-principles reasoning (in hierarchies, graphs, and so on), instead of retrieving and synthesizing sometimes brittle, weakly generalizing reasoning program chunks from the training data stored in the latent space.
Maybe scale, better training data, and training hacks will cause the emergence of a general enough, robust enough, all-encompassing enough reasoning engine that will eventually phase-shift into a first-principles-reasoning metastable configuration of weights.
Public narrative about AI is shifting now https://x.com/burny_tech/status/1835091400985096337
"o1 makes it abundantly clear that only OpenAI's internal models will be truly useful for civilizational-level change.
Their internal capabilities now far exceed their publicly shipped products and that gap will continue to grow."
https://x.com/SmokeAwayyy/status/1835012208587423788
if they released the full o1 model... maybe in a month tho? https://fxtwitter.com/main_horse/status/1834333269128872365
[[AINews] o1: OpenAI's new general reasoning models • Buttondown](https://buttondown.com/ainews/archive/ainews-o1-openais-new-general-reasoning-models/)
authors
https://x.com/markchen90/status/1834343908035178594
Crushing mathematics and informatics olympiads https://x.com/burny_tech/status/1834327275946361099 https://x.com/burny_tech/status/1834321466105770184
OpenAI observed interesting instances of reward hacking in their new model 🤔 https://x.com/burny_tech/status/1834324288402243655
Well, there goes the “AI agent unexpectedly and successfully exploits a configuration bug in its training environment as the path of least resistance during cyberattack capability evaluations” milestone.
https://x.com/davidad/status/1834454815092449299 (Even though I'm very excited about the new AI model, I think certain risks are very real, and we should also accelerate reverse-engineering research such as mechanistic interpretability of these AI systems so that we can steer them properly in contexts where we need to!)
https://x.com/tensor_fusion/status/1834561918712856603 https://x.com/ShakeelHashim/status/1834292287087485425
How far can inference time compute go? new scaling laws, Bitter lesson makes a comeback https://x.com/burny_tech/status/1834289214776565844 https://x.com/DaveShapi/status/1835117776920334703
In another 6 months we will possibly have o1 (full), Orion/GPT-5, Claude 3.5 Opus, Gemini 2 (maybe with AlphaProof and AlphaCode integrated), Grok 3, possibly Llama 4
[Reddit - Dive into anything](https://www.reddit.com/r/singularity/comments/1fgnfdu/in_another_6_months_we_will_possibly_have_o1_full/)
This is gonna be the hottest winter on record.
https://x.com/DaveShapi/status/1834986252359049397 https://x.com/slow_developer/status/1834958266998157547
A couple of PRs to the OpenAI codebase were already authored solely by o1!
https://x.com/lukasz_kondr/status/1834643103397167326
Technically initial recursive self-improvement from the new OpenAI o1 model. It made nontrivial contributions to frontier AI research and development.
https://x.com/burny_tech/status/1834735949101600770
https://x.com/huybery/status/1834291444540194966
Step change in coding, math, physics!
My feed is full of people praising o1 for being much better at math than previous models! I didn't believe LLMs could get so much better at math! I was wrong, once again! Do not underestimate the bitter God of Scale and AlphaZero-like RL! And we have not reached the peak of inference-time compute scaling laws! The future will be interesting! Looking forward to more progress in AI x math!
https://x.com/burny_tech/status/1834748815913398462
THERE IS NO PLATEAUING!
WE'RE JUST GETTING STARTED WITH O1, ALPHAPROOF AND SIMILAR NEW AI SYSTEMS! [AI achieves silver-medal standard solving International Mathematical Olympiad problems - Google DeepMind](https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/)
My feed full of:
OpenAI o1 model is step change in mathematics
OpenAI o1 model is step change in programming
OpenAI o1 model is step change in physics
etc.
Big
https://x.com/burny_tech/status/1834957917033730188
https://x.com/IntuitMachine/status/1835240555028136092
https://x.com/teortaxesTex/status/1834725127029932081 [Terence Tao: "I have played a little bit with OpenAI's new iter…" - Mathstodon](https://mathstodon.xyz/@tao/113132502735585408) Terence Tao: step change from an incompetent graduate student to a mediocre but not completely incompetent graduate student; first time a model identified and used Cramér's theorem
https://x.com/omarsar0/status/1834315401812910195 simple math
https://x.com/burny_tech/status/1834350997256187971 [Learning to Reason with LLMs | Hacker News](https://news.ycombinator.com/item?id=41523070) fixing a Bluetooth protocol issue
https://x.com/anderssandberg/status/1834536105527398717 math
https://x.com/robertghrist/status/1834564488751731158 A new mathematical proof with o1?! A thread from a mathematics researcher sharing how the o1 model helped his team write a new mathematics paper proving a new theorem! [x.com/robertghrist/status/1841462507543949581?t=5zV3VpQI0mbrSU9\_QRtfkQ&s=19](https://x.com/robertghrist/status/1841462507543949581?t=5zV3VpQI0mbrSU9_QRtfkQ&s=19)
https://x.com/QiaochuYuan/status/1834341057099948170 first LLM i've tested that can compute the fundamental group of the circle
https://x.com/emollick/status/1835342797722767592 [ChatGPT o1 preview + mini Wrote My PhD Code in 1 Hour*—What Took Me ~1 Year - YouTube](https://youtu.be/M9YOO7N5jF8) astrophysics
https://x.com/realGeorgeHotz/status/1835228364837470398 programming, it's "a mediocre, but not completely incompetent, software engineer"
https://x.com/scottastevenson/status/1834408343395258700 law
https://x.com/DeryaTR_/status/1834630356286558336 medical stuff
[x.com/aj\_dev\_smith/status/1835521394659983477](https://x.com/aj_dev_smith/status/1835521394659983477) music
https://x.com/holdenmatt/status/1835031749706785258 happy mathematician
[x.com/DeryaTR\_/status/1836434726774526381](https://x.com/DeryaTR_/status/1836434726774526381) happy biochemist "o1 model is comparable to an outstanding PhD student in biomedical sciences"
[I used o1-mini everyday for coding against Claude Sonnet 3.5 so you don't have to - my thoughts : r/ClaudeAI](https://www.reddit.com/r/ClaudeAI/comments/1fhjgcr/i_used_o1mini_everyday_for_coding_against_claude/) coding
https://x.com/AravSrinivas/status/1834786331194802407 prompts where you feel o1-preview outperformed sonnet-3.5 that’s not a puzzle or a coding competition problem but your daily usage prompts. 🧵
implementation details of o1
[Learning to Reason with LLMs | OpenAI](https://openai.com/index/learning-to-reason-with-llms/)
trained with reinforcement learning, chain of thought, self-correction, sampling with a scoring function, etc.
inference-time compute, maybe hardwired into the architecture, maybe looping tokens through the layers multiple times, search, graph of thought, self-correction, etc. https://x.com/DrJimFan/status/1834279865933332752 https://x.com/polynoamial/status/1834280155730043108
[GitHub - hijkzzz/Awesome-LLM-Strawberry: A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.](https://github.com/hijkzzz/Awesome-LLM-Strawberry) Democratization of o1 is happening
Chain-of-thought, self-critique, verification, multi-step reasoning, decomposing tasks, Monte Carlo Tree Search, reinforcement learning, self-play, preference learning, process supervision, compute-optimal sampling, mutual reasoning, deliberative planning, self-rewarding, uncertainty-aware planning, imagination-based self-improvement, latent-variable inference, value-guided decoding, procedure cloning, and various approaches to scaling and optimizing LLM inference and training, all aimed at improving reasoning, problem-solving, and overall performance across diverse tasks.
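Of these, the self-critique / self-correction loop is the easiest to sketch; assuming hypothetical `draft`, `critique`, and `revise` callables standing in for LLM calls (a generic sketch, not any particular lab's implementation):

```python
from typing import Callable

def self_correct(
    problem: str,
    draft: Callable[[str], str],             # initial attempt (an LLM call in practice)
    critique: Callable[[str, str], str],     # returns "" when the critic finds no issues
    revise: Callable[[str, str, str], str],  # rewrite the attempt given the critique
    max_rounds: int = 3,
) -> str:
    """Iteratively critique and revise an answer until the critic is satisfied
    or the round budget runs out. A generic self-correction loop, nothing more."""
    answer = draft(problem)
    for _ in range(max_rounds):
        feedback = critique(problem, answer)
        if not feedback:  # critic found nothing to fix
            break
        answer = revise(problem, answer, feedback)
    return answer

if __name__ == "__main__":
    # Toy stand-ins: the "critic" objects until the answer carries units.
    print(self_correct(
        "How far does light travel in 2 seconds?",
        draft=lambda p: "599584916",
        critique=lambda p, a: "" if a.endswith("m") else "missing units",
        revise=lambda p, a, f: a + " m",
    ))  # -> "599584916 m"
```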
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling. Brown et al. [[2407.21787v1] Large Language Monkeys: Scaling Inference Compute with Repeated Sampling](https://arxiv.org/abs/2407.21787v1)
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters. [[2408.03314] Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters](https://arxiv.org/abs/2408.03314) https://x.com/rohanpaul_ai/status/1835443326205517910
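Both papers track coverage as a function of the number of samples; the standard unbiased pass@k estimator that this line of work builds on is easy to sketch (the numbers below are made up, just to show how coverage climbs with k):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimator of pass@k: the probability that at least one of k
    samples (drawn without replacement from n total, of which c are correct)
    solves the problem. Standard formula from the code-evaluation literature."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Made-up example: 200 samples per problem, only 5 of them correct.
for k in (1, 10, 100):
    print(f"pass@{k} = {pass_at_k(200, 5, k):.3f}")
```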
https://x.com/terryyuezhuo/status/1834286548571095299
ReFT: Reasoning with Reinforced Fine-Tuning
[[2401.08967] ReFT: Reasoning with Reinforced Fine-Tuning](https://arxiv.org/abs/2401.08967)
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning [[2402.05808] Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning](https://arxiv.org/abs/2402.05808)
https://x.com/iamgingertrash/status/1834297595486675052
tree search distillation + RL post training! https://x.com/rm_rafailov/status/1834291016192360743
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking [[2403.09629] Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking](https://arxiv.org/abs/2403.09629)
https://x.com/laion_ai/status/1834564564601729421
Let's Verify Step by Step [[2305.20050] Let's Verify Step by Step](https://arxiv.org/abs/2305.20050) ['Show Your Working': ChatGPT Performance Doubled w/ Process Rewards (+Synthetic Data Event Horizon) - YouTube](https://www.youtube.com/watch?v=hZTZYffRsKI)
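A minimal sketch of the process-supervision reranking idea from "Let's Verify Step by Step", assuming a hypothetical `step_score` stand-in for a trained process reward model; candidates are ranked by the product of per-step scores, which is one aggregation discussed in that line of work (a sketch, not their code):

```python
import math
from typing import Callable, List

def rerank_by_process_reward(
    problem: str,
    candidates: List[List[str]],                          # each candidate = list of reasoning steps
    step_score: Callable[[str, List[str], str], float],   # PRM stand-in: P(this step is correct)
) -> List[str]:
    """Pick the candidate whose per-step scores have the highest product
    (computed in log space), i.e. process-supervised reranking."""
    def log_product(steps: List[str]) -> float:
        total = 0.0
        for i, step in enumerate(steps):
            p = step_score(problem, steps[:i], step)  # score the step given its prefix
            total += math.log(max(p, 1e-9))
        return total
    return max(candidates, key=log_product)

# Toy stand-in scorer: slightly prefers longer, more explicit steps (illustrative only).
def step_score(problem: str, prefix: List[str], step: str) -> float:
    return min(0.99, 0.5 + 0.02 * len(step))

if __name__ == "__main__":
    candidates = [
        ["x = 2 is given.", "So x + 3 = 2 + 3 = 5.", "Answer: 5."],
        ["Guess 7.", "Answer: 7."],
    ]
    print(rerank_by_process_reward("If x = 2, what is x + 3?", candidates, step_score))
```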
[Reddit - Dive into anything](https://www.reddit.com/r/LocalLLaMA/comments/1fgr244/reverse_engineering_o1_architecture_with_a_little/)
[Reverse engineering OpenAI’s o1 - by Nathan Lambert](https://www.interconnects.ai/p/reverse-engineering-openai-o1)
When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs [[2406.01297] When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs](https://arxiv.org/abs/2406.01297)
https://x.com/_xjdr/status/1835352391648158189?t=m0K7Xv5AUUDwcZGKM4LcyQ&s=19
https://x.com/sytelus/status/1835433363882270922?t=Y3AXFaiCS7DCKIUiDOhY2w&s=19 o1 developer AMA summary
[Aidan X](https://x.com/aidan_mclau/status/1842550225824809439)
Trying my favorite prompts: "You are the most knowledgeable polymath multidisciplinary scientist that is a perfect generalist and specializes in everything and knows how everything works. Write a gigantic article about all of science from first principles"
https://x.com/burny_tech/status/1834334382888218937
quantum gravity
https://x.com/burny_tech/status/1834333794477973821
Riemann hypothesis https://x.com/burny_tech/status/1834332769129726038
maps https://x.com/burny_tech/status/1834768077717680135
Will it live up to its hype or be the biggest collective blueball in the history of collective blueballs? It lived up to its hype https://x.com/burny_tech/status/1834279980744016180
quote: “Many tasks don’t need reasoning”
Absolutely cooking some jobs lmao https://x.com/burny_tech/status/1834288198731645422
"OpenAI 's o1 thinks for seconds, but we aim for future versions to think for hours, days, even weeks. Inference costs will be higher, but what cost would you pay for a new cancer drug? For breakthrough batteries? For a proof of the Riemann Hypothesis? AI can be more than chatbots" https://x.com/polynoamial/status/1834280969786065278
How far can scaling go? https://imgur.com/3tDPyT4
Jailbreaking got harder. You're gonna have to step up your game,
@elder_plinius https://x.com/burny_tech/status/1834324442123497924
jailbroken, no model is immune to pliny https://x.com/elder_plinius/status/1834381507978280989
https://x.com/DrJimFan/status/1834284702494327197 https://x.com/burny_tech/status/1834291805690503424
https://x.com/burny_tech/status/1834311367454519515
https://x.com/jam3scampbell/status/1834285523546058973
Level 2 Reasoners are here.
Next up: Level 3 Agents.
https://x.com/SmokeAwayyy/status/1834327038561587279
[Something New: On OpenAI's "Strawberry" and Reasoning](https://www.oneusefulthing.org/p/something-new-on-openais-strawberry)
only 4 years ago the best language model in the world was gpt-2 xl. can you imagine where we might be 4 years from now? https://x.com/willdepue/status/1834309302598971834
The advantage of OpenAI having unthrottled, internal access to o1 cannot be overstated. https://x.com/BenjaminDEKR/status/1834322459354337519 https://x.com/burny_tech/status/1834664364990673017
Cursor, Claude 3.5 Sonnet, and Replit get less trendy https://x.com/dkardonsky_/status/1834281667512746468
Integration with Cursor https://x.com/mckaywrigley/status/1834311328045170862 https://x.com/cursor_ai/status/1834665828308205661
the new o1 model looks amazing but luckily it has a phd level intelligence so our jobs are safe for now https://x.com/netcapgirl/status/1834290758930600069
no more patience, jimmy https://x.com/sama/status/1834276403270857021
You all know what this means: the demand for *fast inference compute* is about to explode. https://x.com/tunguz/status/1834366040437895257
Which lab/team will be the next to release a reasoning AI model? https://x.com/tunguz/status/1834363884326490204
Scaling works, Situational Awareness was right, Leopold Aschenbrenner was right. Just look at the fucking line! https://x.com/jackgwhitaker/status/1834284617165316434 https://x.com/burny_tech/status/1835365831661740398
"get back to work", "ai is thinking!" https://x.com/yonashav/status/1834325806509949077
The goalposts shall keep moving until the Kardashev scale improves https://x.com/BasedBeffJezos/status/1834292166924943457
stochastic parrots can fly so high
https://x.com/8teAPi/status/1834321503992869177
future models will think for weeks.
don’t die.
https://x.com/iruletheworldmo/status/1834330060205294007
We're only beginning to understand this new paradigm of CoT-LLMs. There're so many new phenomena to study, research on it will be very exciting. You know it's a start of something good when your first model (with extra tuning) gets 93% on AIME’24 and does IOI-level coding :).
https://x.com/lukaszkaiser/status/1834283634888724563
How many startups were wrecked today? https://x.com/tunguz/status/1834324723250970802
oh husbant, you asked gpt-o1-preview model on api to solve the p vs np problem and it thought about it for a week. our api bill shows a usage of $10K USD and now we are homeress https://x.com/dejavucoder/status/1834316507058168091
Hope you guys have strapped your seatbelts. https://x.com/tunguz/status/1834301242656297138
The year is 2027 and OpenAI just dropped AGI, but no one noticed because it was called
gpt-5.5-o3-large2-preview-2027-09-06
https://x.com/tylertracy321/status/1834286741202894985
haha gpus go bitterrr https://x.com/burny_tech/status/1834616064178602321
it's so over bros
it has been 12 hours since openai announced o1 and it has so far failed to solve
- Riemann hypothesis
- Quantum Gravity
- FTL (Faster Than Light travel)
- P=NP
- Grand Unified Theory
- Cure for cancer
clearly this shows ai has hit a wall and openai is about to go bankrupt
https://x.com/basedjensen/status/1834462070395601094
new paradigm https://x.com/willdepue/status/1834294935497179633
This is what Ilya saw, the path to AGI https://x.com/WilliamBryk/status/1834614138955526440
It may be that today's large neural networks have enough test time compute to be slightly conscious https://x.com/markchen90/status/1834623248610521523
Intelligence is thermodynamics. https://x.com/BasedBeffJezos/status/1834486894836470199
Everything is thermodynamics.
https://x.com/burny_tech/status/1835424059116675470
The holy reasoning war of nerds https://x.com/burny_tech/status/1834721690271858729
may the God of Scaling be on our side https://x.com/burny_tech/status/1834726584927813827
OpenAI research engineer interview coding questions destroyed by o1 https://x.com/burny_tech/status/1834738026691375381
naming is absurd https://imgur.com/4tauXCt
Deep learning is hitting a wall! (But it's a bit more neurosymbolic, so props to you, Gary!)
Falsified Gary Marcus's prediction of no step change this year https://x.com/GaryMarcus/status/1766871625075409381
https://x.com/mealreplacer/status/1834292016462610507
ML Street Talk says it's not true reasoning as they define it precisely (must be Turing complete, must acquire and generate new knowledge) https://x.com/MLStreetTalk/status/1834286363476476391 https://x.com/MLStreetTalk/status/1834293397936394726 [Is o1-preview reasoning? - YouTube](https://www.youtube.com/watch?v=nO6sDk6vO0g)
machine qualia will soon become relevant https://imgur.com/3VVGf2H
Adding web search, math engines, and other symbolic engines (or engines for physics) would be even more powerful: Perplexity on steroids, AlphaProof on steroids [AI achieves silver-medal standard solving International Mathematical Olympiad problems - Google DeepMind](https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/)
So what is the road to AGI now?
1) Learn all human knowledge and reasoning patterns in distribution, overfitting the whole world? With more modalities, this would already be AGI and more, because no single human possesses the sum of all human knowledge and reasoning patterns that could then be retrieved.
2) For more superhuman reasoning performance, more RL methods similar to AlphaZero that require little or zero human input via self-play, which the new OpenAI o1 model partly used via its reward network.
3) Implement more graph of thought iterative reasoning in both training and test-time compute.
4) Synthetic data. Automatic labeling. Massive parallel training in simulations like Nvidia.
5) More scaling.
6) More neurosymbolic approaches like AlphaProof.
...
it will be compute *and* algorithms *and* data