[OpenAI O1 AI: Examples of step change in math, physics, programming! Architecture benchmarks, future - YouTube](https://www.youtube.com/watch?v=upVNFGt1ZrQ)
New OpenAI O1 AI model: Examples of step change in math, physics, programming! New benchmarks, technical details, current state and future of AI!
Concrete examples of step change in math, physics, programming:
1. Terence Tao (top mathematician) says O1 can, for the first time, identify suitable theorems (e.g., Cramér's theorem) for vaguely worded mathematical queries, and performs at the level of a mediocre but not completely incompetent graduate student.
2. George Hotz called it the "first model capable of programming at all."
3. A user reported O1 fixed a complex Bluetooth protocol issue that previous models failed to understand.
4. An astrophysics PhD student reported that O1 accomplished in minutes what took a year of their PhD work, producing smaller code.
5. O1 came up with a novel approach for a proof of a theorem, which impressed a mathematician.
6. O1 was able to correctly compute the fundamental group of the circle on its first attempt, while other models struggled.
7. A law professional reported being happy with O1's performance in legal tasks.
8. Many other mathematicians are reporting a step change in performance.
Concrete benchmarks:
1. OpenAI benchmarks show O1 crushing benchmarks in coding, math, and PhD-level questions.
2. On LiveBench, O1 Preview outperforms other models, including Claude, in reasoning, mathematics, data analysis, and language tasks. Interestingly, O1 Mini performs better than O1 Preview on some reasoning tasks. In coding tasks, the results are mixed, with different models excelling in different aspects (e.g., code generation vs. completion).
3. OlympicArena reasoning benchmark: O1 significantly outperforms previous models.
4. O1 crushes internal Devin software engineering benchmarks.
5. SimpleBench: O1 Preview showed a 50% improvement, though under different testing conditions.
6. AidanBench: big jump for O1.
7. Mensa IQ tests: Jump from 90 IQ to 120 IQ.
8. ARC benchmark: O1 Preview performs much better than GPT-4o, but at about the same level as Claude 3.5 Sonnet.
9. Some medical benchmarks got crushed.
10. Some privately held benchmarks show a 30% increase in performance.
Concrete technical details:
1. O1 is trained with reinforcement learning, chain of thought, self-correction, breaking down of problems, and search and sampling with a scoring function (a minimal sketch follows this list).
2. New scaling laws for test time compute.
3. o1 is a single model; there are no multiple models, no hidden "monologue" model, etc.
4. o1 can generate very long responses while staying coherent, unlike all previous models.
5. No amount of prompt engineering on GPT-4o can match o1’s performance
6. The o1-mini training data contained a much higher proportion of STEM data than other data.
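A minimal sketch of the "search and sampling with a scoring function" idea from point 1 (best-of-n with a verifier), assuming hypothetical `generate_cot` and `score` stand-ins for the model's sampler and a reward/verifier model; this is just the generic technique, not OpenAI's actual implementation:

```python
import random
from typing import Callable, Optional, Tuple

def best_of_n(
    prompt: str,
    generate_cot: Callable[[str], Tuple[str, str]],  # returns (chain_of_thought, final_answer)
    score: Callable[[str, str, str], float],         # verifier / reward-model stand-in
    n: int = 16,
) -> Tuple[str, str, float]:
    """Sample n chain-of-thought candidates and keep the highest-scoring one.

    Plain best-of-n (rejection) sampling: the scoring function plays the role of
    a learned verifier or reward model. More samples = more test-time compute.
    """
    best: Optional[Tuple[str, str, float]] = None
    for _ in range(n):
        cot, answer = generate_cot(prompt)  # one sampled reasoning trace
        s = score(prompt, cot, answer)      # scalar quality estimate
        if best is None or s > best[2]:
            best = (cot, answer, s)
    return best

# Toy stand-ins so the sketch runs end to end (purely illustrative).
def generate_cot(prompt: str) -> Tuple[str, str]:
    guess = random.randint(1, 20)
    return f"I think the answer is {guess} because ...", str(guess)

def score(prompt: str, cot: str, answer: str) -> float:
    return -abs(int(answer) - 17)  # pretend the verifier prefers answers near 17

if __name__ == "__main__":
    cot, answer, s = best_of_n("toy question", generate_cot, score, n=32)
    print(answer, s)
```

Spending more samples (larger n) is one simple way test-time compute buys accuracy, which is the point of the new scaling laws above.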
Current state and future of AI:
1. Reception of O1 is mixed: some praise it as a significant step change, others call it a minor incremental step, and many point out its limitations.
2. Confusion exists about O1's coding abilities, with conflicting reports on its performance compared to other models.
3. O1 still struggles with some out-of-distribution reasoning tasks and simple logic problems.
4. The full O1 model that OpenAI has internally (not just the preview) is expected to significantly outperform the current preview version on most of these benchmarks and tasks.
5. Upcoming AI releases (o1 (full), Orion/GPT-5, Claude 3.5 Opus, Gemini 2 (maybe with AlphaProof and AlphaCode integrated), Grok 3, possibly Llama 4, etc.) are anticipated to push capabilities further.
6. More AGI labs are already integrating more reinforcement learning, chain of thought, self-correction, problem decomposition, search and sampling with a scoring function, planning, agents, and so on.
7. We should also accelerate reverse-engineering research such as mechanistic interpretability, for proper steerability in the contexts where it matters most.
8. We may be on the brink of an intelligence explosion, with models already showing signs of recursive self-improvement, such as o1 contributing to frontier AGI research and development inside OpenAI.
9. Competition between major AI labs (OpenAI, Google, Anthropic, xAI, SSI) is intensifying, with each having potential advantages in compute, data, or algorithms.
10. May AI advancements benefit everyone and lead to collective superflourishing, superintelligence, superwellbeing, superlongevity, superunderstanding!
Cycle repeats
[[Images/a2c228f2ddad3da557ce5eddc12bb230_MD5.jpeg|Open: Pasted image 20240917184323.png]]
![[Images/a2c228f2ddad3da557ce5eddc12bb230_MD5.jpeg]]
New OpenAI o1 model
"We've developed a new series of AI models designed to spend more time thinking before they respond. They can reason through complex tasks and solve harder problems than previous models in science, coding, and math."
[Introducing OpenAI o1 | OpenAI](https://openai.com/index/introducing-openai-o1-preview/)
[Learning to Reason with LLMs | OpenAI](https://openai.com/index/learning-to-reason-with-llms/)
[o1 System Card | OpenAI](https://openai.com/index/openai-o1-system-card/)
[OpenAI Strawberry Livestream - Metaprompting, Cognitive Architecture, Multi-Agent, Finetuning - YouTube](https://www.youtube.com/live/AO7mXa8BUWk)
[ChatGPT o1 - In-Depth Analysis and Reaction (o1-preview) - YouTube](https://www.youtube.com/watch?v=7J44j6Fw8NM)
[o1 - What is Going On? Why o1 is a 3rd Paradigm of Model + 10 Things You Might Not Know - YouTube](https://www.youtube.com/watch?v=KKF7kL0pGc4)
[GPT-o1 - by Zvi Mowshowitz - Don't Worry About the Vase](https://thezvi.substack.com/p/gpt-4o1)
[Reddit - Dive into anything](https://www.reddit.com/r/singularity/comments/1ff7mod/openai_announces_o1/)
trained with reinforcement learning, chain of thought, self-correction, sampling with a scoring function, etc.
https://openai.com/index/openai-o1-system-card/
my thoughts [OpenAI o1 Strawberry Q* AI reasoning LLM model destroys Claude 3.5 Sonnet on reasoning, mathematics! - YouTube](https://www.youtube.com/watch?v=MBxcKY6he1c) https://x.com/burny_tech/status/1834650814419550368 https://x.com/burny_tech/status/1834651772637384712
"
OpenAI o1 Strawberry Q* AI reasoning LLM model destroys Anthropic Claude 3.5 Sonnet, Google, and Meta on reasoning, mathematics, data analysis, and language! Deep dive into benchmarks and future of AI models!
1. OpenAI released new models called O1 Preview and O1 Mini, along with benchmarks showing their performance.
2. The full O1 model appears to be significantly stronger than O1 Preview, particularly in areas like competition math, coding and PhD level questions.
3. O1 Preview outperforms other models, including Claude, in reasoning, mathematics, data analysis, and language tasks according to the LiveBench benchmark.
4. Interestingly, O1 Mini performs better than O1 Preview in some reasoning tasks.
5. In coding tasks, the results are mixed, with different models excelling in different aspects (e.g., code generation vs. completion).
6. The new models seem to be specialized for certain tasks, potentially sacrificing performance in others (e.g., instruction following).
7. The models appear to use reasoning techniques like chain-of-thought reasoning and sampling with scoring function, possibly integrated into the model architecture.
8. Other major AI labs (Google, Meta, Anthropic) are likely working on similar approaches, and new model releases are expected soon.
9. There's a new paradigm with its own scaling law: test-time compute.
10. I'm concerned about AI labs potentially not releasing their best models for everyone, citing risks of power concentration, but I understand the other risks too.
11. Relatively big progress in AI capabilities, with comparisons to models from just a few years ago.
12. There are some conflicting benchmark results; we will need to see how these models perform in practice over the long term.
"
benchmarks
https://x.com/burny_tech/status/1834283752346005926
Dominating basically every benchmark, like LiveBench [LiveBench](https://livebench.ai/) https://x.com/polynoamial/status/1834280155730043108
IQ https://x.com/DaveShapi/status/1835117569432224005 https://x.com/maximlott/status/1835043371339202639?t=tucltiRS3VVMw6r3cdeTnA&s=19
[ChatGPT o1 - In-Depth Analysis and Reaction (o1-preview) - YouTube](https://www.youtube.com/watch?v=7J44j6Fw8NM) AI Explained has great out-of-distribution benchmarks
https://x.com/kimmonismus/status/1834296216009552341
https://x.com/aidan_mclau/status/1835023308238340460 AidanBench
https://x.com/burny_tech/status/1835091020276437138 OlympicArena reasoning benchmark for o1-preview goes hard
https://x.com/sytelus/status/1834352532585676859
[x.com/alexandr\_wang/status/1838637233169211838](https://x.com/alexandr_wang/status/1838637233169211838) SEAL leaderboard
https://x.com/polynoamial/status/1835086680266883205 The AI field desperately needs harder evals that take into consideration continued fast progress.
https://x.com/burny_tech/status/1834716200586084485 Dominating basically every benchmark, like LiveBench [LiveBench](https://livebench.ai/), but ARC has not fallen yet; I wonder how much o1 would score on ARC with AlphaZero-like RL and self-correcting CoT fine-tuning on ARC
And I wonder how valid their internal Devin coding benchmark is; the curve looks exponential https://fxtwitter.com/cognition_labs/status/1834292718174077014
Planning got a jump [X](https://x.com/polynoamial/status/1838251987009183775)
The new OpenAI o1 model (just the preview for now) still struggling to reason out of distribution (ARC benchmark, SimpleBench, still problems with some famous puzzles, etc.) makes me think that we will get much better AI models once we figure out much more robust (hardcoded or emergent) first-principles reasoning (in hierarchies, graphs, and so on), instead of retrieving and synthesizing sometimes brittle, weakly generalizing reasoning program chunks from the training data stored in the latent space.
Maybe scale, better training data, and training hacks will cause the emergence of a general enough, robust enough, all-encompassing enough reasoning engine that will eventually phase-shift into a first-principles-reasoning metastable configuration of weights.
Public narrative about AI is shifting now https://x.com/burny_tech/status/1835091400985096337
"o1 makes it abundantly clear that only OpenAI's internal models will be truly useful for civilizational-level change.
Their internal capabilities now far exceed their publicly shipped products and that gap will continue to grow."
https://x.com/SmokeAwayyy/status/1835012208587423788
if they released the full o1 model... maybe in a month tho? https://fxtwitter.com/main_horse/status/1834333269128872365
[[AINews] o1: OpenAI's new general reasoning models • Buttondown](https://buttondown.com/ainews/archive/ainews-o1-openais-new-general-reasoning-models/)
authors
https://x.com/markchen90/status/1834343908035178594
Crushing mathematics and informatics olympiads https://x.com/burny_tech/status/1834327275946361099 https://x.com/burny_tech/status/1834321466105770184
OpenAI observed interesting instances of reward hacking in their new model 🤔 https://x.com/burny_tech/status/1834324288402243655
Well, there goes the “AI agent unexpectedly and successfully exploits a configuration bug in its training environment as the path of least resistance during cyberattack capability evaluations” milestone.
https://x.com/davidad/status/1834454815092449299 (Even though I'm very excited about the new AI model, I think certain risks are very real, and we should also accelerate reverse-engineering research such as mechanistic interpretability of these AI systems so that we can steer them properly in contexts where we need to!)
https://x.com/tensor_fusion/status/1834561918712856603 https://x.com/ShakeelHashim/status/1834292287087485425
How far can inference time compute go? new scaling laws, Bitter lesson makes a comeback https://x.com/burny_tech/status/1834289214776565844 https://x.com/DaveShapi/status/1835117776920334703
In another 6 months we will possibly have o1 (full), Orion/GPT-5, Claude 3.5 Opus, Gemini 2 (maybe with AlphaProof and AlphaCode integrated), Grok 3, possibly Llama 4
[Reddit - Dive into anything](https://www.reddit.com/r/singularity/comments/1fgnfdu/in_another_6_months_we_will_possibly_have_o1_full/)
This is gonna be the hottest winter on record.
https://x.com/DaveShapi/status/1834986252359049397 https://x.com/slow_developer/status/1834958266998157547
A couple of PRs to the OpenAI codebase were already authored solely by o1!
https://x.com/lukasz_kondr/status/1834643103397167326
Technically initial recursive self-improvement from the new OpenAI o1 model. It made nontrivial contributions to frontier AI research and development.
https://x.com/burny_tech/status/1834735949101600770
https://x.com/huybery/status/1834291444540194966
Step change in coding, math, physics!
My feed is full of people praising o1 for being much better at math than previous models! I didn't believe LLMs could get so much better at math! I was wrong, once again! Do not underestimate the bitter God of Scale and AlphaZero-like RL! And we have not reached the peak of inference-time compute scaling laws! The future will be interesting! Looking forward to more progress in AI x math!
https://x.com/burny_tech/status/1834748815913398462
THERE IS NO PLATEAUING!
WE'RE JUST GETTING STARTED WITH O1, ALPHAPROOF AND SIMILAR NEW AI SYSTEMS! [AI achieves silver-medal standard solving International Mathematical Olympiad problems - Google DeepMind](https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/)
My feed full of:
OpenAI o1 model is step change in mathematics
OpenAI o1 model is step change in programming
OpenAI o1 model is step change in physics
etc.
Big
https://x.com/burny_tech/status/1834957917033730188
https://x.com/IntuitMachine/status/1835240555028136092
https://x.com/teortaxesTex/status/1834725127029932081 [Terence Tao: "I have played a little bit with OpenAI's new iter…" - Mathstodon](https://mathstodon.xyz/@tao/113132502735585408) Terence Tao: step change from an incompetent graduate student to a mediocre but not completely incompetent graduate student; first time a model identified and used Cramér's theorem
https://x.com/omarsar0/status/1834315401812910195 simple math
https://x.com/burny_tech/status/1834350997256187971 [Learning to Reason with LLMs | Hacker News](https://news.ycombinator.com/item?id=41523070) fixing a Bluetooth protocol issue
https://x.com/anderssandberg/status/1834536105527398717 math
https://x.com/robertghrist/status/1834564488751731158 A new mathematical proof with o1?! A thread from a mathematics researcher sharing how the o1 model helped his team write a new mathematics paper proving a new theorem! [x.com/robertghrist/status/1841462507543949581?t=5zV3VpQI0mbrSU9\_QRtfkQ&s=19](https://x.com/robertghrist/status/1841462507543949581?t=5zV3VpQI0mbrSU9_QRtfkQ&s=19)
https://x.com/QiaochuYuan/status/1834341057099948170 first LLM i've tested that can compute the fundamental group of the circle
https://x.com/emollick/status/1835342797722767592 [ChatGPT o1 preview + mini Wrote My PhD Code in 1 Hour*—What Took Me ~1 Year - YouTube](https://youtu.be/M9YOO7N5jF8) astrophysics
https://x.com/realGeorgeHotz/status/1835228364837470398 programming, it's "a mediocre, but not completely incompetent, software engineer"
https://x.com/scottastevenson/status/1834408343395258700 law
https://x.com/DeryaTR_/status/1834630356286558336 medical stuff
[x.com/aj\_dev\_smith/status/1835521394659983477](https://x.com/aj_dev_smith/status/1835521394659983477) music
https://x.com/holdenmatt/status/1835031749706785258 happy mathematician
[x.com/DeryaTR\_/status/1836434726774526381](https://x.com/DeryaTR_/status/1836434726774526381) happy biochemist "o1 model is comparable to an outstanding PhD student in biomedical sciences"
[I used o1-mini everyday for coding against Claude Sonnet 3.5 so you don't have to - my thoughts : r/ClaudeAI](https://www.reddit.com/r/ClaudeAI/comments/1fhjgcr/i_used_o1mini_everyday_for_coding_against_claude/) coding
https://x.com/AravSrinivas/status/1834786331194802407 prompts where you feel o1-preview outperformed sonnet-3.5 that’s not a puzzle or a coding competition problem but your daily usage prompts. 🧵
implementation details of o1
[Learning to Reason with LLMs | OpenAI](https://openai.com/index/learning-to-reason-with-llms/)
trained with reinforcement learning, chain of thought, self-correction, sampling with a scoring function, etc.
inference-time compute, maybe hardwired into the architecture, maybe looping tokens through the layers multiple times, search, graph of thought, self-correction, etc. https://x.com/DrJimFan/status/1834279865933332752 https://x.com/polynoamial/status/1834280155730043108
[GitHub - hijkzzz/Awesome-LLM-Strawberry: A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.](https://github.com/hijkzzz/Awesome-LLM-Strawberry) Democratization of o1 is happening
Chain-of-thought, self-critique, verification, multi-step reasoning, decomposing tasks, Monte Carlo Tree Search, reinforcement learning, self-play, preference learning, process supervision, compute-optimal sampling, mutual reasoning, deliberative planning, self-rewarding, uncertainty-aware planning, imagination-based self-improvement, latent-variable inference, value-guided decoding, procedure cloning, and various approaches to scaling and optimizing LLM inference and training, all aimed at improving reasoning, problem-solving, and overall performance across diverse tasks.
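Of these, the self-critique / self-correction loop is the easiest to sketch; assuming hypothetical `draft`, `critique`, and `revise` callables standing in for LLM calls (a generic sketch, not any particular lab's implementation):

```python
from typing import Callable

def self_correct(
    problem: str,
    draft: Callable[[str], str],             # initial attempt (an LLM call in practice)
    critique: Callable[[str, str], str],     # returns "" when the critic finds no issues
    revise: Callable[[str, str, str], str],  # rewrite the attempt given the critique
    max_rounds: int = 3,
) -> str:
    """Iteratively critique and revise an answer until the critic is satisfied
    or the round budget runs out. A generic self-correction loop, nothing more."""
    answer = draft(problem)
    for _ in range(max_rounds):
        feedback = critique(problem, answer)
        if not feedback:  # critic found nothing to fix
            break
        answer = revise(problem, answer, feedback)
    return answer

if __name__ == "__main__":
    # Toy stand-ins: the "critic" objects until the answer carries units.
    print(self_correct(
        "How far does light travel in 2 seconds?",
        draft=lambda p: "599584916",
        critique=lambda p, a: "" if a.endswith("m") else "missing units",
        revise=lambda p, a, f: a + " m",
    ))  # -> "599584916 m"
```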
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling. Brown et al. [[2407.21787v1] Large Language Monkeys: Scaling Inference Compute with Repeated Sampling](https://arxiv.org/abs/2407.21787v1)
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters. [[2408.03314] Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters](https://arxiv.org/abs/2408.03314) https://x.com/rohanpaul_ai/status/1835443326205517910
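Both papers track coverage as a function of the number of samples; the standard unbiased pass@k estimator that this line of work builds on is easy to sketch (the numbers below are made up, just to show how coverage climbs with k):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimator of pass@k: the probability that at least one of k
    samples (drawn without replacement from n total, of which c are correct)
    solves the problem. Standard formula from the code-evaluation literature."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Made-up example: 200 samples per problem, only 5 of them correct.
for k in (1, 10, 100):
    print(f"pass@{k} = {pass_at_k(200, 5, k):.3f}")
```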
https://x.com/terryyuezhuo/status/1834286548571095299
ReFT: Reasoning with Reinforced Fine-Tuning
[[2401.08967] ReFT: Reasoning with Reinforced Fine-Tuning](https://arxiv.org/abs/2401.08967)
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning [[2402.05808] Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning](https://arxiv.org/abs/2402.05808)
https://x.com/iamgingertrash/status/1834297595486675052
tree search distillation + RL post training! https://x.com/rm_rafailov/status/1834291016192360743
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking [[2403.09629] Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking](https://arxiv.org/abs/2403.09629)
https://x.com/laion_ai/status/1834564564601729421
Let's Verify Step by Step [[2305.20050] Let's Verify Step by Step](https://arxiv.org/abs/2305.20050) ['Show Your Working': ChatGPT Performance Doubled w/ Process Rewards (+Synthetic Data Event Horizon) - YouTube](https://www.youtube.com/watch?v=hZTZYffRsKI)
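A minimal sketch of the process-supervision reranking idea from "Let's Verify Step by Step", assuming a hypothetical `step_score` stand-in for a trained process reward model; candidates are ranked by the product of per-step scores, which is one aggregation discussed in that line of work (a sketch, not their code):

```python
import math
from typing import Callable, List

def rerank_by_process_reward(
    problem: str,
    candidates: List[List[str]],                          # each candidate = list of reasoning steps
    step_score: Callable[[str, List[str], str], float],   # PRM stand-in: P(this step is correct)
) -> List[str]:
    """Pick the candidate whose per-step scores have the highest product
    (computed in log space), i.e. process-supervised reranking."""
    def log_product(steps: List[str]) -> float:
        total = 0.0
        for i, step in enumerate(steps):
            p = step_score(problem, steps[:i], step)  # score the step given its prefix
            total += math.log(max(p, 1e-9))
        return total
    return max(candidates, key=log_product)

# Toy stand-in scorer: slightly prefers longer, more explicit steps (illustrative only).
def step_score(problem: str, prefix: List[str], step: str) -> float:
    return min(0.99, 0.5 + 0.02 * len(step))

if __name__ == "__main__":
    candidates = [
        ["x = 2 is given.", "So x + 3 = 2 + 3 = 5.", "Answer: 5."],
        ["Guess 7.", "Answer: 7."],
    ]
    print(rerank_by_process_reward("If x = 2, what is x + 3?", candidates, step_score))
```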
[Reddit - Dive into anything](https://www.reddit.com/r/LocalLLaMA/comments/1fgr244/reverse_engineering_o1_architecture_with_a_little/)
[Reverse engineering OpenAI’s o1 - by Nathan Lambert](https://www.interconnects.ai/p/reverse-engineering-openai-o1)
When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs [[2406.01297] When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs](https://arxiv.org/abs/2406.01297)
https://x.com/_xjdr/status/1835352391648158189?t=m0K7Xv5AUUDwcZGKM4LcyQ&s=19
https://x.com/sytelus/status/1835433363882270922?t=Y3AXFaiCS7DCKIUiDOhY2w&s=19 o1 developer AMA summary
[Aidan X](https://x.com/aidan_mclau/status/1842550225824809439)
Trying my favorite prompts: "You are the most knowledgeable polymath multidisciplinary scientist that is a perfect generalist and specializes in everything and knows how everything works. Write a gigantic article about all of science from first principles"
https://x.com/burny_tech/status/1834334382888218937
quantum gravity
https://x.com/burny_tech/status/1834333794477973821
Riemann hypothesis https://x.com/burny_tech/status/1834332769129726038
maps https://x.com/burny_tech/status/1834768077717680135
Will it live up to its hype or be the biggest collective blueball in the history of collective blueballs? It lived up to its hype https://x.com/burny_tech/status/1834279980744016180
quote: “Many tasks don’t need reasoning”
Absolutely cooking some jobs lmao https://x.com/burny_tech/status/1834288198731645422
"OpenAI 's o1 thinks for seconds, but we aim for future versions to think for hours, days, even weeks. Inference costs will be higher, but what cost would you pay for a new cancer drug? For breakthrough batteries? For a proof of the Riemann Hypothesis? AI can be more than chatbots" https://x.com/polynoamial/status/1834280969786065278
How far can scaling go? https://imgur.com/3tDPyT4
Jailbreaking got harder. You're gonna have to step up your game,
@elder_plinius https://x.com/burny_tech/status/1834324442123497924
jailbroken, no model is immune to pliny https://x.com/elder_plinius/status/1834381507978280989
https://x.com/DrJimFan/status/1834284702494327197 https://x.com/burny_tech/status/1834291805690503424
https://x.com/burny_tech/status/1834311367454519515
https://x.com/jam3scampbell/status/1834285523546058973
Level 2 Reasoners are here.
Next up: Level 3 Agents.
https://x.com/SmokeAwayyy/status/1834327038561587279
[Something New: On OpenAI's "Strawberry" and Reasoning](https://www.oneusefulthing.org/p/something-new-on-openais-strawberry)
only 4 years ago the best language model in the world was gpt-2 xl. can you imagine where we might be 4 years from now? https://x.com/willdepue/status/1834309302598971834
The advantage of OpenAI having unthrottled, internal access to o1 cannot be overstated. https://x.com/BenjaminDEKR/status/1834322459354337519 https://x.com/burny_tech/status/1834664364990673017
Cursor, Claude 3.5 Sonnet, and Replit get less trendy https://x.com/dkardonsky_/status/1834281667512746468
Integration with Cursor https://x.com/mckaywrigley/status/1834311328045170862 https://x.com/cursor_ai/status/1834665828308205661
the new o1 model looks amazing but luckily it has a phd level intelligence so our jobs are safe for now https://x.com/netcapgirl/status/1834290758930600069
no more patience, jimmy https://x.com/sama/status/1834276403270857021
You all know what this means: the demand for *fast inference compute* is about to explode. https://x.com/tunguz/status/1834366040437895257
Which lab/team will be the next to release a reasoning AI model? https://x.com/tunguz/status/1834363884326490204
Scaling works, Situational Awareness was right, Leopold Aschenbrenner was right. Just look at the fucking line! https://x.com/jackgwhitaker/status/1834284617165316434 https://x.com/burny_tech/status/1835365831661740398
"get back to work", "ai is thinking!" https://x.com/yonashav/status/1834325806509949077
The goalposts shall keep moving until the Kardashev scale improves https://x.com/BasedBeffJezos/status/1834292166924943457
stochastic parrots can fly so high
https://x.com/8teAPi/status/1834321503992869177
future models will think for weeks.
don’t die.
https://x.com/iruletheworldmo/status/1834330060205294007
We're only beginning to understand this new paradigm of CoT-LLMs. There're so many new phenomena to study, research on it will be very exciting. You know it's a start of something good when your first model (with extra tuning) gets 93% on AIME’24 and does IOI-level coding :).
https://x.com/lukaszkaiser/status/1834283634888724563
How many startups were wrecked today? https://x.com/tunguz/status/1834324723250970802
oh husbant, you asked gpt-o1-preview model on api to solve the p vs np problem and it thought about it for a week. our api bill shows a usage of $10K USD and now we are homeress https://x.com/dejavucoder/status/1834316507058168091
Hope you guys have strapped your seatbelts. https://x.com/tunguz/status/1834301242656297138
The year is 2027 and OpenAI just dropped AGI, but no one noticed because it was called
gpt-5.5-o3-large2-preview-2027-09-06
https://x.com/tylertracy321/status/1834286741202894985
haha gpus go bitterrr https://x.com/burny_tech/status/1834616064178602321
it's so over bros
it has been 12 hours since openai announced o1 and it has so far failed to solve
- Riemann hypothesis
- Quantum Gravity
- FTL (Faster Than Light travel)
- P=NP
- Grand Unified Theory
- Cure for cancer
clearly this shows ai has hit a wall and openai is about to go bankrupt
https://x.com/basedjensen/status/1834462070395601094
new paradigm https://x.com/willdepue/status/1834294935497179633
This is what Ilya saw, the path to AGI https://x.com/WilliamBryk/status/1834614138955526440
It may be that today's large neural networks have enough test time compute to be slightly conscious https://x.com/markchen90/status/1834623248610521523
Intelligence is thermodynamics. https://x.com/BasedBeffJezos/status/1834486894836470199
Everything is thermodynamics.
https://x.com/burny_tech/status/1835424059116675470
The holy reasoning war of nerds https://x.com/burny_tech/status/1834721690271858729
may the God of Scaling be on our side https://x.com/burny_tech/status/1834726584927813827
OpenAI research engineer interview coding questions destroyed by o1 https://x.com/burny_tech/status/1834738026691375381
naming is absurd https://imgur.com/4tauXCt
Deep learning is hitting a wall! (But it's a bit more neurosymbolic, so props to you, Gary!)
Falsified Gary Marcus's prediction of no step change this year https://x.com/GaryMarcus/status/1766871625075409381
https://x.com/mealreplacer/status/1834292016462610507
ML Street Talk says it's not true reasoning as they define it precisely (must be Turing complete, must acquire and generate new knowledge) https://x.com/MLStreetTalk/status/1834286363476476391 https://x.com/MLStreetTalk/status/1834293397936394726 [Is o1-preview reasoning? - YouTube](https://www.youtube.com/watch?v=nO6sDk6vO0g)
machine qualia will soon become relevant https://imgur.com/3VVGf2H
Adding web search, math engines, and other symbolic engines (or engines for physics) would be even more powerful: Perplexity on steroids, AlphaProof on steroids [AI achieves silver-medal standard solving International Mathematical Olympiad problems - Google DeepMind](https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/)
So what is the road to AGI now?
1) Learn all human knowledge and reasoning patterns in distribution, overfitting the whole world? With more modalities, this would already be AGI and more, because no single human possesses the sum of all human knowledge and reasoning patterns that could then be retrieved.
2) For more superhuman reasoning performance, more RL methods similar to AlphaZero that require little or zero human input via self-play, which the new OpenAI o1 model partly used via its reward network.
3) Implement more graph of thought iterative reasoning in both training and test-time compute.
4) Synthetic data. Automatic labeling. Massive parallel training in simulations like Nvidia.
5) More scaling.
6) More neurosymbolic approaches like AlphaProof.
...
it will be compute *and* algorithms *and* data