Most influential LLM papers and the ideas they introduced (post-2017)
https://fxtwitter.com/goyal__pramod/status/1921419933231038820?t=7APGsTpIj_z6DSGeaWIG2w&s=19
o1-like open-source model [GitHub - NovaSky-AI/SkyThought: Sky-T1: Train your own O1 preview model within $450](https://github.com/NovaSky-AI/SkyThought)
large reasoning models blueprint [[2501.11223] Reasoning Language Models: A Blueprint](https://arxiv.org/abs/2501.11223)
[The 2025 AI Engineering Reading List - Latent.Space](https://www.latent.space/p/2025-papers)
Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective [[2412.14135] Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective](https://arxiv.org/abs/2412.14135)
[GitHub - henrythe9th/AI-Crash-Course: AI Crash Course to help busy builders catch up to the public frontier of AI research in 2 weeks](https://github.com/henrythe9th/AI-Crash-Course)
[GitHub - mlabonne/llm-course: Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.](https://github.com/mlabonne/llm-course?tab=readme-ov-file)
[A guide to JAX for PyTorch developers | Google Cloud Blog](https://cloud.google.com/blog/products/ai-machine-learning/guide-to-jax-for-pytorch-developers)
[ZenML - LLMOps Database](https://www.zenml.io/llmops-database)
[[2501.07391] Enhancing Retrieval-Augmented Generation: A Study of Best Practices](https://arxiv.org/abs/2501.07391) RAG SoTA
list of autonomous AI agents [GitHub - e2b-dev/awesome-ai-agents: A list of AI autonomous agents](https://github.com/e2b-dev/awesome-ai-agents)
[[2501.09223] Foundations of Large Language Models](https://arxiv.org/abs/2501.09223)
[GitHub - PatWalters/resources_2025: Machine Learning in Drug Discovery Resources 2024](https://github.com/PatWalters/resources_2025)
They put R1 in a loop for 15 minutes and it generated kernels "better than the optimized kernels developed by skilled engineers in some cases"
https://fxtwitter.com/abacaj/status/1889847093046702180?t=aRllQoTGbyzV_E0j4EMAfQ&s=19
[https://www.promptingguide.ai/](https://www.promptingguide.ai/)
https://x.com/arankomatsuzaki/status/1889522977185865833
Competed live at IOI 2024
o3 achieved gold
General-purpose o3 surpasses o1 w/ hand-crafted pipelines specialized for coding results
good dissection of the limitations of Deep Research in practice here:
hallucinations are still a big problem,
but it sometimes generates cool, useful insights, puts information together nicely, and shows you rabbit-hole paths to explore!
[https://www.youtube.com/watch?v=Kqjd_RzhSSY](https://www.youtube.com/watch?v=Kqjd_RzhSSY)
Benchmarks like this, showing that R1 seems to generalize more poorly than OpenAI models, make me think there's more secret sauce behind o1/o3
https://x.com/gm8xx8/status/1888831941161451536?t=UZnBkyEIyiD3TZXUDluzyg&s=19
https://x.com/WenhuChen/status/1888691381054435690
Incredibly exciting work! [https://arxiv.org/abs/2502.03349](https://t.co/aJEGtpF9eL) Cooperative, safe driving can arise at scale from self-play training without human data, just as my group saw in Overcooked a few years ago. Caveat: in simulation. Bravo @EugeneVinitsky and coauthors!
https://x.com/edwardfhughes/status/1887492625453793471?t=bXqO9PkNQOO_rgcbhCjGFw&s=19
Types of memory in AI agents
https://x.com/Aurimas_Gr/status/1892196166973977034?t=4ioqtVKxIWb7gqSVgmvuWw&s=19
[The State of Reinforcement Learning for LLM Reasoning](https://magazine.sebastianraschka.com/p/the-state-of-llm-reasoning-model-training)
LLM self-play without human data
https://fxtwitter.com/AndrewZ45732491/status/1919920459748909288
[[2505.03335] Absolute Zero: Reinforced Self-play Reasoning with Zero Data](https://arxiv.org/abs/2505.03335)
New Sutton interview about his new paper on the superiority of RL [https://www.youtube.com/watch?v=dhfJfQ5NueM](https://www.youtube.com/watch?v=dhfJfQ5NueM)
[https://storage.googleapis.com/deepmind-media/Era-of-Experience%20/The%20Era%20of%20Experience%20Paper.pdf](https://storage.googleapis.com/deepmind-media/Era-of-Experience%20/The%20Era%20of%20Experience%20Paper.pdf)
https://fxtwitter.com/stenichele/status/1924816238363942966
ARC-NCA shows how Neural Cellular Automata, including the memory-rich EngramNCA, crack tasks from the ARC-AGI benchmark, hitting GPT-4.5-level accuracy at a fraction of the cost. 🌱🤖
OpenEvolve: An Open Source Implementation of Google DeepMind's AlphaEvolve [OpenEvolve: An Open Source Implementation of Google DeepMind's AlphaEvolve](https://huggingface.co/blog/codelion/openevolve)
Bottlenecked Transformers: Periodic KV Cache Abstraction for Generalised Reasoning [[2505.16950] Bottlenecked Transformers: Periodic KV Cache Abstraction for Generalised Reasoning](https://arxiv.org/abs/2505.16950)
LLM RL without verifiers
[[2505.21493] Reinforcing General Reasoning without Verifiers](https://arxiv.org/abs/2505.21493)
[[2505.03335] Absolute Zero: Reinforced Self-play Reasoning with Zero Data](https://arxiv.org/abs/2505.03335)
[[2505.19590] Learning to Reason without External Rewards](https://arxiv.org/abs/2505.19590)
https://x.com/natolambert/status/1927404027735617541
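The verifier-free line of work above (e.g. "Learning to Reason without External Rewards") replaces an external checker with an intrinsic signal such as self-certainty: roughly, how far the model's next-token distributions sit from uniform, averaged over a rollout. A minimal NumPy sketch of that signal, under the assumption that self-certainty is measured as the mean KL divergence from a uniform distribution to the predicted token distribution (the function name and shapes here are illustrative, not taken from any paper's code):

```python
import numpy as np

def self_certainty(logits):
    """Mean KL(U || p) over positions, where p is the softmax of the
    logits and U is uniform over the vocabulary. Returns 0 for a
    perfectly uniform (maximally uncertain) model and grows as the
    model concentrates probability mass on fewer tokens."""
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable log-softmax over the vocabulary axis.
    logp = logits - logits.max(axis=-1, keepdims=True)
    logp = logp - np.log(np.exp(logp).sum(axis=-1, keepdims=True))
    vocab = logits.shape[-1]
    # KL(U || p) = -log V - (1/V) * sum_i log p_i
    kl_per_position = -np.log(vocab) - logp.mean(axis=-1)
    return kl_per_position.mean()

# Uniform logits -> zero certainty; a peaked distribution scores higher.
uncertain = self_certainty(np.zeros((4, 8)))       # ~0.0
confident = self_certainty(np.array([[12.0, 0, 0, 0, 0, 0, 0, 0]]))
```

In an RL loop this scalar would be used as the per-rollout reward in place of a verifier's pass/fail signal; the sketch only shows the scoring half, not the policy-gradient update.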
Introducing Continuous Thought Machines https://fxtwitter.com/SakanaAILabs/status/1921749814829871522
sakana.ai/ctm/
fully decentralized and open source [INTELLECT-2 Release: The First Globally Trained 32B Parameter Model Reinforcement Learning Training Run](https://www.primeintellect.ai/blog/intellect-2-release)
Planetary-Scale Inference: Previewing our Peer-To-Peer Decentralized Inference Stack [Planetary-Scale Inference: Previewing our Peer-To-Peer Decentralized Inference Stack](https://www.primeintellect.ai/blog/inference)
Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models [[2402.07754] Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models](https://arxiv.org/abs/2402.07754)
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models [[2505.24864] ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models](https://arxiv.org/abs/2505.24864)
[Whitepaper: Mastering Gameworld 10k in Minutes with the AXIOM ‘Digital Brain’](https://www.verses.ai/blog/whitepaper-mastering-gameworld-10k-in-minutes-with-the-axiom-digital-brain)
Atlas (A powerful Titan): a new architecture with long-term in-context memory https://fxtwitter.com/behrouz_ali/status/1928522388100010383
[[2505.23735] ATLAS: Learning to Optimally Memorize the Context at Test Time](https://arxiv.org/abs/2505.23735)
https://fxtwitter.com/LingYang_PU/status/1925385712670830753 [[2505.15809] MMaDA: Multimodal Large Diffusion Language Models](https://arxiv.org/abs/2505.15809)
We present MMaDA, the first diffusion model that unifies text reasoning, multimodal understanding, and image generation through Mixed Long-CoT and a unified RL algorithm, UniGRPO.