- Most influential LLM papers and the ideas they introduced (post-2017): https://fxtwitter.com/goyal__pramod/status/1921419933231038820?t=7APGsTpIj_z6DSGeaWIG2w&s=19
- o1-like open-source model: [GitHub - NovaSky-AI/SkyThought: Sky-T1: Train your own O1 preview model within $450](https://github.com/NovaSky-AI/SkyThought)
- Large reasoning models blueprint: [[2501.11223] Reasoning Language Models: A Blueprint](https://arxiv.org/abs/2501.11223)
- [The 2025 AI Engineering Reading List - Latent.Space](https://www.latent.space/p/2025-papers)
- [[2412.14135] Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective](https://arxiv.org/abs/2412.14135)
- [GitHub - henrythe9th/AI-Crash-Course: AI Crash Course to help busy builders catch up to the public frontier of AI research in 2 weeks](https://github.com/henrythe9th/AI-Crash-Course)
- [GitHub - mlabonne/llm-course: Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.](https://github.com/mlabonne/llm-course?tab=readme-ov-file)
- [A guide to JAX for PyTorch developers | Google Cloud Blog](https://cloud.google.com/blog/products/ai-machine-learning/guide-to-jax-for-pytorch-developers)
- [ZenML - LLMOps Database](https://www.zenml.io/llmops-database)
- RAG: [[2501.07391] Enhancing Retrieval-Augmented Generation: A Study of Best Practices](https://arxiv.org/abs/2501.07391)
- SoTA list of autonomous AI agents: [GitHub - e2b-dev/awesome-ai-agents: A list of AI autonomous agents](https://github.com/e2b-dev/awesome-ai-agents)
- [[2501.09223] Foundations of Large Language Models](https://arxiv.org/abs/2501.09223)
- [GitHub - PatWalters/resources_2025: Machine Learning in Drug Discovery Resources 2024](https://github.com/PatWalters/resources_2025)
- They put R1 in a loop for 15 minutes and it generated kernels that were "better than the optimized kernels developed by skilled engineers in some cases" (see the sketch after this list): https://fxtwitter.com/abacaj/status/1889847093046702180?t=aRllQoTGbyzV_E0j4EMAfQ&s=19
- [https://www.promptingguide.ai/](https://www.promptingguide.ai/)
- o3 competed live at IOI 2024 and achieved gold; general-purpose o3 surpasses o1 with hand-crafted pipelines specialized for coding: https://x.com/arankomatsuzaki/status/1889522977185865833
- A good dissection of the limitations of Deep Research in practice: hallucinations are still a big problem, but it sometimes generates genuinely useful insights, puts information together nicely, and surfaces rabbit-hole paths for you to explore: [https://www.youtube.com/watch?v=Kqjd_RzhSSY](https://www.youtube.com/watch?v=Kqjd_RzhSSY)
- Benchmarks like this, showing that R1 actually seems to generalize more poorly than OpenAI models, make me think there's more secret sauce behind o1/o3: https://x.com/gm8xx8/status/1888831941161451536?t=UZnBkyEIyiD3TZXUDluzyg&s=19 https://x.com/WenhuChen/status/1888691381054435690
- "Incredibly exciting work! [https://arxiv.org/abs/2502.03349](https://t.co/aJEGtpF9eL) Cooperative, safe driving can arise at scale from self-play training without human data, just as my group saw in Overcooked a few years ago. Caveat: in simulation. Bravo @EugeneVinitsky and coauthors!" https://x.com/edwardfhughes/status/1887492625453793471?t=bXqO9PkNQOO_rgcbhCjGFw&s=19
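The R1 kernel result above is easy to picture as code. Below is a minimal sketch of that generate-and-benchmark loop, where `llm_complete` and `benchmark` are hypothetical placeholders rather than any real API: compile errors and timings from each attempt are fed back into the next prompt, and the best verified kernel wins.

```python
# A minimal sketch of the "reasoning model in a loop" kernel-generation idea.
# llm_complete() and benchmark() are hypothetical placeholders, not a real API:
# the point is that errors and timings become feedback for the next prompt.
import textwrap

def llm_complete(prompt: str) -> str:
    """Placeholder for a call to a reasoning model such as R1 (hypothetical)."""
    raise NotImplementedError

def benchmark(kernel_src: str) -> float:
    """Placeholder: compile the kernel, verify correctness against a reference,
    and return its runtime in seconds; raises on any failure (hypothetical)."""
    raise NotImplementedError

def optimize_kernel(task: str, budget: int = 20) -> tuple[str | None, float]:
    """Keep asking the model for a faster kernel, feeding back what happened."""
    best_src, best_time = None, float("inf")
    feedback = "No attempts yet."
    for _ in range(budget):  # the tweet's setup ran for ~15 minutes of wall clock
        prompt = textwrap.dedent(f"""
            Write a faster GPU kernel for: {task}
            Feedback on your previous attempt: {feedback}
            Return only the kernel source code.
        """)
        src = llm_complete(prompt)
        try:
            t = benchmark(src)
            feedback = f"Passed correctness tests; runtime {t:.4f}s (best so far: {best_time:.4f}s)."
            if t < best_time:
                best_src, best_time = src, t
        except Exception as e:  # compiler/correctness errors are the learning signal
            feedback = f"Attempt failed: {e}"
    return best_src, best_time
```

The search itself is deliberately dumb: all the intelligence sits in the model, and the loop only supplies ground-truth feedback from the compiler and the benchmark.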
- Types of memory in AI agents: https://x.com/Aurimas_Gr/status/1892196166973977034?t=4ioqtVKxIWb7gqSVgmvuWw&s=19
- [The State of Reinforcement Learning for LLM Reasoning](https://magazine.sebastianraschka.com/p/the-state-of-llm-reasoning-model-training)
- LLM self-play without human data: https://fxtwitter.com/AndrewZ45732491/status/1919920459748909288 [[2505.03335] Absolute Zero: Reinforced Self-play Reasoning with Zero Data](https://arxiv.org/abs/2505.03335)
- New Sutton interview about his paper on the superiority of RL: [https://www.youtube.com/watch?v=dhfJfQ5NueM](https://www.youtube.com/watch?v=dhfJfQ5NueM) [https://storage.googleapis.com/deepmind-media/Era-of-Experience%20/The%20Era%20of%20Experience%20Paper.pdf](https://storage.googleapis.com/deepmind-media/Era-of-Experience%20/The%20Era%20of%20Experience%20Paper.pdf)
- ARC-NCA shows how Neural Cellular Automata (incl. the memory-rich EngramNCA) crack tasks from the ARC-AGI benchmark, hitting GPT-4.5-level accuracy at a fraction of the cost 🌱🤖 (a toy NCA step is sketched after this list): https://fxtwitter.com/stenichele/status/1924816238363942966
- [OpenEvolve: An Open Source Implementation of Google DeepMind's AlphaEvolve](https://huggingface.co/blog/codelion/openevolve)
- [[2505.16950] Bottlenecked Transformers: Periodic KV Cache Abstraction for Generalised Reasoning](https://arxiv.org/abs/2505.16950)
- LLM RL without verifiers: [[2505.21493] Reinforcing General Reasoning without Verifiers](https://arxiv.org/abs/2505.21493) [[2505.03335] Absolute Zero: Reinforced Self-play Reasoning with Zero Data](https://arxiv.org/abs/2505.03335) [[2505.19590] Learning to Reason without External Rewards](https://arxiv.org/abs/2505.19590) https://x.com/natolambert/status/1927404027735617541
- Introducing Continuous Thought Machines: https://fxtwitter.com/SakanaAILabs/status/1921749814829871522 sakana.ai/ctm/
- Fully decentralized and open source: [INTELLECT-2 Release: The First Globally Trained 32B Parameter Model Reinforcement Learning Training Run](https://www.primeintellect.ai/blog/intellect-2-release)
- [Planetary-Scale Inference: Previewing our Peer-To-Peer Decentralized Inference Stack](https://www.primeintellect.ai/blog/inference)
- [[2402.07754] Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models](https://arxiv.org/abs/2402.07754)
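On the ARC-NCA item above: the sketch below shows the generic Neural Cellular Automaton update that this line of work builds on, in NumPy. It is an illustration of the idea only (random, untrained weights), not the ARC-NCA or EngramNCA architecture itself.

```python
# Toy Neural Cellular Automaton step in NumPy: the generic NCA recipe
# (perceive the neighborhood -> shared per-cell MLP -> stochastic residual
# update). An illustration of the idea only, not the ARC-NCA/EngramNCA
# architecture; the weights here are random and untrained.
import numpy as np

rng = np.random.default_rng(0)
H, W_GRID, C, HIDDEN = 16, 16, 8, 32         # grid size, state channels, MLP width

# In a real NCA these weights are learned end-to-end through the rollout.
W1 = rng.normal(0, 0.1, (3 * C, HIDDEN))     # perception (state + x/y gradients) -> hidden
W2 = rng.normal(0, 0.1, (HIDDEN, C))         # hidden -> per-cell state update

def perceive(grid):
    """Each cell sees its own state plus finite-difference x/y gradients."""
    gx = np.roll(grid, -1, axis=1) - np.roll(grid, 1, axis=1)
    gy = np.roll(grid, -1, axis=0) - np.roll(grid, 1, axis=0)
    return np.concatenate([grid, gx, gy], axis=-1)   # (H, W, 3C)

def nca_step(grid, fire_rate=0.5):
    """Apply the shared local rule; a random mask makes updates asynchronous."""
    hidden = np.maximum(perceive(grid) @ W1, 0.0)    # per-cell ReLU MLP
    delta = hidden @ W2
    mask = rng.random(grid.shape[:2] + (1,)) < fire_rate
    return grid + delta * mask

grid = rng.normal(0, 1.0, (H, W_GRID, C))
for _ in range(10):                                  # iterate the local rule
    grid = nca_step(grid)
```

The appeal for ARC-style tasks is that the same tiny, local rule is iterated everywhere on the grid, which is a very cheap form of computation compared to a large language model.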
- [[2505.24864] ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models](https://arxiv.org/abs/2505.24864)
- [Whitepaper: Mastering Gameworld 10k in Minutes with the AXIOM ‘Digital Brain’](https://www.verses.ai/blog/whitepaper-mastering-gameworld-10k-in-minutes-with-the-axiom-digital-brain)
- Atlas (a powerful Titan): a new architecture with long-term in-context memory (a generic sketch of test-time memorization follows this list): https://fxtwitter.com/behrouz_ali/status/1928522388100010383 [[2505.23735] ATLAS: Learning to Optimally Memorize the Context at Test Time](https://arxiv.org/abs/2505.23735)
- https://fxtwitter.com/LingYang_PU/status/1925385712670830753 [[2505.15809] MMaDA: Multimodal Large Diffusion Language Models](https://arxiv.org/abs/2505.15809) "We present MMaDA, the first diffusion model to unify text reasoning, multimodal understanding, and image generation, through mixed long-CoT fine-tuning and a unified RL algorithm, UniGRPO."
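On the Atlas item above: the sketch below illustrates the general notion of "memorizing the context at test time" that Titans-style architectures explore, using the simplest possible memory, a linear map updated by gradient steps during inference. This is a generic toy under my own assumptions, not the ATLAS update rule from the paper.

```python
# Toy illustration of memorizing the context at test time: a small linear
# memory M is updated by gradient descent on a key->value reconstruction loss
# during inference. Generic sketch only, not the ATLAS update rule; the paper
# uses a learned deep memory and a different optimization scheme.
import numpy as np

rng = np.random.default_rng(0)
D = 32                                    # key/value dimensionality (arbitrary)
M = np.zeros((D, D))                      # the memory: a linear map, written on the fly

def write(M, k, v, lr=0.02):
    """One test-time gradient step on ||M k - v||^2 for the current association."""
    err = M @ k - v                       # prediction error ("surprise" signal)
    return M - lr * np.outer(err, k)      # gradient step (factor of 2 folded into lr)

def read(M, k):
    """Recall: query the memory with a key."""
    return M @ k

# Stream a short "context" of key->value associations into the memory.
keys = rng.normal(size=(20, D))
vals = rng.normal(size=(20, D))
for k, v in zip(keys, vals):
    for _ in range(5):                    # a few inner steps per association
        M = write(M, k, v)

# Query with an early key; the residual shows how much was retained.
print(np.linalg.norm(read(M, keys[0]) - vals[0]), np.linalg.norm(vals[0]))
```

The interesting design question in this family of models is exactly what the papers study: what to store, how many update steps to spend per token, and how to keep late writes from erasing early ones.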