Links AI SOTA practice

o1-style reasoning models from DeepSeek and Moonshot AI: [DeepSeek-R1/DeepSeek_R1.pdf at main · deepseek-ai/DeepSeek-R1 · GitHub](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf) and [Kimi-k1.5/Kimi_k1.5.pdf at main · MoonshotAI/Kimi-k1.5 · GitHub](https://github.com/MoonshotAI/Kimi-k1.5/blob/main/Kimi_k1.5.pdf)

Upcoming model releases: [Your connected workspace for wiki, docs & projects | Notion](https://koltregaskes.notion.site/modelreleases)

Best LLM papers from the past 2 years: https://x.com/Teknium1/status/1865792338666348671

[[2303.18223] A Survey of Large Language Models](https://arxiv.org/abs/2303.18223)

AlphaZero from scratch: [https://www.youtube.com/watch?v=wuSQpLinRB4](https://www.youtube.com/watch?v=wuSQpLinRB4)

[MuZero - Wikipedia](https://en.wikipedia.org/wiki/MuZero)

DeepSeek V3 technical report: [🚀 Introducing DeepSeek-V3 | DeepSeek API Docs](https://api-docs.deepseek.com/news/news1226) https://x.com/nrehiew_/status/1872318161883959485?t=SB9rLEF4MoCy4Gt9FBVicQ&s=19

[The 2025 AI Engineering Reading List - Latent.Space](https://www.latent.space/p/2025-papers)

AI trends and their future, by Andrew Ng. I'm curious how reasoning systems, agents, and multi-agent systems will develop in 2025. [https://www.youtube.com/watch?v=KrRD7r7y7NY](https://www.youtube.com/watch?v=KrRD7r7y7NY)

In one small private benchmark built on a real-world industry use case for ReAct agents (complex multistep tool use interacting with APIs), the new Claude 3.5 Sonnet as the base model in an agent framework absolutely crushes GPT-4o and Gemini 2.0 Flash. Sonnet reliably completes some tasks on which the others repeatedly fail miserably. I should try o1 and eventually o3 too.

[GitHub - henrythe9th/AI-Crash-Course: AI Crash Course to help busy builders catch up to the public frontier of AI research in 2 weeks](https://github.com/henrythe9th/AI-Crash-Course)

[GitHub - mlabonne/llm-course: Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.](https://github.com/mlabonne/llm-course?tab=readme-ov-file)

[[2501.04519] rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking](https://arxiv.org/abs/2501.04519) Small open-source models (around 7B parameters) rivaling o1 on math benchmarks.

[[2501.04227] Agent Laboratory: Using LLM Agents as Research Assistants](https://arxiv.org/abs/2501.04227)

[Explainable artificial intelligence to identify follicles that optimize clinical outcomes during assisted conception | Nature Communications](https://www.nature.com/articles/s41467-024-55301-y#Sec10)

"our models match the performance of state-of-the-art models like GPT-4o and Claude-3.5-Sonnet while offering 20-32 times longer context window" [[2501.08313] MiniMax-01: Scaling Foundation Models with Lightning Attention](https://arxiv.org/abs/2501.08313)

Google's new Titans architecture handles long context better thanks to an improved memory mechanism. "Our experimental results on language modeling, common-sense reasoning, genomics, and time series tasks show that Titans are more effective than Transformers and recent modern linear recurrent models. They further can effectively scale to larger than 2M context window size with higher accuracy in needle-in-haystack tasks compared to baselines."
[[2501.00663v1] Titans: Learning to Memorize at Test Time](https://arxiv.org/abs/2501.00663v1)

[ZenML - LLMOps Database](https://www.zenml.io/llmops-database)

Speech language models that can talk while listening: [Introducing hertz-dev - Standard Intelligence](https://si.inc/hertz-dev/)

Reinforcement Learning from Hindsight Simulation (RLHS) addresses the limits of immediate feedback by simulating the user making a decision based on the AI's advice and then experiencing the outcome, rather than relying solely on immediate feedback. Result: increased truthfulness and reduced hallucinations. https://x.com/sebkrier/status/1880759033658843488

[[2501.09136] Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG](https://arxiv.org/abs/2501.09136)

[GitHub - asinghcsu/AgenticRAG-Survey: Agentic-RAG explores advanced Retrieval-Augmented Generation systems enhanced with AI LLM agents.](https://github.com/asinghcsu/AgenticRAG-Survey)

Uncensored models: Uncensored General Intelligence (UGI) Leaderboard [UGI Leaderboard - a Hugging Face Space by DontPlanToEnd](https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard)

[Reddit - The heart of the internet](https://www.reddit.com/r/LocalLLaMA/comments/1ge2fzf/llm_recommendation_for_erotic_roleplay/)

[Reddit - The heart of the internet](https://www.reddit.com/r/Oobabooga/comments/1fp629k/which_are_good_roleplay_llm_models_for_nsfw/)

https://boards.4chan.org/g/catalog#s=lmg