The DeepSeek-R1 team explored MCTS, recognizing its potential advantages, but couldn't make it work due to scaling and other challenges
https://github.com/.../DeepSeek-R1/blob/main/DeepSeek_R1.pdf
https://x.com/burny_tech/status/1881442503435686225
The bitter lesson is showing up again and again in its many forms
"nooo you can't just scale up CoT with simple RL and scalar rewards without using a PRM or tree search or optimizing latent reasoning"
https://x.com/hallerite/status/1881340953610616988
The core equation (already used in DeepSeekMath)
https://x.com/burny_tech/status/1881817204087501203
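For context, a sketch of that core GRPO objective as I understand it from the DeepSeekMath paper (notation approximate; check the paper for exact terms): sample a group of G outputs per question and use group-normalized rewards as advantages, instead of a learned value model.

```latex
\mathcal{J}_{\mathrm{GRPO}}(\theta) =
\mathbb{E}\!\left[\frac{1}{G}\sum_{i=1}^{G}
\min\!\left(\frac{\pi_\theta(o_i \mid q)}{\pi_{\theta_{\mathrm{old}}}(o_i \mid q)} A_i,\;
\operatorname{clip}\!\left(\frac{\pi_\theta(o_i \mid q)}{\pi_{\theta_{\mathrm{old}}}(o_i \mid q)},\, 1-\varepsilon,\, 1+\varepsilon\right) A_i\right)
- \beta\, \mathbb{D}_{\mathrm{KL}}\!\left(\pi_\theta \,\|\, \pi_{\mathrm{ref}}\right)\right],
\qquad
A_i = \frac{r_i - \operatorname{mean}(\{r_1,\dots,r_G\})}{\operatorname{std}(\{r_1,\dots,r_G\})}
```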
I wonder how similar is OpenAI o3's reinforcement learning objective to DeepSeek-R1's objective
The Chinese came up with a nice innovation while trying to save money; plus the API for their model is ~30x cheaper than OpenAI's o1 for essentially the same performance according to benchmarks (a bit better on some benchmarks, a bit worse on others, but only by a little)
Plus that new o1-level Chinese model is free here: https://chat.deepseek.com/
DeepSeek still isn't at o3's level, and I'd really like to know whether China has something better in closed source
In my opinion this accelerates the race for superintelligence between the USA and China even more, because for OpenAI to stay dominant they have to exploit their first-mover advantage as much as possible and move faster while everyone catches up (including Google and Anthropic), unless they really have some bigger moat 😄
Sam Altman was at Trump's inauguration ceremony, and now he'll be presenting some new AI model at the White House; I wonder whether that means o3, their computer operator, or something else [https://www.youtube.com/watch?v=59Etzj5gvsE](https://www.youtube.com/watch?v=59Etzj5gvsE)
On top of that, on his very first day Trump revoked the AI regulations Biden had planned https://www.reuters.com/technology/artificial-intelligence/trump-revokes-biden-executive-order-addressing-ai-risks-2025-01-21/
where to invest?
In my opinion it depends on how quickly the scaling of this family of RL breakthroughs that's been applied over the last few months saturates, and how long until the next breakthrough(s) arrive
But I assign the highest probability (a gut feeling, because in my opinion it's fundamentally unpredictable, since chaotic complex systems are a mess) to the graph consistently being an exponential made of sigmoids
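Purely as a toy illustration of that "exponential made of sigmoids" picture (my sketch, with made-up parameters): stack logistic S-curves, each one a saturating breakthrough, with exponentially growing ceilings, and the resulting envelope keeps climbing roughly exponentially.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def stacked_sigmoids(t, n_waves=5, growth=2.0, width=1.0):
    """Sum of logistic S-curves: wave k arrives around t = k and
    saturates at a ceiling of growth**k -- each paradigm plateaus,
    but the envelope of all waves grows roughly exponentially."""
    return sum(growth**k * sigmoid((t - k) / width) for k in range(n_waves))

# Sample the envelope at t = 0.0, 0.5, ..., 5.0
curve = [stacked_sigmoids(t / 2) for t in range(0, 11)]
```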
Assuming that scenario, I think this whole AI supply chain goes up, from hardware to data to software
But in some parts of the market it's already a bubble in my opinion, because some investor expectations are really unrealistic (superintelligence in 5 nanoseconds), and it has a big chance of popping, similarly to the dot-com bubble
And assuming we won't find a next breakthrough for too long, which I give a small nonzero probability, everything will crash
Almost obvious to me are some OpenAI-wrapper startups that I think the corporations will steamroll and erase from existence
With Nvidia, for example, I'm extremely uncertain, because I don't know how overvalued they really are relative to today and relative to nearer and longer-term possible future scenarios; plus hardware for linear algebra has a lot of uses
And maybe the sleeping giant Google wakes up and eats OpenAI raw
Or Ilya's SSI (Safe Superintelligence Inc.) suddenly enters with some extreme AI breakthrough, given that the guy has been behind a lot of AI breakthroughs in the past
Or maybe Anthropic wipes the floor with OpenAI
Or China cracks something ultra big behind closed doors, and instead of half the galaxy going to America and half to China, the whole galaxy goes to China (a super extreme long-term scenario 😄 )
I also forgot about president Musk and his xAI + Tesla robotics, and Meta, and other robotics companies, and other Chinese AI + robotics companies, and that handful of European ones
I still absolutely don't grasp that with DeepSeek-R1-Zero things like self-verification, self-correction, reflection, long chains of thought, exploration of alternative approaches, etc. are emergent due to reinforcement learning... no specialized training on human data regarding these reasoning patterns was done (supervised finetuning), prior to that only knowledge from the base model was available at the level of the previous AI paradigm... and the result is crazy leaps on math and coding... that's a big breakthrough!
https://x.com/burny_tech/status/1881883026050748641
I stopped believing it at some point when I saw some initial attempts.
I thought more sophisticated neurosymbolic systems were the future when I looked at AlphaProof when it came out, since it seemed to work better at first.
Then OpenAI's o1 came out; many people thought it had some MCTS etc., but it was leaked that no, apparently it's mainly just RL on CoTs with an LLM, with emergence. More employees have confirmed it now. https://x.com/paul_cal/status/1882111659927556535
And then suddenly DeepSeek-R1: just simple RL, no MCTS, no PRM, with emergent behaviour, very similar to the o1 series in being pure RL.
https://x.com/burny_tech/status/1881459096655991146
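A toy sketch of what "simple RL with scalar rewards, no PRM" means mechanically, GRPO-style (my illustration, not DeepSeek's actual code): sample a group of outputs per prompt, score each with a cheap scalar reward, and normalize within the group, so no value network or process reward model is needed.

```python
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize scalar rewards within one group
    of sampled outputs for the same prompt. Outputs that beat the group
    average get a positive advantage, the rest get a negative one."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Toy verifier-style rewards for 4 sampled answers to one math prompt:
# 1.0 when the final answer checks out, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

The policy gradient then just pushes up the log-probability of above-average samples, which is all this tree-search-free recipe needs.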
The bitter lesson is showing up again and again in its many forms
https://x.com/burny_tech/status/1881746966042009983
o1 might be using pure RL like DeepSeek-R1, but we still don't know what exact RL equation and reward function o1 uses
https://x.com/paul_cal/status/1882111659927556535
DeepSeek on the ARC benchmark: between o1-preview and o1 https://x.com/GregKamradt/status/1881762305152872654
ah here we goooooo, that was fast https://www.reuters.com/technology/artificial-intelligence/trump-announce-private-sector-ai-infrastructure-investment-cbs-reports-2025-01-21/
[https://www.youtube.com/watch?v=IYUoANr3cMo](https://www.youtube.com/watch?v=IYUoANr3cMo)
https://openai.com/index/announcing-the-stargate-project/
https://x.com/rohanpaul_ai/status/1881839146466984086
https://x.com/OpenAI/status/1881830103858172059
Trump announces $500 billion in AI infrastructure with OpenAI
this is on the scale of Apollo program and Manhattan project
166 out of 195 countries in the world have a GDP smaller than this investment
Tons of things in papers that aren't mathematics, code, or benchmarks are often slop
random Chinese quants on their way to make just a side project to destroy the competitive moat of the biggest American technological giants by giving the AI to everyone for free
at least prices of AI are falling rapidly as a result
https://x.com/burny_tech/status/1882528618779464044
meta in panic mode, scarcity incentivizes innovation
https://x.com/burny_tech/status/1882526703303655850
Elon: Rizzes Trump so hard that he basically becomes president with Trump, in major part to destroy all competitors of his companies, including OpenAI
Trump: Also, fuck you Elon
Trump: We're announcing a $0.5 trillion private investment in AI infrastructure with OpenAI, which you hate: the largest single technology funding initiative in the history of the world so far, bigger than projects like the Apollo program and the Manhattan Project, even when you take inflation into account
Elon: Bro what the fuck is this, I want those 0.5 trillion dollars for my AI infrastructure instead, didn't you see my currently gigantic AI datacenters and my new Grok reasoner model?
And also random Chinese quants on their way to make just a simple AI side project at a fraction of the cost to destroy the competitive moat of the biggest American technological giants by giving it to everyone for free
The Stargate situation is crazy... Elon vs Altman beef intensifies [https://www.youtube.com/watch?v=YrHsw4Oja7w](https://www.youtube.com/watch?v=YrHsw4Oja7w)
Pure Graph-of-Agents RL
2025 is the year of scaling reinforcement learning with LLMs to the sky to turn them into large reasoning models.
Chains of thoughts. Graphs of thoughts. Graphs of agents. Ecosystems of agents. And so on.
There will be so much more emergence.
>AI will never do x
Ok but AI just did it
>That's trivial, obvious, easy task for computers to do, but AI will never do x
Ok but AI just did it
>That's trivial, obvious, easy task for computers to do, but AI will never do x
Ok but AI just...
https://x.com/fchollet/status/1883976186038272043
DeepSeek doesn't mean the end for Nvidia!
I think a lot of people misunderstand what that means for Nvidia right now.
1) inference costs crazy money, more chips are being bought because of inference instead of training
2) for training, large scale scaling of reinforcement learning (which is what is done in this new paradigm) overall is even more expensive than classical scaling (human generates data and machine learns -> machine generates data and machine learns)
3) the more compute efficient and accessible AI gets, the more use cases start becoming economically viable, the more we'll deploy AI, and the more compute we'll need, since people will want it more and more
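The "machine generates data and machine learns" loop from point 2 can be sketched as a toy REINFORCE-style bandit (my illustration, nothing like a real LLM trainer): the policy samples its own outputs, a programmatic verifier hands back a scalar reward, and the policy updates on its own generations, which is why every extra training step also burns inference compute.

```python
import math, random

random.seed(0)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Toy "policy": a distribution over 2 candidate answers; answer 0 is correct.
logits = [0.0, 0.0]
lr = 0.5

for step in range(200):
    probs = softmax(logits)
    # The machine generates its own training data (samples an answer)...
    action = random.choices([0, 1], weights=probs)[0]
    # ...and a cheap programmatic verifier returns a scalar reward.
    reward = 1.0 if action == 0 else 0.0
    baseline = 0.5  # fixed baseline instead of a learned critic
    # REINFORCE update: d log pi(action) / d logit_a = 1[a == action] - p_a
    for a in range(2):
        grad = (1.0 if a == action else 0.0) - probs[a]
        logits[a] += lr * (reward - baseline) * grad

final_probs = softmax(logits)  # policy has shifted toward the correct answer
```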
Btw I think it's a bit overblown; according to plenty of uncontaminated benchmarks that Chinese model is still worse than OpenAI's o1
And they definitely did some distillation, but that's a minimal part of the training
And all AI companies do that now 😄
But most of the training is in the reinforcement learning, which doesn't train on that human or synthetic data
Replications are already appearing
And that majority part of the training, reinforcement learning without the human and AI data, is the reason it's such a big deal
My guess is the mainstream latched onto this so hard because DeepSeek released a slightly worse model than o1 but for free, so normies didn't know o1-level models until now
As a reaction, within 24 hours OpenAI is launching a free o1-level model btw (o3-mini)
At the same time I keep seeing people write "rip Nvidia" over this because chips supposedly aren't needed anymore
That makes no sense to me at all:
1) Inference costs crazy money; more chips are bought for inference than for training
2) For training, large-scale scaling of reinforcement learning (which is what this new paradigm does) is overall even more expensive than classical scaling (human generates data and machine learns -> machine generates data and machine learns)
3) The more compute efficient and accessible AI gets, the more use cases start becoming economically viable, the more we'll deploy AI, and the more compute we'll need, since people will want it more and more (which is already happening with Anthropic, who have access to big tech GPUs and also serve a reasoning model that was cheap to train)
[https://www.youtube.com/watch?v=hpwoGjpYygI](https://www.youtube.com/watch?v=hpwoGjpYygI)
A lot of researchers think that the "stolen data" claim cope from OpenAI that everyone is now taking at face value is pretty unlikely.
Most of the training is in that reinforcement learning that doesn't train on that human or synthetic data. The majority part of training with reinforcement learning without that human and AI data is why it's so big!
DeepSeek R1's original paper shows how they use pure reinforcement learning via GRPO. This is different from previous approaches, which require either a human to rate the outputs or example outputs to imitate. The most important part of the training pipeline doesn't need external training data.
There are already replications that are slowly confirming that this method works.
It's possible that they used distillation for part or all of the training, but since replications are already happening where people also observe the emergence due to reinforcement learning, what DeepSeek described in their paper seems to actually work. DeepSeek still did a little supervised finetuning on alignment at the end, a pretty small part of the training pipeline, which could be where the model picked up sometimes claiming to be ChatGPT; that could come either from distillation or from public Hugging Face datasets and the internet, which are now full of ChatGPT output. OpenAI employees also admitted that DeepSeek figured out major parts of how o1 was made. Plus, distillation from frontier models is probably now done to some extent by pretty much all AI companies, because it's the easiest way to get better performance, which is why so many LLMs think they are ChatGPT.
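Black-box distillation of the kind mentioned here is conceptually just data collection; a minimal sketch (the `query_teacher` helper is a hypothetical stand-in for a frontier-model API call, not a real endpoint):

```python
# Minimal sketch of black-box distillation: collect a teacher model's
# outputs and reuse them as supervised finetuning data for a student.

def query_teacher(prompt):
    # Hypothetical stand-in; a real pipeline would call the teacher's API here.
    return f"Teacher answer to: {prompt}"

prompts = ["Solve 2x + 3 = 7", "What is the derivative of x^2?"]

# The distillation "dataset" is just (prompt, teacher completion) pairs,
# which is also why distilled students inherit the teacher's quirks
# (e.g. claiming to be ChatGPT).
sft_dataset = [{"prompt": p, "completion": query_teacher(p)} for p in prompts]
```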
The joy of open research, where you can't fake stuff, because others will easily prove you wrong through replications.
But it's true that the hype at least makes sense in that they shattered a large part of OpenAI's moat with a slightly worse model than OpenAI's o1 (according to a million uncontaminated private benchmarks) at a fraction of the cost (although Google's Gemini Thinking is also slightly worse than o1 and even cheaper than the DeepSeek API, but closed source), and that it's open source and can be run locally.
I'm guessing it went so viral in mainstream in large part because DeepSeek gave a slightly worse model than o1 but for free, so the normies didn't know o1 level models until now.
At the same time I keep seeing people post "rip Nvidia" because of this, because they don't need the chips anymore, but that doesn't make sense to me:
1) Inference costs a lot of money, more chips are bought for inference than for training
2) For training, large scale scaling of reinforcement learning (which is done in this new paradigm) is overall even more expensive than previous scaling paradigm (human generates data and machine learns -> machine generates data and machine learns), and scaling laws here have not yet hit a limit at all (see benchmark o3 results and associated costs)
3) The more compute efficient and accessible AI gets, the more use cases start becoming economically viable, the more we'll deploy AI, and the more compute we'll need, since people will want it more and more (which is already happening with Anthropic, who have access to big tech GPUs and also serve a partly-reasoning model that was cheap to train)
"
Suppose we model the entire universe with an autoregressive transformer playing reinforcement learning with itself
What math is used in AI?
The most used are linear algebra, multivariable calculus, probability theory, statistics, information theory, and optimization theory (which together make up e.g. the training of neural networks, i.e. deep learning, plus reinforcement learning and older statistical models), then statistical mechanics (diffusion models), graph theory (graph neural networks), theoretical computer science (theory-of-computation analysis), and control theory, and then there are more diverse subfields of AI that add even more 😄 honestly I have a harder time finding applied math that hasn't been used somewhere in AI; a ton overflows in from physics, for example 😄
That goes for both practice and theory, where theory additionally uses e.g. group theory (for classifying architectures), category theory, more statistical mechanics (deep learning theory), set theory, algebraic geometry (singular learning theory [Neural networks generalize because of this one weird trick — LessWrong](https://www.lesswrong.com/s/mqwA5FcL6SrHEQzox/p/fovfuFdpuEwQzJu2w) ), topology (topological data analysis), and so on 😄
Nerds are fighting that AI can't solve Riemann hypothesis or invent quantum mechanics from scratch, while normies are happy that it can help them solve simple algebraic equations that they struggled with in school
It's interesting how some people love to make highly overconfident technical claims about AI systems without any technical knowledge of them, and when I show them papers proving those claims false, they stop responding
If you use any of the newest LLMs for STEM or other fields, you notice they're increasingly giving sources, even when not connected to the internet (still ofc depends on the LLM)
Source attribution is a big growing field
I sometimes wish I could do this for my brain too :D
It would be super neat to see the graph of as many of the approximated environmental evolutionary incentives that shaped the very neural dynamics leading to the emergence of some complex thought, and not just see those incentives that I can recall myself, in many cases too fuzzily :D