Thoughts AI technical 8

" I am definitely against OpenAI (or any other entity) using AI (or any other technology) to concentrate power just for themselves, and I think they should empower everyone as much as possible instead. But personally I don't really like the currently popular narrative of AI systems "being just plagiarizers". I think of AI systems as partial memorizers and partial generalizers, depending on the exact details (from mathematical deep learning theory). Humans do something similar, though with differences that we're slowly but surely mapping out.

I feel like this narrative extremely downplays what current AI systems are already capable of. For example, if FunSearch only "plagiarized the training data" and didn't have at least some generalization power, it wouldn't help find new results in mathematics. (All deep learning systems have some generalization power, otherwise they would never generalize to unseen datasets at all; see the bias-variance trade-off in statistical learning theory for the trade-off between memorization and generalization.) [FunSearch: Making new discoveries in mathematical sciences using Large Language Models - Google DeepMind](https://deepmind.google/discover/blog/funsearch-making-new-discoveries-in-mathematical-sciences-using-large-language-models/) And it's not brute force either, because pure brute force runs into an insane combinatorial explosion. And we wouldn't see all these abstract features and circuits emerging with scale in LLMs like Claude, or in other deep learning systems, if they only memorized the training data without generalization.
https://www.anthropic.com/research/mapping-mind-language-model https://x.com/tegmark/status/1851288315867041903?t=eB9Ft7hF9ocV9s-w3s-O1w&s=19 [[2410.19750] The Geometry of Concepts: Sparse Autoencoder Feature Structure](https://arxiv.org/abs/2410.19750) [Zoom In: An Introduction to Circuits](https://distill.pub/2020/circuits/zoom-in/)

Nor would AlphaZero and similar chess machine learning systems be better than all humans at chess, or AlphaFold help push the state of the art in protein folding. And so on. They are all deep learning systems. The new reasoning models go even further, using reinforcement learning to push for even more generalization, and we're starting to reverse engineer the mechanisms behind that. [[2501.17161] SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training](https://arxiv.org/abs/2501.17161) https://x.com/_philschmid/status/1884983965112828051

This memorization-generalization trade-off, and how generalization arises in the first place from abstracting over individual memorized units, is also studied in human learning. There are many differences between human and machine learning, but also many similarities. [How do we generalize? - PMC](https://pmc.ncbi.nlm.nih.gov/articles/PMC7613724/) [[2205.10343] Towards Understanding Grokking: An Effective Theory of Representation Learning](https://arxiv.org/abs/2205.10343) [Neural networks generalize because of this one weird trick - LessWrong](https://www.lesswrong.com/s/mqwA5FcL6SrHEQzox/p/fovfuFdpuEwQzJu2w) "

I love people who don't really know anything about the technicalities of AI making strong technical claims about AI and then giving no response at all when I ask them to elaborate on the technicalities. Probably because they just parrot what they heard somewhere, instead of actually using AI and understanding and doing the math and coding of AI themselves?

In a multi-agent scenario, Claude is fun as a code review agent paired with a Claude coding agent.
The code review agent gives infinite ways to improve the code, so the coding agent never rests; there are always infinite improvements to make.

https://openai.com/index/introducing-deep-research/ [smolagents/examples/open_deep_research at main · huggingface/smolagents · GitHub](https://github.com/huggingface/smolagents/tree/main/examples/open_deep_research)

I happened to spend the whole day today working with smolagents 😄 They have cool agents that think in code. It's interesting how this open-source version of deep research, which tried to replicate OpenAI's deep research, scores 54% on the same validation set where OpenAI's scored 67%. But it's not quite a replication, because I don't see any reinforcement learning training, which OpenAI did; they used an unreleased o3 trained with reinforcement learning on this type of task.

And this is a pretty insane sample efficiency jump 😄 [[2501.19393] s1: Simple test-time scaling](https://arxiv.org/abs/2501.19393) [Imgur: The magic of the Internet](https://imgur.com/lD6Mtaa)

" >The use of A.I. led to 29% higher detection of cancer, no increase of false positives, and reduced workload compared with radiologists without A.I.
https://fxtwitter.com/EricTopol/status/1886610132567777514

The system behind it uses deep learning. I can't find more details, but more specifically my guess is convolutional neural networks, which I've seen most often in this use case, or vision transformers, or ensemble/hybrid methods. [Fine tuning deep learning models for breast tumor classification | Scientific Reports](https://www.nature.com/articles/s41598-024-60245-w) CNNs overlap quite a bit with how the visual cortex processes information.

It shows that deep learning can also be used for things that actively improve people's lives through better medicine, which I think should get more into the collective consciousness.

In my favorite terminology, I personally classify this as a special case of relatively narrow intelligence: it has a relatively general architecture, but it is very specifically trained on one type of task in a specialized way. Here I roughly described, at a high level, my favorite terminology around AI. But if you take, for example, Chollet's definition of intelligence, which basically equates it with generalization power or adaptation to novelty, then the more specialized a system is, the less intelligent it is 😄 These more specialized deep learning systems do at least generalize to some extent within their narrow domain. Maybe something along the lines of the modern reinforcement learning methods from large reasoning models would also help with better generalization and adaptation even in that narrow domain, hmmm. So far I've also seen people apply inference-time compute to diffusion models for image generation. [A General Framework for Inference-time Scaling and Steering of Diffusion Models](https://arxiv.org/abs/2501.06848)

Are there any architectures that don't distinguish between the training and inference phases? The test-time adaptation paradigm kind of does this, since during inference the model is also trained in a specialized way on the task? But it is still trained beforehand too. Or maybe liquid neural networks do both at once. Online/continuous/continual learning.
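One way to picture a system with no separate training and inference phases is plain online learning: predict on each incoming sample, then immediately update on it. The sketch below is a minimal illustration under stated assumptions, not how test-time adaptation or liquid neural networks actually work. The model is a one-dimensional linear regressor, and the data stream, the learning rate, and the name `online_linear_regression` are all made up for the example.

```python
import random


def online_linear_regression(stream, lr=0.1):
    """Predict each incoming sample, then immediately update on it.

    There is no separate training phase: every sample is first used for
    inference and then for a single SGD step on the squared error.
    """
    w, b = 0.0, 0.0
    errors = []
    for x, y in stream:
        y_hat = w * x + b        # inference on the new sample
        err = y_hat - y
        errors.append(abs(err))  # error measured before the update
        w -= lr * err * x        # training step on the same sample
        b -= lr * err
    return w, b, errors


# Hypothetical data stream: y = 3x - 1 plus a little noise.
rng = random.Random(0)
stream = []
for _ in range(2000):
    x = rng.uniform(-1.0, 1.0)
    stream.append((x, 3.0 * x - 1.0 + rng.gauss(0.0, 0.01)))

w, b, errors = online_linear_regression(stream)
print(f"w={w:.3f} b={b:.3f}")
```

The prediction error shrinks over the stream even though the model never stops serving predictions, which is the basic property the online/continual learning setting is after.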
Hmm, this is a good thread: [Reddit - The heart of the internet](https://www.reddit.com/r/MachineLearning/comments/113448t/d_is_anyone_working_on_ml_models_that_infer_and/) [Reddit - The heart of the internet](https://www.reddit.com/r/ArtificialInteligence/comments/17p807r/is_there_any_ai_or_model_which_can_do_both/)

" I think the future of general AI systems is neurosymbolic multimodal hybrids of all sorts of AI architectures, where deep learning and its ancestors will be part of it but not the whole story at all, if we really want the greatest possible generality. To some extent, that's the direction we're going in. More modern, more general systems have a smaller and smaller percentage of deep learning in their overall architecture. Plus there is more and more reinforcement learning being used. Chollet also argues a lot in this direction, for neurosymbolic AI. [https://www.youtube.com/watch?v=w9WE1aOPjHc](https://www.youtube.com/watch?v=w9WE1aOPjHc) My prediction is that something along the lines of neuromorphic computing or liquid neural networks (more biology+physics based AI), combined more with symbolics, will be a much bigger part of AI within 5 years. LiquidAI is working on an architecture based on liquid neural networks. [Liquid AI: Build capable and efficient general-purpose AI systems at every scale.](https://www.liquid.ai/)
[https://www.youtube.com/watch?v=w9WE1aOPjHc](https://www.youtube.com/watch?v=w9WE1aOPjHc) [Liquid Foundation Models: Our First Series of Generative AI Models](https://www.liquid.ai/liquid-foundation-models) [https://www.youtube.com/watch?v=3MkJEGE9GRY](https://www.youtube.com/watch?v=3MkJEGE9GRY) "

Robotics. I mainly look at what is already being used in practice. I mean companies like Unitree, which make, for example, robodogs that are already being bought and used in practice quite a lot, and that are actually affordable, unlike e.g. Boston Dynamics. [Unitree Robotics - Wikipedia](https://en.wikipedia.org/wiki/Unitree_Robotics) But whether the USA or China is better at this is much debated. It also depends on how "better" is defined. 😄 The two are racing each other flat out. Europe has a little bit too.

The robodog is used, for example, for carrying luggage or in war. I've also seen the considerably more expensive Boston Dynamics one used a lot, e.g. as a sheepdog. And this is probably the best map of humanoids right now: [Imgur: The magic of the Internet](https://imgur.com/VnVR0OD) In China I've also seen, for example, this: [China to host world’s first foot race between humans and humanoid robots | The Independent](https://www.independent.co.uk/tech/china-robots-humanoid-humans-race-b2686744.html) But among the American ones, Figure is used in a BMW factory, and Digit is used, for example, in a warehouse.

Maybe it's more accurate to say that, roughly, China leads in cheap solutions for larger-scale mass manufacturing, while America leads in expensive solutions 😄 That seems to me to hold across a lot of industries :D Although I'm not sure.

This is in the context of the war, a Chinese robodog supporting Ukraine: [Reddit - The heart of the internet](https://www.reddit.com/r/CombatFootage/comments/1eku3im/footage_of_ukrainian_28th_brigade_using_a_unitree/) [Reddit - The heart of the internet](https://www.reddit.com/r/singularity/comments/1frevgn/ukraine_is_using_vampire_drones_to_drop_robot/) A Ukrainian robodog with an automatic submachine gun versus a Russian donkey, who will win?
Since they lead in the cheap market, that's where it's most visible 😄 It depends on how "the best" is defined; I also factor in cost, especially for toys this expensive. If you only want total capabilities, then the USA probably leads, although I'm not sure. [https://www.youtube.com/watch?v=X2UxtKLZnNo](https://www.youtube.com/watch?v=X2UxtKLZnNo) As for demos, on the other hand, I haven't yet seen a demo from the Americans with as many capabilities as they showed here with their more expensive robot, although others have demos and practical uses with strengths elsewhere. Someone should do a proper comparative analysis of all these new robotics companies.

Ah! [https://www.youtube.com/watch?v=t0yg-zeOmag](https://www.youtube.com/watch?v=t0yg-zeOmag) Hmm, so the new Unitree B2W looks like it has better payload, speed, endurance, and maneuvering than Boston Dynamics' Spot? But Spot in turn has better customizability and more accessories, which is much better for custom-tailored problems in industry. So maybe they lead in the high-cost market too in some domains, but I would like an analysis from a better source. As for demos of pure speed, the best I've seen so far is also from China: https://fxtwitter.com/adcock_brett/status/1881024651377230013 It's a mess when all these companies keep outdoing each other in lots of different domains 😄

I wish robotics had a real-time updated list of benchmarks testing the different capabilities and properties of different robots from different companies, to see who's actually on top in which domains, like it's done for software AIs. But that's a problem, because robotics is way harder to test.

It still often fascinates me how these silicon entities often completely fuck up things that are ultra simple to us in various contexts, while they also completely shine, sometimes beyond humans, in other contexts. They operate on architectures that are both different from and similar to ours.
https://fxtwitter.com/DimitrisPapail/status/1889755872642970039?t=Vr-9NWmA1IG51D_LJzhA2Q&s=19 https://fxtwitter.com/teortaxesTex/status/1887991191037227176

The flipping on elementary arithmetic is interesting, while at the same time these models are starting to handle a certain subset of math more and more (still in a brittle but already very useful way). The deep learning origami just won't approximate sufficiently generalizing vector programs for elementary arithmetic. I wonder whether elementary arithmetic will eventually be tamed somehow, as here via self-improvement or other hacks, or whether we'll eventually move to more fundamental neurosymbolics, where for elementary arithmetic the model will in most cases try to use a symbolic engine that is part of the architecture. So far, neurosymbolics is basically already used in practice, as LLM plus calculator/Python etc. in a tool calling pattern, which often works, but that's not fundamental enough IMO.

Or will there be yet another paradigm shift, or memory? The reasoning tokens (CoT or latent) are kinda almost infinite memory according to some, if they are allowed to be unbounded. https://fxtwitter.com/MatthewBerman/status/1890081527238963484 But maybe one of the architectures that try to implement more explicit memory, like the ones I link here now and then, will catch on more and solve it better. https://fxtwitter.com/omarsar0/status/1889681118913577345 [[2502.06049] LM2: Large Memory Models](https://arxiv.org/abs/2502.06049)

From the good old days of the GPT-2 love vector: do people try to get universal jailbreaks by something along the lines of sampling refusal prompts to approximate the refusal vector direction, and then subtracting that vector to get the universal jailbreak? Sparse autoencoders would be better for feature steering, but one doesn't have access to the model weights.
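As a rough illustration of the vector-subtraction idea, here is a toy sketch. It assumes white-box access to hidden activations, which is exactly what one usually doesn't have with closed models, and it uses made-up 3-dimensional vectors instead of real model activations. The direction is estimated as a difference of means between activations on refused and complied prompts, and then projected out of a new activation. All function names and numbers are hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def refusal_direction(refused_acts, complied_acts):
    """Difference-of-means estimate of the direction separating the two sets."""
    mu_r = mean_vector(refused_acts)
    mu_c = mean_vector(complied_acts)
    return unit([r - c for r, c in zip(mu_r, mu_c)])


def ablate(h, direction):
    """Remove the component of activation h along the unit direction."""
    proj = dot(h, direction)
    return [x - proj * d for x, d in zip(h, direction)]


# Toy activations (hypothetical): the first coordinate plays the role of "refusal".
refused_acts = [[2.0, 0.1, -0.3], [1.8, -0.2, 0.4], [2.2, 0.3, 0.1]]
complied_acts = [[0.0, 0.2, -0.1], [0.1, -0.1, 0.3], [-0.1, 0.2, 0.0]]

direction = refusal_direction(refused_acts, complied_acts)
steered = ablate(refused_acts[0], direction)
print(direction, steered)
```

In this toy setup the estimated direction lines up almost entirely with the first coordinate, and the steered activation has no component left along it. Whether anything comparable can be approximated from prompts alone, without activations, is the open part of the question above.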
https://x.com/janleike/status/1890141865955278916?t=UDcBCqzTptxNDv8Fm1k-kw&s=19 Perplexity deep research links to some interesting jailbreak methods in its sources: https://www.perplexity.ai/search/sota-methods-to-get-llm-univer-QxgAW4XbRIabumxSmL9geg

If you're not speaking to LLMs in raw tokenized latent features and circuits, what are you doing?

I have a feeling that there is a big chance that OpenAI will crumble in a few years, because so much of their top talent keeps running away to competitors (Anthropic, Safe Superintelligence, Thinking Machines Lab, ...) and others will overtake them. I think the trend is already there. But they definitely have the biggest funding, the best reasoning models, and the biggest brand with the normies, for now. But we'll see if that's enough. The emperor's moat is collapsing.

There is so much AI research emerging on thinking in latent space and on implementations of better memory. My prediction is that those will be the next two scalable breakthroughs in algorithmic improvement. [[2502.05171] Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach](https://arxiv.org/abs/2502.05171) Titans: Learning to Memorize at Test Time [https://youtu.be/UMkCmOTX5Ow](https://youtu.be/UMkCmOTX5Ow)

" Did pretraining hit a wall with OpenAI's GPT-4.5? Data is also a factor, but I understand it more in terms of the classic scaling laws, which say that exponential gains in compute lead to linear improvements in performance in classic pretraining scaling. These are reaching a level where 10x compute is insanely demanding, and the linear gain no longer feels as massive relative to the previous orders of magnitude for the amount of compute invested, when they went from GPT-4o to GPT-4.5.
That's why there is now more migration toward scaling the inference-time reasoning paradigm built on top of these models, where there are further exponential-linear scaling laws in the inference-time dimension, which work mostly without human data. 😄 And these two scaling laws combine. But if they build a reasoning model on top of this new base model and don't create sufficient efficiency gains, then a single prompt with long reasoning will cost a kidney. But that could already be the best model in the world. On top of that, the competition is probably starting to outclass OpenAI more with better efficiency. Depending on which benchmark you look at, either OpenAI now has the most performant base model (e.g. LiveBench), or others like Anthropic, Grok, or DeepSeek have already overtaken them in base models (e.g. SWE-bench).

But tbh I have a feeling that there's a big chance OpenAI will crumble within a few years, because so much of their top talent keeps leaving for competitors (Anthropic, Safe Superintelligence, Thinking Machines Lab, etc.) and others will overtake them. I think the trend is already there. For now, though, they definitely have the biggest funding and the biggest brand with the normies. But the question is whether that will be enough. And they still have the best reasoning models and deep research system, but for how long? The emperor's moat is collapsing. "

Increasing parameters in pretraining is one scaling law. Increasing inference-time compute is another scaling law, and you can do that via CoT RL, MCTS, mass sampling, etc. There are different dimensions that have scaling laws. I think there will be many more such scaling laws.
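To make the two scaling dimensions concrete, here is a toy numeric sketch. Nothing in it is fitted to real models: the constants in `loglinear_score` are made up, and `best_of_k_success` assumes independent samples plus a perfect verifier that always picks a correct sample if one exists. The first function shows what "exponential compute, linear gain" means: every 10x of compute adds the same fixed number of points. The second shows mass sampling as a separate inference-time axis.

```python
import math


def loglinear_score(compute, base=50.0, gain_per_10x=5.0):
    """Toy 'exponential compute -> linear gain' curve (constants are made up)."""
    return base + gain_per_10x * math.log10(compute)


def best_of_k_success(p_single, k):
    """Probability that at least one of k independent samples is correct.

    Assumes a perfect verifier that selects a correct sample when one exists.
    """
    return 1.0 - (1.0 - p_single) ** k


# Each 10x step in compute buys the same absolute gain.
gains = [loglinear_score(10 ** (n + 1)) - loglinear_score(10 ** n) for n in range(4)]

# Inference-time axis: more samples per prompt, same base model.
curve = {k: best_of_k_success(0.1, k) for k in (1, 4, 16, 64)}

print(gains)
print(curve)
```

In this toy model the two axes combine in the obvious way: a better base model raises `p_single`, and more inference-time samples raise the success rate for any fixed `p_single`, at a cost that grows with `k`.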
Scaling laws are described here: [Neural scaling law - Wikipedia](https://en.wikipedia.org/wiki/Neural_scaling_law)

Training is already shifting from classic post-training on training data toward "fuck around and find out" with symbolic verifiers (reinforcement learning, e.g. GRPO): [[2501.09686] Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models](https://arxiv.org/abs/2501.09686) [[2501.17161] SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training](https://arxiv.org/abs/2501.17161)

At the same time, a lot of promising alternatives to the transformer (what most LLMs currently use as their core foundation) are emerging, so we'll see 😄 Liquid Foundation Models from LiquidAI are interesting; they are based on liquid neural networks, which use differential equations. [Liquid Foundation Models: Our First Series of Generative AI Models](https://www.liquid.ai/liquid-foundation-models) [From Liquid Neural Networks to Liquid Foundation Models](https://www.liquid.ai/research/liquid-neural-networks-research) As for other transformer alternatives, there are also Mamba, FFTNet, RWKV, Hyena, xLSTM, ... But apparently all of these alternatives are quite hard to scale up. The transformer is simply, somehow, the king of scaling.

AI is a brush for painting code 🎨🖌️

I love writing AI that writes AI that writes AI that writes AI

There are a billion definitions of AGI by different types of people.
The definition usually reflects what those people care about the most. You will get different definitions from corporations, AI researchers, engineers, mathematicians, physicists, cognitive scientists, philosophers, futurists, normies, etc. "

There is so much AI research emerging on thinking in latent space and on implementations of better memory. My prediction is that those will be the next two scalable breakthroughs in algorithmic improvement. Maybe Ilya is working on scaling this. [[2502.05171] Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach](https://arxiv.org/abs/2502.05171) Titans: Learning to Memorize at Test Time [https://youtu.be/UMkCmOTX5Ow](https://youtu.be/UMkCmOTX5Ow) Or maybe they will scale CoT/MCTS RL in an even better way. Or maybe they'll figure out how to scale all these alternatives to transformers? Mamba, FFTNet, RWKV, Hyena, xLSTM, Liquid Foundation Models, oscillatory Kuramoto neural networks, diffusion LLMs...? [Reddit - The heart of the internet](https://www.reddit.com/r/singularity/comments/1j1tp72/any_theories_on_what_ilyassi_is_working_on/) "

Vibecoding with Cursor and Claude 3.7 Sonnet is a revolution! Finally we have technology that is better able to materialize the high frequency of ideas my brain generates, but it's still not fast enough, and the number of ideas is still growing faster than the number of things created :D

incentivize generalization in the loss function
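One familiar way to read "incentivize generalization in the loss function" is a regularization term added to the data-fitting loss. The sketch below uses plain L2 weight decay on a deliberately overparameterized toy model, where many parameter settings fit the training data equally well and only the penalty tells them apart. This is an assumption about what the note could mean, not a claim about the intended method, and whether smaller-norm solutions actually generalize better depends on the problem. All names and numbers are made up.

```python
def data_loss(params, data):
    """Mean squared error of the toy model y_hat = (w1 + w2) * x."""
    w1, w2 = params
    return sum(((w1 + w2) * x - y) ** 2 for x, y in data) / len(data)


def l2_penalty(params):
    """Squared L2 norm of the parameters (weight decay term)."""
    return sum(p * p for p in params)


def total_loss(params, data, weight_decay=1e-3):
    """Data-fitting loss plus a penalty that prefers small-norm solutions."""
    return data_loss(params, data) + weight_decay * l2_penalty(params)


# Training data from y = 3x; any (w1, w2) with w1 + w2 = 3 fits it perfectly.
data = [(x, 3.0 * x) for x in (-1.0, 0.0, 1.0, 2.0)]

candidates = {
    "balanced": (1.5, 1.5),
    "lopsided": (3.0, 0.0),
    "huge": (103.0, -100.0),
}

# All candidates have zero data loss, so the ranking comes from the penalty alone.
ranking = sorted(candidates, key=lambda name: total_loss(candidates[name], data))
print(ranking)
```

The design point is only that the loss, not the data, decides between solutions that fit the training set equally well, which is where an explicit incentive for generalization would have to live.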