Thoughts AI technical 4

I was doing exactly that at work today, and I was almost too surprised that it ran on the first try without bugs once the specification was detailed enough 😄 After that it needed some optimization, mostly visual, plus adding the things that didn't fit into the context and the things that were better split into subtasks.

Code review by multiple people found no bugs.

Also, this was with Cursor and Claude 3.5 Sonnet hooked up to libraries, the web, and the whole codebase, so that it had all the context. ChatGPT really couldn't have done this without being given all that context, and even with the context formatted for it, the base model on its own would also have been worse.

Tomorrow it goes to the client as a demo. It passed all the tests, so we'll see xd

Cursor just got agents [https://youtu.be/6swSIMY6iTU?si=bZ9PI0VBOFGTfp-u](https://youtu.be/6swSIMY6iTU?si=bZ9PI0VBOFGTfp-u)

And Windsurf, which also came out recently, is starting to become a popular alternative to Cursor, and I want to try it [https://youtu.be/824Fyh146_w?si=AA0CKdP8Sf7TGDEi](https://youtu.be/824Fyh146_w?si=AA0CKdP8Sf7TGDEi)

Writing it in natural language, heavily compressed and without the details that can be inferred, is very often several times faster, even though Python (which I work in the most, because most of the AI ecosystem is in Python) is itself fairly close to natural language :kek:

I get that view, but I'd say it can "crank out working code" in certain contexts, as I mentioned 😄

Code review is still needed, just like code review is done for human-generated code. Sometimes we also use AI-based code review, and that has already helped quite a bit 😄

Personally, to some extent I see them as additional teammates 😄

Adoption takes time, there are legal issues, etc. Nvm, for us it's standard, but I work at an AI company where we work with the newest stuff in AI. 😄

Agents are already semi-autonomous, and autonomy is increasing. I see it as a spectrum. 😄

We've done things for a cybersecurity firm, an accounting firm, and a call center firm. I'm more of a researcher there, looking into the newest things that then get tried in production.
😄 Yeah, for example that accounting firm is now telling us how happy they are that it saves them many hours of manual work. 😄 But I work there part time. I have two jobs now. I just took on one more job, where I help with literature research for interviews with AI researchers. Which is cool, that new one is probably the best fit for me. 😄

And I'd like to go into AI for healthcare, where I had an interview, but I probably can't manage 3 part-time jobs when I also want to keep learning on top of that.

Maybe I've been starting too many things lately, because I also want time for learning, which is starting to disappear a bit because of this. xd We'll see.

Autonomy: it's a spectrum, and it's semi-autonomous on multistep things.

You can look into the various agent benchmarks of agent systems, e.g. https://fxtwitter.com/METR_Evals/status/1860061711849652378 if you want more details.

It depends on how you define the threshold between autonomy and non-autonomy, and you can define that however you want.

It's not yet autonomous at human level IMO, if that's how you define the threshold 😄

The biggest problem is e.g. loops, which a lot of people are now trying to solve in various ways.

I'd argue that some web dev things, for example, already fall under this definition 😄 Or Meta has papers on testers.

It depends on which domain you're automating.

Aha, now I get which statistic you want: you want to see a lower employment rate for some subset of the software engineering profession as a consequence of AI. Is that how you define it?
Other industries are seeing quite a shock, but I haven't looked into this specific statistic in that much detail. I'd also guess it will be a problem to separate out situations where some form of automation instead leads to employees moving to a different part of the job, and to separate out other factors influencing fluctuation in the job market.

Like, the first statistic that comes to mind is [Over 25% of Google's code is written by AI, Sundar Pichai says | Fortune](https://fortune.com/2024/10/30/googles-code-ai-sundar-pichai/) but I don't know, I don't trust Google much, and their Gemini is quite behind.

And I follow AI for writing AI the most. That's probably where it's most developed, and some labs report that it helps with nontrivial results.

Well yeah, and for Google a cooler statistic would be how much bloat and inefficiency in Google's code it can identify and trim 😄

I need to try these agentic frameworks, and OpenHands, or Cursor agents and other agents. Then I can report how much actual autonomy I see there 😄

A few days ago some Neos came out too, but I think that one is on a whitelist.

I absolutely can't keep up with trying these new agentic things, it's starting to be too much now 😄

Our workflow (and that of quite a few people I follow, often from Silicon Valley, who also jump on new things where bureaucracy etc. doesn't limit them) is now very often this 😄

For me it's already enough to feel like I have a teammate: Claude upgraded with Cursor, or others when I'm trying other things 😄

lol, yeah, even with as much explanation as possible it would probably have quite a problem with that, if it's too exotic 😄

For very exotic things I always suggest solutions that are as specialized and step by step as possible, but then it starts to get inflexible and doesn't use the models' base knowledge as much.

Well, the less context you're able to give, the worse it will be, so if the language also has a minimum of examples etc.
They still have a crazy problem with out-of-distribution things.

I think this AI researcher from Google explains it best [https://youtu.be/JTU8Ha4Jyfc?si=2X72N3RAYYMoN-dN](https://youtu.be/JTU8Ha4Jyfc?si=2X72N3RAYYMoN-dN)

But even so, it's interesting how it's sometimes still able to use its weaker generalization ability in quite surprising ways 😄

The less uncertainty and novelty there is, the better it can be automated. But that probably applies to people too, that they're more able to learn to do it.

I'd be curious how much some agentic system that breaks the task down a lot would improve it. But it depends on what language it is, for what tasks, etc. 😄 Or you could make a benchmark out of it.

There are a few benchmarks where LLMs break down pretty solidly, which are mega interesting and show where they still fall quite short relative to humans, such as LiveBench or the ARC challenge 😄 I'm collecting all of them.

It also beautifully shows how it handles some tasks without problems, sometimes better than some people, while on others it's worse than toddlers.

I think the future (including for these exotic languages) is neurosymbolic AI, which is where I think the AI field is slowly heading, when you look at the direction the newer architectures are slowly evolving in 😄

But the progress of deep learning on a lot of benchmarks (and on a lot of practical tasks) over the last two years is pretty mind-boggling IMO 😄

Otherwise, for software engineering benchmarking the most relevant one is probably SWE-bench [SWE-bench](https://www.swebench.com/). Non-agentic frameworks basically don't exist there, and Claude 3.5 Sonnet embedded in some agentic framework wins the most.

So if you want, you can try e.g. OpenHands, which I've linked here several times and which I want to try as soon as possible. This is what its GUI looks like.

But Cursor and Windsurf will probably be the most used in practice now in a generalist context (all with Claude) (but in practice, for clients with more specialized use cases, more specialized systems often win by a lot: either wrappers over mainstream models with extensions, on their own or in some (agentic) composition, or even finetuned / from-scratch specialized models).

"Just posting this very nice new graph from Epoch AI in case anyone doubts that there has been major progress in AI since GPT-4 launched. This PhD-level science benchmark goes from slightly above guessing to expert human level. Nor does this trend seem to be leveling off. Similar strong trend in math." https://x.com/AnthonyNAguirre/status/1861893538532991096

And this is just the base models themselves: no RAG, no agentic frameworks, no extra tools,...

I agree, I also sense that data quality is probably the biggest moat, but I also think that the architectures are changing, even if not as much.

Future is multiagentic [[2410.20424] AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions](https://arxiv.org/abs/2410.20424)

You often memorize before you can generalize https://x.com/PhysInHistory/status/1862437347750814092?t=SIQEQPihMLRrvxhxcdOLQw&s=19

New test time compute AI paradigm explosion. I wonder if Meta is working on one too https://x.com/ai_for_success/status/1862126220588052536?t=V6XLMZg3CXNqYlYAkawyyg&s=19

ML x computer security. I've spent the whole day today researching a guy who works at this intersection, and he complains a lot about how terribly few people there are at this intersection, how much talent is missing, and how big a problem that is, because the security of ML supply chains is much worse than the security of classical software, which has had more time to mature.
😄 Or this is also interesting: attacks that maximize energy consumption and latency by 10-200x, which can work as a DoS 😄 [[2006.03463] Sponge Examples: Energy-Latency Attacks on Neural Networks](https://arxiv.org/abs/2006.03463)

Besides model-breaking inputs, DDoS, backdoors in the data, the architecture, etc., the biggest threat for closed-source providers is model distillation, a kind of extraction of the parameters through systematic inputs (and other methods), which basically kind of turns their models into partially open source 😄 [[2206.08451] I Know What You Trained Last Summer: A Survey on Stealing Machine Learning Models and Defences](https://arxiv.org/abs/2206.08451)

But this is exactly what I want to happen, because I'm now becoming more of an open source and democratize-AI evangelist xd

Attackers can be paid by smaller players who want to break the big ones' competitive advantage, or they can sell it on the open market, or leak it to everyone for free, while defenders are paid by corporations or other companies.

Omg nice 😄 [[2403.06634] Stealing Part of a Production Language Model](https://arxiv.org/abs/2403.06634)

>Can AI make PowerPoint presentations yet?

Yes, but it depends on whether you mean creating the slides, creating the script, presenting virtually, or presenting physically, in what ways, and whether it includes more stuff, like accurate Q&A etc. These are different levels of difficulty, but it's possible to some degree depending on what type of AI system you use.
On the first level of difficulty, my boss regularly uses AI to create PowerPoint slides and then edits them the way he wants, and he says he likes how much time this top-down approach saves overall.

I bet they just fed it into the free version of ChatGPT like every other student, without any extra steps.

There's so little education on what kinds of general and specialized AI systems exist for various domains, how to use them properly, and how to make them 😭

In one of my jobs I create complex specialized AI systems tailored to custom domains to make them much more accurate. Though if you can't express the problem symbolically and have to use some form of statistics, and there's a combinatorial explosion of possible inputs and responses, then it will probably never be 100% correct. In many domains, how humans reduce that combinatorial explosion of possibilities is still much more efficient, usually by something that in AI is called continual learning, something similar to test-time finetuning and test-time compute, maybe neurosymbolic reasoning, and whatever else the brain uses that isn't used as much in AI systems yet or that we don't know about yet.

I would argue that all sorts of tools and technologies are used in both unskilled and skillful, wrong and right, ways, including the ML subfield of AI and specifically deep learning, but you also have for example AlphaFold helping with protein folding [https://www.youtube.com/watch?v=cx7l9ZGFZkw](https://www.youtube.com/watch?v=cx7l9ZGFZkw)

If the predictions from the ML model are vastly less accurate and less efficient than the other methods, that conclusion makes sense. These papers at least empirically showed that other methods are better there, provided they engineered these ML methods properly in the first place, and SoTA is still moving there too.
I see it as useful in domains where analytical and other computational methods are worse, like that protein folding example I linked, or apparently, recently, some subfields of quantum chemistry, or many subfields in healthcare (like cancer classification), or some forms of robotics. I could go on. :D

And of course there's the recent explosion of deep learning based natural language processing, where for tons of tasks we don't yet have other types of AI systems or other methods that do better in benchmarks. So many people try to apply it to everything that can be expressed in natural language, often forgetting that you can finetune models or make your own from scratch that might be better, or build a more complex system on top of models that might be better, like multiagentic RAG and tool use for example, or that other non-DL, non-ML, non-statistical methods exist too and may still be better for some types of tasks.

If it isn't accurate outside of its training distribution, then that sounds like the classic problem of overfitting, which is one of the main problems that ML faces, and you can check whether any of the many methods for preventing overfitting help [Overfitting - Wikipedia](https://en.wikipedia.org/wiki/Overfitting) [Bias–variance tradeoff - Wikipedia](https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff)

We need more technical education about AI.

It's still surprising that the big models in certain domains, like NLP, don't overfit a ton, especially if engineered properly [Double descent - Wikipedia](https://en.wikipedia.org/wiki/Double_descent)

"In statistics and machine learning, double descent is the phenomenon where a model with a small number of parameters and a model with an extremely large number of parameters have a small test error, but a model whose number of parameters is about the same as the number of data points used to train the model will have a large error.
This phenomenon has been considered surprising, as it contradicts assumptions about overfitting in classical machine learning."

Broken neural scaling laws is my favorite model of it [[2210.14891] Broken Neural Scaling Laws](https://arxiv.org/abs/2210.14891)

And this is my favorite intro to why this happens in the first place: "Deep learning generalizes because the parameter-function map is biased towards simple functions"

"Deep neural networks (DNNs) generalize remarkably well without explicit regularization even in the strongly over-parametrized regime where classical learning theory would instead predict that they would severely overfit." [[1805.08522] Deep learning generalizes because the parameter-function map is biased towards simple functions](https://arxiv.org/abs/1805.08522)

But this topic is so complex and such a rabbit hole, and we still don't understand so much about it.

Generalization in general is probably my favorite nerd snipe, because my favorite definition of intelligence at its core includes the ability to generalize [[1911.01547] On the Measure of Intelligence](https://arxiv.org/abs/1911.01547)

It's my favorite attempt to formalize the intelligence of an arbitrary system mathematically using algorithmic information theory :D It describes intelligence as the ability to adapt to new situations by mining insights from previous experiences thanks to generalizing, and as skill-acquisition efficiency, highlighting the concepts of scope, generalization difficulty, priors, and experience [Imgur: The magic of the Internet](https://imgur.com/a/aiATXds)

This would make a good tattoo.

Here's the author discussing it in the context of current AI systems [https://www.youtube.com/watch?v=JTU8Ha4Jyfc](https://www.youtube.com/watch?v=JTU8Ha4Jyfc)

When you build complex specialized AI systems tailored to custom domains to make them much more accurate, and you can't express the problem symbolically and have to use some form of statistics, and there's a combinatorial explosion of possible inputs and responses, then it will probably never be 100% correct. In many domains, how humans reduce that combinatorial explosion of possibilities is still much more efficient, usually by something that in AI is called continual learning, something similar to test-time finetuning and test-time compute, maybe neurosymbolic reasoning, and whatever else the brain uses that isn't used as much in AI systems yet or that we don't know about yet.

With the recent explosion of deep learning based natural language processing, where for tons of tasks we don't yet have other types of AI systems or other methods that do better in benchmarks, so many people try to apply it to everything that can be expressed in natural language. They often forget that you can finetune models or make your own from scratch that might be better, or build a more complex system on top of models that might be better, like multiagentic RAG and tool use for example, or that other non-DL, non-ML, non-statistical methods exist too and may still be better for some types of tasks.

If predictions from the ML model are vastly less accurate and less efficient than the other methods, then that experiment at least empirically shows that other methods are better there, provided these ML methods were engineered properly in the first place, and the state of the art is still moving there too.

I see ML as useful in domains where analytical and other computational methods are worse, like protein folding, or some subfields of quantum chemistry, or many subfields in healthcare (like cancer classification), or some forms of robotics. Or of course the recent giant explosion of deep learning in natural language processing that made most previous methods obsolete for a lot of types of tasks. I could go on.

Should we call deep learning just nonlinear dynamic analysis?
If you mean what to call it, then personally, from a cognitive science perspective, I like to call them part of AI, as they emerged from the connectionist paradigm as one of the initial attempts to approximate the brain.

And from this class of approaches I still see RNNs as probably the closest when it comes to modelling the brain [Recognizing Recurrent Neural Networks](https://www.cbs.mpg.de/210929/recognizing) And Bayesian inference [Predictive coding - Wikipedia](https://en.wikipedia.org/wiki/Predictive_coding)

Many people try to merge these two paradigms more generally, in both computational neuroscience and AI.

"Nonlinear dynamic analysis" seems like a more general term that also includes methods that don't originate from computational neuroscience :D All sorts of numerical methods, solving nonlinear differential equations of motion,...

Or actually, it feels like a different field with a lot of overlap with methods in the AI field, since various methods in the AI field are linear. In the ML subset of the AI field you have linear regression, for example, which doesn't go into nonlinear analysis. But linear regression originates from astronomy, and many deep learning architectures are technically stacked linear regressions with added nonlinear functions etc.

Putting all these methods into boxes is so confusing because of all the vagueness when things aren't defined strictly, the overlaps between the boxes, fuzzy boundaries, different people meaning different things by these terms depending on their background, different fields reinventing the same algorithm for different purposes, etc. X.X

For AI I usually just go with the most used terminology in the field, the one that has stuck the longest, to confuse people the least: AI being a gigantic field with mostly brain-inspired methods, with its major branches being symbolic methods and machine learning (which includes deep learning with neural networks), and their recent, more frequent merging into neurosymbolic AI.
There are many more AI branches than these two, too, like evolutionary computation, swarm intelligence, etc., which also merge with each other a lot. Neural cellular automata are cool!

Linear regression originated in astronomy. Deep learning is stacked linear regressions with some nonlinear activation functions baked in. AI is applied astronomy.
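
To make the "stacked linear regressions" line a bit more concrete, here's a minimal sketch in plain Python. Everything in it is made up for illustration (the helper names `affine`, `mlp`, `collapse`, `make_layers`, the layer sizes, the random weights), and it's not any particular library's API. It tries to show two things: without the nonlinear activations, a stack of linear layers folds back into one single linear regression, and with a ReLU in between, even two tiny layers can represent |x|, which no single linear layer can.

```python
import math
import random

def affine(x, W, b):
    """One 'linear regression' layer: y[j] = sum_i x[i] * W[i][j] + b[j]."""
    return [sum(xi * W[i][j] for i, xi in enumerate(x)) + b[j]
            for j in range(len(b))]

def relu(v):
    """Elementwise nonlinearity: negative values are clipped to zero."""
    return [max(0.0, t) for t in v]

def identity(v):
    """No nonlinearity at all, to show what the activation buys you."""
    return v

def mlp(x, layers, activation):
    """Apply the affine layers in order, with the activation between them
    (but not after the last one)."""
    *hidden, last = layers
    for W, b in hidden:
        x = activation(affine(x, W, b))
    return affine(x, *last)

def collapse(layers):
    """Fold a stack of purely affine layers into one equivalent (W, b)."""
    W, b = layers[0]
    for W2, b2 in layers[1:]:
        W = [affine(row, W2, [0.0] * len(b2)) for row in W]  # W @ W2
        b = affine(b, W2, b2)                                # b @ W2 + b2
    return W, b

def make_layers(sizes, rng):
    """Random weights, for illustration only (hypothetical helper)."""
    return [([[rng.gauss(0, 1) for _ in range(n)] for _ in range(m)],
             [rng.gauss(0, 1) for _ in range(n)])
            for m, n in zip(sizes[:-1], sizes[1:])]

rng = random.Random(0)
layers = make_layers([3, 8, 8, 1], rng)
x = [0.5, -1.0, 2.0]

# Without activations, three stacked layers should match one affine map
# up to floating-point rounding.
W, b = collapse(layers)
stacked = mlp(x, layers, identity)
single = affine(x, W, b)
print(all(math.isclose(s, t, rel_tol=1e-9, abs_tol=1e-9)
          for s, t in zip(stacked, single)))

# With ReLU, two layers compute relu(x) + relu(-x) = |x|.
abs_layers = [([[1.0, -1.0]], [0.0, 0.0]), ([[1.0], [1.0]], [0.0])]
print(mlp([2.0], abs_layers, relu), mlp([-3.0], abs_layers, relu))  # [2.0] [3.0]
```

Swapping `relu` for `identity` in the last line gives `[0.0]` for every input, since the two linear layers cancel out, so the nonlinearity is the whole difference here.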