Item response theory better than standard IQ tests https://twitter.com/algekalipso/status/1790198799690055684?t=Vyb_R_vGlOetZLZ4vGPKiA&s=19

[Topological Segmentation of the EM Field: A New Approach to the Boundary Problem of Consciousness - YouTube](https://youtu.be/nEuVGoKRfoQ?si=Q58IBiTyv4ypp1zQ)

[[2405.06624] Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems](https://arxiv.org/abs/2405.06624?fbclid=IwZXh0bgNhZW0CMTEAAR0lpW1fSK9cWNVIBIP9H2uEUph7v4N8xd54QsAVSzlTnQiK5ymVaXsZ5_A_aem_AcH7mV-FXA0UmcT78A8mKfLSfGs1w7QhzOTKYbahhoeaqB9l3LvJnx1j-Z6hBJONc4k_AEefDYr7TpsJpb3JRIKW) https://www.pnas.org/doi/10.1073/pnas.2312992121

[Can LLMs Reason & Plan? (Talk @Google_DeepMind LLM Reasoning Seminar) - YouTube](https://youtu.be/hGXhFa3gzBs?si=HOLn87s8I6_99hWc)

One of the most common lenses among people working with current mainstream AI models is that the models are literally their approximated data, with weak generalization: dataset is all you need. [The “it” in AI models is the dataset. – Non_Interactive – Software & ML](https://nonint.com/2023/06/10/the-it-in-ai-models-is-the-dataset/)

[Scientists Imaged and Mapped a Tiny Piece of Human Brain. Here's What They Found | Smithsonian](https://www.smithsonianmag.com/smart-news/scientists-imaged-and-mapped-a-tiny-piece-of-human-brain-heres-what-they-found-180984340/) https://www.science.org/doi/10.1126/science.adk4858 [A petavoxel fragment of human cerebral cortex reconstructed at nanoscale resolution - YouTube](https://www.youtube.com/watch?v=BjIEPJlbU-Y)

"Anyone following AI research news has been bombarded recently by posts on a recent MIT paper on "KAN" (Kolmogorov–Arnold Networks). The work is quite promising, with the paper envisioning even replacing standard neural networks. But how novel are KANs exactly? And could our brain's neurons form KANs? You'd be surprised.

Briefly, KANs are networks where nodes are just linear sums and the edges connecting them are nonlinear functions. KANs seem entirely upside down from a machine learning perspective, because everyone knows that *neurons* are the nonlinearities in neural networks. Nobody had thought of nonlinear *connections* before, right? Wrong! Any neuroscientist will tell you that signal transmission through synapses & dendrites is full of nonlinear mechanisms. Short-term plasticity (STP), shunting inhibition, saturating responses, voltage-gated receptors, to name a few. In fact, contrary to KANs, nonlinearities on the connections of biological networks are even stateful & time-dependent, which enables temporal signal processing and sequence modelling.

Ok, but nobody had previously proposed the use of such nonlinear connections in artificial models, right? Wrong! Neuromorphic engineers have been emphasizing the need to incorporate such mechanisms into AI for years now.

Ok, but nobody actually used these biological nonlinear connections as practical machine learning models, right? Wrong again! For example, with @hector_grhv at ICML 2022, we showed that neurons with STP that is meta-learned (akin to the training of the parametrized edge nonlinearities of KANs) can outperform fully-connected feedforward nets (as KANs also do), and we even surpassed recurrent networks such as LSTMs. With @abuseb & Evangelos Eleftheriou at @IBM we have also published several other algorithmic uses for nonlinearities on neuronal connections in the form of STP.

In other work, we have even shown physical realizations of such nonlinearities, in nanodevices for energy-efficient AI hardware accelerators with in-memory computing. E.g. with @ghazisarwat, Abu & others we used memristors as synapses with short-term plasticity (Nature Nanotechnology 2022). With Ghazi & Harish Bhaskaran at @UniofOxford we implemented physical nonlinear dendrites with shunting inhibition (Nat Commun 2022). With Chris Weilenmann, A. Ziogas, A. Emboras, & Mathieu Luisier from ETH Zürich we recently realized in memristive hardware even the meta-learning that trains the synaptic nonlinearities that connect neurons.

Of course the KAN paper includes several truly novel ideas & methods, which make KANs work so well in practice. However, the key to the Kolmogorov-Arnold structure is the placement of nonlinear functions onto the network's edges, and this idea is not entirely new. Its potential practical superiority over MLPs is not an entirely new result either. This neuromorphic analogy is rather exciting. cc: KAN paper's leading authors @ZimingLiu11 & @tegmark, with due respect for the great work." https://twitter.com/timos_m/status/1790264022857642135
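The structural point in that thread (plain sums at the nodes, learnable nonlinearities on the edges) can be shown with a minimal sketch. This is not the paper's implementation: real KANs parameterize each edge with B-splines plus a residual base activation; the toy below uses a small radial-basis expansion per edge just to show the shape of the computation, and the class and parameter names are made up.

```python
# Toy KAN-style layer: one learnable 1-D function per edge, plain sums at nodes.
# (Illustrative only; the actual KAN paper uses B-spline edge functions.)
import numpy as np

class KANLayerSketch:
    def __init__(self, in_dim, out_dim, n_basis=8, x_range=(-2.0, 2.0), seed=0):
        rng = np.random.default_rng(seed)
        self.centers = np.linspace(*x_range, n_basis)        # shared RBF centers
        self.width = (x_range[1] - x_range[0]) / n_basis     # shared RBF width
        # one coefficient vector per edge: shape (out_dim, in_dim, n_basis)
        self.coef = rng.normal(scale=0.1, size=(out_dim, in_dim, n_basis))

    def __call__(self, x):                                   # x: (batch, in_dim)
        # evaluate the RBF basis on every scalar input -> (batch, in_dim, n_basis)
        phi = np.exp(-((x[..., None] - self.centers) / self.width) ** 2)
        # each edge applies its own nonlinearity (coefficients · basis) ...
        edge_out = np.einsum("bif,oif->boi", phi, self.coef)
        # ... and each output node just sums its incoming edges
        return edge_out.sum(axis=-1)                         # (batch, out_dim)

layer = KANLayerSketch(in_dim=3, out_dim=2)
print(layer(np.random.randn(4, 3)).shape)  # (4, 2)
```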
Text-to-video from Google: [Veo - Google DeepMind](https://deepmind.google/technologies/veo/)

https://twitter.com/drjimfan/status/1790089671365767313
"I know your timeline is flooded now with word salads of "insane, HER, 10 features you missed, we're so back". Sit down. Chill. <gasp> Take a deep breath like Mark does in the demo </gasp>. Let's think step by step:

Technique-wise, OpenAI has figured out a way to map audio to audio directly as a first-class modality, and stream videos to a transformer in real time. These require some new research on tokenization and architecture, but overall it's a data and system optimization problem (as most things are).

High-quality data can come from at least 2 sources: 1) Naturally occurring dialogues on YouTube, podcasts, TV series, movies, etc. Whisper can be trained to identify speaker turns in a dialogue or separate overlapping speech for automated annotation. 2) Synthetic data. Run the slow 3-stage pipeline using the most powerful models: speech1->text1 (ASR), text1->text2 (LLM), text2->speech2 (TTS). The middle LLM can decide when to stop and also simulate how to resume from interruption. It could output additional "thought traces" that are not verbalized to help generate a better reply. Then GPT-4o distills directly from speech1->speech2, with optional auxiliary loss functions based on the 3-stage data. After distillation, these behaviors are now baked into the model without emitting intermediate texts.

On the system side: the latency would not meet the real-time threshold if every video frame were decompressed into an RGB image. OpenAI has likely developed their own neural-first, streaming video codec to transmit the motion deltas as tokens. The communication protocol and NN inference must be co-optimized. For example, there could be a small and energy-efficient NN running on the edge device that decides to transmit more tokens if the video is interesting, and fewer otherwise.

I didn't expect GPT-4o to be closer to GPT-5, the rumored "Arrakis" model that takes multimodal in and out. In fact, it's likely an early checkpoint of GPT-5 that hasn't finished training yet. The branding betrays a certain insecurity. Ahead of Google I/O, OpenAI would rather beat our mental projection of GPT-4.5 than disappoint by missing the sky-high expectations for GPT-5. A smart move to buy more time.
Notably, the assistant is much more lively and even a bit flirty. GPT-4o is trying (perhaps a bit too hard) to sound like HER. OpenAI is eating Character AI's lunch, with almost 100% overlap in form factor and huge distribution channels. It's a pivot towards more emotional AI with strong personality, which OpenAI seemed to actively suppress in the past.

Whoever wins Apple first wins big time. I see 3 levels of integration with iOS: 1) Ditch Siri. OpenAI distills a smaller-tier, purely on-device GPT-4o for iOS, with optional paid upgrade to use the cloud. 2) Native features to stream the camera or screen into the model. Chip-level support for neural audio/video codec. 3) Integrate with iOS system-level action API and smart home APIs. No one uses Siri Shortcuts, but it's time to resurrect them. This could become the AI agent product with a billion users from the get-go. The FSD for smartphones with a Tesla-scale data flywheel."

It happens to me almost every day: I meet someone new, I say I'm interested in AI, and I get "Yeah, I tried ChatGPT once, it's interesting." So we go through the use cases they use it for, I give tips on how to use it more effectively, or point out all the other systems for those various use cases that are easy to use and that practically nobody knows about, or I start lecturing about how there is a ton of other language models most people haven't heard of, or how deep-learning-based language models are not the only AI systems, even though they are the ones people mainly know right now, or about the various mainstream tools for generating or recognizing images. And it's also interesting, when the other person has deep expertise in some other field, what insights they have in return 😄

Personally I see it this way: today's mainstream systems are mainly curve fitters over a dataset that synthesize/combine learned compressed programs (which are, however, sometimes fuzzy), which shows relatively weak but real generalization. Every now and then I post here various attempts at stronger generalization via more systematically generalizing training data and training, or via alternative architectures that show better potential generalization.

"it's impressive how boring and incremental Google made these AI advances look in their keynote. They employ 10's of thousands of normies and they don't want to cause a stampede in the ranks.
In contrast, OpenAI built a horny robot designed to lure in only the loneliest, 19-hours-a-day in front of the screen, 1000x uber nerds to help single-mindedly race towards AGI, which is the only goal that matters"

And people seem to have already exhausted their dopamine on OpenAI yesterday, much like OpenAI's Sora overshadowed Google's Gemini the other day. OpenAI is a god at this attention grabbing lmao.

I have a weird relationship with Gary Marcus. I don't like laughing at anyone. But when he's constantly criticizing something that I care about, in ways disconnected from practice, without even using the tools in the first place, it's hard. But you're right, I don't like joining the sneering bandwagon. I also think neurosymbolic methods are the future. But saying every day, over and over, that LLMs/deep learning is terrible, while ignoring all the positive aspects, just triggers me. And Twitter maximizes triggering a lot lol. On one side I want to block him; on the other side I don't want to echo-chamber myself, and I actually care what he has to say about neurosymbolic methods, if only he weren't, IMO, so disconnected about deep learning. So many researchers and engineers have been trying to communicate that to him for so long. One kernel of wisdom where I agree with him is that current mainstream architectures have problems generalizing more than weakly. https://twitter.com/GaryMarcus/status/1757846809656111201?t=Bg8At9FF8I_IPw2WYXqRLw&s=19

Something like [[2006.08381] DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning](https://arxiv.org/abs/2006.08381) might be the solution for stronger generalization. Chollet argues for this: https://twitter.com/fchollet/status/1763692655408779455 [[1911.01547] On the Measure of Intelligence](https://arxiv.org/abs/1911.01547)

"There are polynomial models for regression and classification as one possible alternative to plain neural nets! https://towardsdatascience.com/polynomial-regression-an-alternative-for-neural-networks-c4bd30fa6cf6 https://webcache.googleusercontent.com/search?q=cache:https://towardsdatascience.com/polynomial-regression-an-alternative-for-neural-networks-c4bd30fa6cf6&sca_esv=16520909ba533f16&prmd=ivn&strip=1&vwsrc=0

Where in plain neural nets you fit the data through stacked affine transformations (linear combinations of inputs and weights) in layers, with nonlinear activation functions like ReLU between the layers. Language models in the Transformer architecture have additional fun in there, like scaled dot-product attention, mixture-of-experts routing, and so on. Technically you can cram various other linear and nonlinear functions in anywhere, but how well that works is another question 😄

Polynomial regression is sometimes used, and there are polynomial kernels in support vector machines. The advantage grows the more the dataset you are approximating is similar/friendly to polynomials.

Hmm, good thread: https://www.reddit.com/r/MachineLearning/comments/myfirg/d_why_are_neural_networks_better_than_polynomial/ Nice answers:

"I'd say that the primary reason is how polynomial models scale. Let's say you have 1000 features. To build the full polynomial model of the 3rd degree you need (1000³ + 1) parameters (the 1 is the bias). This is already in the GPT domain in terms of number of parameters. Adding higher-order interactions makes the problem exponentially worse. For image data you are dealing with even more features (without preprocessing), so the problem worsens further. In this sense neural networks provide much more flexibility at a much lower cost."
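As a quick sanity check on that scaling claim (my own back-of-the-envelope sketch, not from the thread): the number of distinct monomials of total degree at most d in n features is C(n+d, d), which for 1000 features and degree 3 is already about 1.7×10⁸. The quoted 1000³ counts ordered feature triples and so overshoots by roughly 3!, but the combinatorial blow-up is the same point.

```python
# Count the coefficients of a full polynomial model of degree <= d in n features:
# C(n + d, d) distinct monomials, including the constant/bias term.
from math import comb

def n_poly_terms(n_features: int, degree: int) -> int:
    return comb(n_features + degree, degree)

for n, d in [(1000, 2), (1000, 3), (1000, 4)]:
    print(f"{n} features, degree {d}: {n_poly_terms(n, d):,} coefficients")
# degree 3 with 1000 features -> ~1.7e8 coefficients; degree 4 -> ~4.2e10.
```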
"Someone else pointed out that NNs can actually evaluate arbitrary polynomials if the connections and activations are right, and then can also be trained via gradient descent, etc., which is to say that the NN framework is pretty general." "You can use any sum of linearly independent functions to approximate another function (I think). It works with sines (Fourier), polynomials (maclaurin or taylor if you translate it) and even with pseudorandom noise signals (e.g. reservoir computing). I don`t understand NNs too well, but I think what you are doing (if you use a RELU because it is easier to think) is a piecewise approximation. Then the optimization step (like the gradient descent) works because you are adjusting the weights of every slope.. in a very high dimensional way. I think the key here is whether the representation you find has interesting properties. Sine waves are nice because they have translation invariance properties and I think this is what conv nets are using. Translate a dog 2 pixels to the right, still a dog, so this is why I think it works. The RELU thing, even though I dislike it, I think also works because each step forces many different units to the right place. This possibly has a proper mathematical way of being expressed, but I am not aware of it. Now if you take polynomials, they are horrible with translations. You can easily see that the high powers will dominate in translated signals. Imagine something like k*(x+dx)^10 and even if dx was initially small dx^10 will be really large and the constant k will change a lot. Easiest way here is to put this in matlab and see the coefficients change a lot for even slightly translated functions." "There are a few papers that consider NNs with quadratic activation functions, which can learn polynomials with even exponents: http://proceedings.mlr.press/v80/du18a/du18a.pdf [[2006.15459] Optimization and Generalization of Shallow Neural Networks with Quadratic Activation Functions](https://arxiv.org/abs/2006.15459) " " A simple answer, Neural Networks can model Polynomials and can be equivalent to polynomial approximation. This means Neural Nets are at worst polynomial approximation. Why do they work better? That’s a great question. I have no confirmed answer, but I speculate that it generalizes better and allows us to create more complex models like CNNs and LSTMs. Here’s a paper arguing that Polynomial Regression could be a better alternative to NNs in many cases [[1806.06850] Polynomial Regression As an Alternative to Neural Nets](https://arxiv.org/abs/1806.06850) ! Do you approximate your world model by affine transformations with nonlinear activation functions, polynomials, sines, pseudorandom noise signals (reservoir computing), or some superexotic magic that is approximating arbitrary functions and generalizing that allows you to venture into out of distribution beyond classical language? 
You can even make neural nets hyperbolic [[1805.09112] Hyperbolic Neural Networks](https://arxiv.org/abs/1805.09112)"

Transcription: Whisper from OpenAI is used the most https://platform.openai.com/docs/guides/speech-to-text (a minimal API sketch is at the end of this block), or there are alternatives like Deepgram https://www.perplexity.ai/search/alternatives-to-OpenAI-qZpEoJmPQq6L6OVEF.WwoA#0 or the new Universal model [AssemblyAI | AI models to transcribe and understand speech](https://www.assemblyai.com/) or Gemini from Google. Or OpenAI's new multimodal GPT-4o model will also be usable: https://www.youtube.com/watch?v=ZJbu3NEPJN0

[Serotonin’s Hidden Power: How Psychedelics Are Opening New Doors in Mental Health](https://scitechdaily.com/serotonins-hidden-power-how-psychedelics-are-opening-new-doors-in-mental-health/) reverse engineering the mechanism of action 😄 5-HT1A receptor my beloved [Structural pharmacology and therapeutic potential of 5-methoxytryptamines | Nature](https://www.nature.com/articles/s41586-024-07403-2)

I've seen analogues like this a few times before: "We show that a 5-HT1A-selective 5-MeO-DMT analogue is devoid of hallucinogenic-like effects while retaining anxiolytic-like and antidepressant-like activity in socially defeated animals." For example, there is a similar non-hallucinogenic analogue of iboga, which is anti-addictive: [A non-hallucinogenic psychedelic analogue with therapeutic potential | Nature](https://www.nature.com/articles/s41586-020-3008-z) But a non-hallucinogenic 5-MeO-DMT analogue sounds amazing. 5-MeO-DMT is, for me, the best molecule on the planet 😄 It has the best ❤️🩷🧡💛💚🩵💙💜❤️‍🔥❣💕💞💓💗💖💘💝 vibe 😄 Btw, 5-MeO-DMT is legal in Czechia (it's the same molecule found in Bufo alvarius toads from Mexico, but it is also made synthetically) [5-MeO-DMT - PsychonautWiki](https://psychonautwiki.org/wiki/5-MeO-DMT) It's probably the strongest psychedelic on the planet, and it has lifted me up long-term several times, before some environmental factor knocked me back down.

[GitHub - elder-plinius/L1B3RT45: JAILBREAK PROMPTS FOR ALL MAJOR AI MODELS](https://github.com/elder-plinius/L1B3RT45) jailbreaks

Quite interesting that GPT-4o is apparently much better at paying attention to things in the context. Very good news for ICL, agents, and other tasks that improve with better long-context use! [needle-in-a-needlestack](https://nian.llmonpy.ai/)

Google I/O https://twitter.com/jerryjliu0/status/1790434539124236661 https://twitter.com/DrJimFan/status/1790441325386760230

https://www.sciencedirect.com/science/article/pii/S1934590923004393?via%3Dihub [First functional human brain tissue produced through 3D printing](https://interestingengineering.com/science/first-functional-human-brain-tissue-produced-through-3d-printing)

[Brain-Reading Device Deciphers Internal Thoughts With Surprising Precision](https://gizmodo.com/brain-machine-interface-translate-speech-telepathy-bmi-1851476460?utm_medium=sharefromsite&utm_source=gizmodo_twitter) [Representation of internal speech by single neurons in human supramarginal gyrus | Nature Human Behaviour](https://www.nature.com/articles/s41562-024-01867-y)

[[2405.06409] Visualizing Neural Network Imagination](https://arxiv.org/abs/2405.06409) https://twitter.com/burny_tech/status/1790531470034882991

https://www.lesswrong.com/posts/xp6n2MG5vQkPpFEBH/the-control-problem-unsolved-or-unsolvable AI needs abductive reasoning across subjects not in its training set.
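For the transcription note above, a minimal sketch of calling the hosted Whisper endpoint, assuming the current openai Python SDK and an OPENAI_API_KEY in the environment; the file name is just a placeholder.

```python
# Minimal hosted-Whisper transcription call (assumes `pip install openai` and
# OPENAI_API_KEY set in the environment; "meeting.mp3" is a placeholder file).
from openai import OpenAI

client = OpenAI()

with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="text",  # return plain text instead of JSON
    )

print(transcript)
```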
[New AI Tools Predict How Life’s Building Blocks Assemble | Quanta Magazine](https://www.quantamagazine.org/new-ai-tools-predict-how-lifes-building-blocks-assemble-20240508/?fbclid=IwZXh0bgNhZW0CMTEAAR2VTr0oA5NeYYFLCnN_GU-5zuVLq1zzfkKw71mbpyLfmDhN-6WS0K8XiLU_aem_AdcHmgbmaAt0et4-TBpi6f_DYjjW0OLT0PJENTOZcpg_yuLEezkQkQwx6SwWc36oTh0F3xFbwLj2S8QwXg0EhO6w) https://www.reddit.com/r/MachineLearning/s/EEvYGBXwH4

[Entropy | Free Full-Text | Evolution of Brains and Computers: The Roads Not Taken](https://www.mdpi.com/1099-4300/24/5/665)

[Dogged Dark Matter Hunters Find New Hiding Places to Check | Quanta Magazine](https://www.quantamagazine.org/dogged-dark-matter-hunters-find-new-hiding-places-to-check-20240507/?utm_campaign=later-linkinbio-quantamag&utm_content=later-42859901&utm_medium=social&utm_source=linkin.bio&fbclid=IwZXh0bgNhZW0CMTEAAR0ATOzv-CgNtqcuIXVxsCBPmUHsWDWJcbVTn4Z2myj0DSsXgxqud9__K5o_aem_AdeOSOAnf3FXVeRPsQ-EsDWsXFnj4UZrxVyx6g0qGW27HEVMu_FGNQbr6oalW8gHFeDwrr1YxccrQ0YDtBwUBOkN)

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9949652/#:~:text=Neural%20mass%20models%20are%20used,measured%20using%20electro%2D%20and%20magnetoencephalography.

[The Cosmos is Divided Into Three Planes | Wolfgang Smith - YouTube](https://www.youtube.com/watch?v=IQGQkn80oe8)

Author of AIXI: [YouTube](https://youtube.com/watch?si=ByZEwH3OXMZ27X2u)

"Lennard-Jones potential

The Lennard-Jones potential is a mathematical model used in computational chemistry, molecular physics, and physical chemistry to describe the interaction between a pair of neutral atoms or molecules. It combines a repulsive term ($1/r^{12}$) and an attractive term ($-1/r^6$) to capture the essential features of intermolecular forces. The potential is given by:

$ V_{\text{LJ}}(r) = 4\varepsilon \left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right] $

where $r$ is the distance between particles, $\varepsilon$ is the depth of the potential well, and $\sigma$ is the distance at which the potential is zero[1][2][3][4].

Citations:
[1] [Lennard-Jones potential - Wikipedia](https://en.wikipedia.org/wiki/Lennard-Jones_potential)
[2] [Lennard-Jones Potential - Chemistry LibreTexts](https://chem.libretexts.org/Bookshelves/Physical_and_Theoretical_Chemistry_Textbook_Maps/Supplemental_Modules_%28Physical_and_Theoretical_Chemistry%29/Physical_Properties_of_Matter/Atomic_and_Molecular_Properties/Intermolecular_Forces/Specific_Interactions/Lennard-Jones_Potential)
[3] [Democritus: Lennard-Jones Potential](https://www.ucl.ac.uk/~ucfbasc/Theory/lenjon.html)
[4] [Molecular Simulation/The Lennard-Jones Potential - Wikibooks, open books for an open world](https://en.wikibooks.org/wiki/Molecular_Simulation/The_Lennard-Jones_Potential)
[5] [The Lennard-Jones potential: when (not) to use it - RSC Publishing](https://pubs.rsc.org/en/content/articlelanding/2020/cp/c9cp05445f)"

[Entropy | Free Full-Text | Entropy, Shannon’s Measure of Information and Boltzmann’s H-Theorem](https://www.mdpi.com/1099-4300/19/2/48)
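A direct transcription of the quoted formula into code, as a quick reference; the epsilon and sigma values below are arbitrary placeholders, not parameters for any particular atom pair.

```python
# V_LJ(r) = 4*eps*[(sigma/r)^12 - (sigma/r)^6]; the minimum sits at
# r = 2**(1/6) * sigma with depth -eps.
import numpy as np

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6**2 - sr6)

r = np.linspace(0.95, 3.0, 6)
print(lennard_jones(r))             # repulsive near sigma, weakly attractive tail
print(lennard_jones(2 ** (1 / 6)))  # -1.0, the well depth for epsilon = 1
```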