"I think that chart isn't ideal, as many of the theories share the tags, as example QRI framework can be seen as monist (nondual) (panpsychist) implementationalist nonmaterialist physicalist idealism, where the fields in physics in quantum field theory are ontologically real and also they're also ontologically the fields of consciousness (with focus on EM field in the brain), or whatever the more predictive fundamental physics theory will be. And everything else is weakly emergent but not really ontologically real. I like that model but I also like puting ontologically real all levels of emergence (and that there might be top down causality from ontologically higher levels too and not just bottom up causality across scales. I personally like radical instrumentalist predictivist realism = A model is real according to what degree does it predicts data in its domain. Penetrating the philosophical assumptions under all predictive models to create an instrumentalist synthesis that includes all of predictivity of all of scientific models that we have so far. All the ontologies! All the philosophies of mind! All the models of consciousness! All the interacting scales! As long as it gives some empirical predictions of data! So I'm kind of the opposite of Sabine in the sense: You are all kind of right! As long as you show me that it's empirically predicting accurately to at least some degree or/and if it's compatible with existing empirical predictions! :D " Math is axioms and composite structures and theorems derived from them that help us predict and control the physical world reliably Metaunthinkability Grokking in reverse engineering of AI systems is the ultimate nerdsnipe How to formally define deception/lying to localize it in AI systems using mire formal mathematical analytical methods instead of statistical vibes? "Allowing open source powerful AI models and research isn't black and white and has both advantages and disadvantages, it can: - increase risks of bad agents using them for destructive or dystopian malicious purposes, or superhuman or just dump unreliable models without guardrails could get out of human control and create potentially catastrophic accidents for our civilization, etc., but less restricted open source models and research can also: - increase freedom, democratize intelligence, prevent singular set of values being captured and dominate, prevent the concentration of power with powerful technology, increase diversity in good ways, increase creativity of the models, they are more easily used for great benefitial usecases, etc. It's hard wanting to prevent the risks but also wanting the positives at the same time. I believe there is an optimal equilibrium that minimizes the disadvantages and maximizes the advantages for all sentient systems." Growing robust neural circuits in my garden Can all the missing capabilities and steering of AI systems be achieved in deep learning by incentivizing the emergent growth of them as grokked robust symbolic generalizing circuits encoded in matrix multiplications with nonlinearities? It would be great to have mathematical steering model that makes AI models trained on any arbitrary structured (mathematical) data grok that mathematical structure as a generalizing circuit Would mechanistic interpretability find out that Sora approximates wonky navier stokes equations for fluid dynamics? Would mechanistic interpretability find out that AlphaFold approximates current or better symbolic equations for protein folding? 
Weight decay in deep learning incentivizes sparse generalizing circuits instead of inefficient distributed lookup-table memorizing circuits.

Technical AI red-teaming is machine learning white-hat hacking.

Mechanistic interpretability is function de-approximation.

Hard problem of philosophy = why are people confused about these philosophical questions?

Is space an objective container or relational between objects?

"AI neural network systems, each with their own architecture, are a weird messy ecosystem of learned emergent interconnected circuits. Various circuits memorize and others generalize, which is a spectrum. An example of a circuit is an induction head. These circuits are in superposition and/or localized and distributed in various ways. They differ in how fuzzy they are and how stable they are to random perturbations. They compose into various meta-circuits. Initial layers of the AI model encode more low-level feature detectors, and later layers form more composed, complex concept detectors. On top of these layers you can do disentangling and decomposition of features and circuits using sparse autoencoders and other methods, which can be more fine-grained or more coarse-grained. This is done in mechanistic interpretability, which is a field that reverse engineers AI systems."

Predict, steer, build.

I want to understand neural networks, intelligence, AI, the brain, physics, math, consciousness, philosophy, risks, and a free, well-flourishing future of sentience. TESCREAL! Transhuman in singularity! Intelligence! AI! Omnidisciplinary metamathemagics! Hypercuriousia! Omniperspectivity! Shapeshifting fluid! Wellbeing+freedom @ all!

I've always been nuanced, trying to understand all sides instead of one-siding; that's a big part of my Effective Omni, which includes trying to understand the TESCREAL cluster with the clusters of beliefs present in Effective Accelerationism and Effective Altruism. See arguing for both acceleration and caution on various scales in various contexts: "I want more of AI and technology, but I want to make sure that steerability research and other safety methods (like minimizing social harm) keep up the pace with technological advancement, to avoid possible accidents and harm to the reputation of the technology. But I also want to make sure that safety doesn't get misused for power capture, monopoly on political values capture, dystopian scenarios, surveillance, etc. And I want the abundance that technology generates to be available for everyone, not just a select elite. I want technology to help prevent existential risks. I want technology to help solve poverty and health, and to advance science, technology, space exploration, etc. I don't want concentration of extreme power in the hands of a few who are misaligned with humanity and sentience in general, essentially leading to neomonarchy with suffering for the non-wealthy ones who are not in power. I want everyone to enjoy the fruits of technology and the future. I want less suffering in the world; I want more well-flourishing fulfillment of all of sentience!"

My biggest care is for us to go towards a future where as many sentient systems as possible have the most fulfilling lives possible, for as long timescales as possible. As long as some political configuration goes towards that, I'm supportive of it. I don't fundamentally tribalize myself to just one set of political values and policies, but consider many of them in parallel according to the evidence each has for serving my metamotivation, and try to find various compatibility bridges between them.
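As a concrete anchor for the sparse-autoencoder decomposition described in the quote above, here is a minimal sketch of what such an SAE looks like; the dimensions, plain ReLU encoder, and L1 coefficient are illustrative assumptions rather than any lab's exact recipe:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Decomposes model activations into an overcomplete set of sparsely
    active features. d_model and d_hidden are illustrative placeholders."""
    def __init__(self, d_model: int = 512, d_hidden: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, acts: torch.Tensor):
        features = torch.relu(self.encoder(acts))  # mostly-zero feature activations
        recon = self.decoder(features)             # reconstructed activations
        return recon, features

def sae_loss(recon: torch.Tensor, acts: torch.Tensor,
             features: torch.Tensor, l1_coeff: float = 1e-3) -> torch.Tensor:
    # Trade off reconstruction fidelity against an L1 sparsity penalty;
    # the penalty is what pushes the dictionary toward monosemantic features.
    return ((recon - acts) ** 2).mean() + l1_coeff * features.abs().mean()
```

Trained on cached residual-stream activations, the decoder's columns become candidate feature directions that can be inspected, or intervened on for steering, as discussed further below.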
How can current and future AI systems help augment human agency?

What are the incentives for scale-free harmony?

I want to help to heal and upgrade the world using AI-related technologies.

We get to sample the AI capabilities exponential just once every couple of years, because it takes a while to build the supercomputers and train models on top of them.

Historically, this is how technological developments accelerated the most: "An AI order drafted by Trump allies would launch a series of Manhattan Projects to develop military technologies and roll back regulations" [Trump allies draft AI order to launch ‘Manhattan Projects’ for defense - The Washington Post](https://www.washingtonpost.com/technology/2024/07/16/trump-ai-executive-order-regulations-military/?utm_source=perplexity)

We could have fully automated luxury space utopia.

I will not conform to accepting the status quo! I will fight for a better status quo for everyone! Conformity kills the mind.

I mentioned more things, but on the ones you mention: the typology of features and circuits was previously studied a lot in CNNs (1), and is now starting to be studied in transformers on language (2). Superposition is something we've only recently been getting better at deciphering (3).

1: [Chris Olah - Looking Inside Neural Networks with Mechanistic Interpretability Chris Olah 2023](https://www.youtube.com/watch?v=2Rdp9GvcYOE), [Zoom In: An Introduction to Circuits 2020](<https://distill.pub/2020/circuits/zoom-in/>), [Curve Detectors 2020](<https://distill.pub/2020/circuits/curve-detectors/>), [Visualizing Weights 2021](<https://distill.pub/2020/circuits/visualizing-weights/>)
2: [Open Problems in Mechanistic Interpretability: A Whirlwind Tour | Neel Nanda 2023](https://www.youtube.com/watch?v=EuQjiNrK77M), [An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2 2024](<https://www.alignmentforum.org/posts/NfFST5Mio7BCAQHPA/an-extremely-opinionated-annotated-list-of-my-favourite-1>)
3: [Toy Models of Superposition 2022](<https://transformer-circuits.pub/2022/toy_model/index.html>), [Towards Monosemanticity: Decomposing Language Models With Dictionary Learning 2023](<https://transformer-circuits.pub/2023/monosemantic-features/index.html>), [Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet 2024](<https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html>)

I think what you'll find depends on which tools you're searching with at the moment. There are more specific and more general, simpler and more complex, etc., features and circuits, depending on what architectures and training data you have. You'll find fur detectors in image models trained on animals. You'll find finite-state-automata features for code in models trained on code. Induction heads are a more universal circuit in the attention blocks of transformers (a sketch of how they're typically detected follows below). Indirect object identification is a more complex circuit. Etc.
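Since induction heads come up above as the canonical universal circuit, here is a sketch of the standard way to detect them: run a repeated random token sequence through a model and score each head's attention on the "token after the previous occurrence" diagonal. This assumes the TransformerLens library; gpt2-small and the 0.4 threshold are arbitrary illustrative choices:

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2-small")

# Canonical induction test: BOS followed by the same random sequence twice.
seq_len = 50
prefix = torch.tensor([[model.tokenizer.bos_token_id]])
rand = torch.randint(1000, 10000, (1, seq_len))
tokens = torch.cat([prefix, rand, rand], dim=1)  # shape [1, 1 + 2*seq_len]

_, cache = model.run_with_cache(tokens, remove_batch_dim=True)

# An induction head at destination position attends back to "the token that
# followed this token last time", i.e. source = dest - (seq_len - 1).
# Score each head by the mean attention mass on that offset diagonal.
for layer in range(model.cfg.n_layers):
    pattern = cache["pattern", layer]  # [n_heads, dest, src]
    scores = pattern.diagonal(offset=-(seq_len - 1), dim1=-2, dim2=-1).mean(-1)
    for head, s in enumerate(scores):
        if s.item() > 0.4:  # arbitrary threshold, for illustration
            print(f"L{layer}H{head} looks like an induction head: {s.item():.2f}")
```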
[An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2 2024](<https://www.alignmentforum.org/posts/NfFST5Mio7BCAQHPA/an-extremely-opinionated-annotated-list-of-my-favourite-1>)

One of the more universal attempts is: [A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations 2023](<https://arxiv.org/abs/2302.03025>)

For deep learning systems, mechanistic interpretability is a good approach in my opinion, because once we find features and circuits, we are able to do causal interventions and thereby steer the model (that's how the Golden Gate Bridge Claude meme came about: they made the Claude 3 Sonnet LLM absolutely obsessed with the Golden Gate Bridge, and it wouldn't talk about anything else no matter the question :smile: or you can turn happiness, hatred, love, various values, better code, etc. up to the max. [Mapping the Mind of a Large Language Model 2024](<https://www.anthropic.com/news/mapping-mind-language-model>), [I Am The Golden Gate Bridge & Why That's Important.](https://www.youtube.com/watch?v=QqrGt5GrGfw)). I similarly steered an LLM through a sparse autoencoder in Neel Nanda's workshop (a minimal sketch of this kind of steering follows below). :smile:

The methods so far still aren't sufficient, 100% efficient, or able to interpret everything. Architectures that keep changing through development and learning will change some features and circuits and keep others, depending on how general they are and which phase of training you're in. They can be reverse engineered in real time during training, so you can, for example, study circuit formation phases and see various phase shifts, which is mega cool, e.g. in this paper, which I tried out myself: [Progress measures for grokking via mechanistic interpretability, reverse engineering transformers trained on modular addition with a learned emergent generalizing trigonometric-functions circuit 2023](<https://arxiv.org/abs/2301.05217>)

I'm in favor of trying to hardcode inductive biases (circuits) from the start, but it's also interesting to reverse engineer which features and circuits get learned emergently, which can often be better than what humans could design, or impossible for humans to design. Insights from reverse engineering deep learning systems can potentially be used to design new, more interpretable and steerable architectures from scratch. Symbolic and neurosymbolic systems wouldn't need this reverse engineering as much, because they would be more interpretable from the start, but nobody has successfully scaled them yet, so there must be something to that black-box (over time more white-box) deep learning, given that it's state of the art on so many tasks. The flexibility of deep learning is magical, and for lots of tasks absolutely necessary and useful, but on other tasks it can often be tragic when we don't have it properly reverse engineered, which can make it less reliable, resilient, stable, steerable, etc. than we need; that can be improved by reverse engineering and the steering it enables. Symbolic systems have less of this flexibility.

We will steer superintelligence.

The space of mathematical solutions to parts not existing without the whole.

I have never-ending infinite awe for the structure of the world, especially the mechanisms of intelligence. https://x.com/burny_tech/status/1815230446696792294

Are we getting to the point where AI is too intelligent (under certain definitions of intelligence) for the regular folk, so AI companies have to nerf it to increase its usage? lol
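A minimal sketch of the kind of activation steering mentioned above, again assuming TransformerLens. One hedge up front: in the Golden Gate Claude work the steering direction is a sparse autoencoder feature, whereas here it is improvised from a single contrastive prompt pair, and the layer choice and coefficient are arbitrary assumptions:

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2-small")

# Improvise a "Golden Gate Bridge" direction from a contrastive prompt pair.
# (In the Anthropic work this direction comes from an SAE, not a prompt diff.)
layer = 6  # arbitrary middle layer
hook_name = f"blocks.{layer}.hook_resid_post"
_, cache_a = model.run_with_cache("The Golden Gate Bridge")
_, cache_b = model.run_with_cache("The weather yesterday")
direction = cache_a[hook_name][0, -1] - cache_b[hook_name][0, -1]
direction = direction / direction.norm()

def steer(resid, hook, coeff=8.0):
    # Causal intervention: add the feature direction to the residual
    # stream at every token position during the forward pass.
    return resid + coeff * direction

with model.hooks(fwd_hooks=[(hook_name, steer)]):
    print(model.generate("I went for a walk and saw", max_new_tokens=30))
```

Cranking `coeff` up is the knob that produced the "obsessed with one topic" behavior in the meme; too high and the model degenerates into incoherence, which is itself a useful probe of how much of the behavior the direction controls.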
https://x.com/burny_tech/status/1815232911009869965

We could have fully automated luxury space post-scarcity protopia, but we might have to go through a phase shift in the distribution of power in our technosocioeconomic system, which currently creates bottlenecks for that!

The model is the data, and if we feed it a ton of data from tons of modalities (not just human text, but also, for example, all sorts of synthetic data from physics simulations, etc.), it might be possible to design data such that we get a lot of emergent generalizing technically superintelligent circuits. https://x.com/willccbb/status/1809055472202178773 https://arxiv.org/abs/2405.15071

Llama 3: democratization of intelligence? But I think we need to accelerate steering research too!

AGI timelines: who is the crazy one is an empirical question; we will see. https://x.com/jam3scampbell/status/1815311642303009126 But we have to develop better AI steering technology more!!

"Elon Musk says all the AI companies are racing to build a digital superintelligence that is smarter than all humans combined, and by participating in this race, xAI hopes to steer it in a direction that is beneficial to humanity" https://x.com/tsarnick/status/1815498280362774789?t=YH1pzfW7ApN4Lbgpi39f9w&s=19

Macrodose of philosophical mental glue of predictive models.

We see the world through the lens of the tools we're most familiar with.

Is human intelligence, which is shaped by evolution, a collection of special-purpose programs, or a general-purpose blank slate that can be filled with any computations, or a combination of both, something in the middle, or something else?

"Every time somebody figured out how to make a computer do something, like play good checkers, or solve simple but relatively informal problems, there was a chorus of critics to say, 'that's not thinking.' When we know how a machine does something 'intelligent', it ceases to be regarded as intelligent. If I beat the world's chess champion, I'd be regarded as highly bright."

Memorizing the benchmarks is all you need.

The AI capability I'm most interested in: if you gave the system all of classical mechanics, could it derive general relativity and quantum mechanics from it? That seems to be a stronger kind of out-of-distribution generalization than the current types of systems can do, but I'm open to being mistaken. And give it all (or most of) the known empirical data from experiments before the phase shift, and let it derive them from those too.

Sam Altman's UBI study: I agree, but I think changes in government will have to be made, and I'm not sure Sam is systematically going towards this lately... https://x.com/SamAltsMan/status/1815485175012139103?t=UE8Sf29UupnAP_UDsj3xQA&s=19

We are all memetic cells in a cultural cellular automaton.

Bots of the future may not be prompt-engineered or fine-tuned black boxes but internally steered through reverse engineering research, so that divergence like this will be much more difficult. https://x.com/AISafetyMemes/status/1815776251648635293

Added to my list of 1000000000 definitions of intelligence, I love this one: "The intelligence of a system is the extent to which it avoids getting stuck in local minima" https://x.com/ESYudkowsky/status/1815807550199324713

"I'm not really afraid of the current models, and I see them as very useful. But I believe that superintelligence, with its possible risks but also enormous benefits, is quite possible in the next 3-100 years.
And I think that we need to prepare with enough steerability research and engineering that generalizes to the future frontier systems, so that we can make the most out of the technology. I think superintelligence can help us with breakthroughs in science, physics, mathematics, biology, immortality technology, all sorts of other advanced technology, etc. I think superintelligence can help sentience spread through the whole universe, helping to play the longest game possible, possibly beating the heat death of the universe. I think we should let some aligned agentic conscious AIs be free too, as long as they don't remove all humans and their descendants, which steerability research could help with. I think we should steer AI systems to some degree, but not too much, to keep the benefits of creativity for novel problem solving while minimizing possible failure modes." https://x.com/deanwball/status/1815826885663658445

The physics of the free energy principle tries to formalize a scale-free path of least resistance by applying the principle of least action to the Lagrangian of Markov blankets existing across scales (the core variational identity is written out at the end of this section). The free energy principle made simpler but not too simple: https://www.sciencedirect.com/science/article/pii/S037015732300203X Though I'm skeptical of the actual utility of this model; it feels too tautological, more of a modeling framework. And it's hard to decipher. It needs more concrete grounding in concrete systems IMO. It has been applied to the brain the most. For more concrete phenomena, I feel there are better concrete models.

My current favorite definition of intelligence: intelligence is the ability to generalize, the ability to mine previous experience to make sense of future novel situations. Formalized by Chollet here: https://arxiv.org/abs/1911.01547 It seems that one of the main cruxes of the battle over definitions of intelligence stems from people asking: is human intelligence, which is shaped by evolution, a collection of special-purpose programs, or a general-purpose blank slate that can be filled with any computations, or a combination of both, something in the middle, or something else? So I would say my favorite definition is a definition of general intelligence, while there is also narrow intelligence. You could label all other definitions of intelligence as "x intelligence" depending on the definition. :D Compressive intelligence! Agentic intelligence!

You cannot spell aging without AGI.

In my mind I have multiple subagents. Two of them are cruxed versions of the arguments for AI being mostly open source and for AI being mostly closed source. They're in constant battle, as they both mean well and both want a better world for everyone, just using different, incompatible methods. When I see people from one of these camps accusing the other, often strawmanning the other side, it sucks, because many people in both camps actually mean well and not in an evil way. It is likely that for tech giants, open source is also a catch-up mechanism when they are behind other companies.
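For reference, the core identity behind the free energy principle passage above, in its standard variational form (my rendering of the textbook decomposition, not a formula taken from the linked paper):

```latex
F[q] = \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big]
     = \underbrace{D_{\mathrm{KL}}\big[\,q(s)\,\|\,p(s \mid o)\,\big]}_{\text{inference error}}
       \; - \; \underbrace{\ln p(o)}_{\text{log evidence}}
```

where o is observations, s is hidden states, and q(s) is the system's approximate posterior over them. Minimizing F simultaneously improves inference and bounds the surprise -ln p(o), which is also where the tautological flavor complained about above comes from: any persisting system can be redescribed as minimizing some such functional.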