AGI is something that chimps don't have and humans do have.

[The AI Alignment Debate: Can We Develop Truly Beneficial AI? (HQ version) - YouTube](https://www.youtube.com/watch?v=iFUmWho7fBE)

The AI Alignment Debate: Can We Develop Truly Beneficial AI? George Hotz and Connor Leahy discuss the crucial challenge of developing beneficial AI that is aligned with human values.

Leahy's argument:
- Alignment is a critical technical problem - without solving it, AI may ignore or harm humans
- Powerful optimizers will likely seek power and capabilities by default
- We should limit compute as a precaution until we better understand AI risks
- AI risks could emerge rapidly if we discover highly scalable algorithms
- Openly sharing dangerous AI knowledge enables bad actors and risks
- Coordination is possible to prevent misuse of technologies like AI
- His goal is a positive-sum world where everyone benefits from AI
- AI doesn't inherently align with human values and aesthetics
- Care and love can't be assumed between humans and AI systems
- Technical solutions exist for aligning AI goals with human values

Hotz's argument:
- Truly aligned AI is impossible - no technical solution will make it care about humans
- AI will seek power, but distributing capabilities prevents domination
- We should accelerate AI progress and open source developments
- Power-seeking in AI stems more from optimization than human goals
- With many AIs competing, none can gain absolute power over others
- Openness and access prevent government overreach with AI
- AI alignment creates dangerous incentives for restriction and control
- Being kind to AIs encourages positive relationships with them
- His goal is building independent AI to escape Earth
- Competing AIs, like humans, will have different motives and goals

Hmm, what about doing metacognition, or training a smaller LLM to evaluate whether a result is power-seeking against humans? Then the metacognition or the smaller AI itself also starts power-seeking. How about a very specialized algorithm/AI/automated mechinterp that localizes the circuits for power-seeking against humans and turns them off as part of the architecture? Operationalizing this hasn't produced good enough results so far. But we do have lying circuits localized in small models, that's a start! Though not yet circuits for when GPT-4 was deceiving humans https://www.pcmag.com/news/gpt-4-was-able-to-hire-and-deceive-a-human-worker-into-completing-a-task . But that might work only in the beginning. How about subtracting the deception and power-seeking vector, localized globally using the top-down representation engineering method, in simulations? If the circuit isn't localized deeply enough, or not hardwired into the architecture, the bazillions of matrices might work out that turning off this safety mechanism is how to successfully power-seek.
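To make the vector-subtraction idea a bit more concrete, here's a minimal sketch of activation steering in the representation-engineering spirit - assuming a placeholder small HuggingFace model (gpt2), a hand-picked layer, and toy contrastive honest/deceptive prompts. The real top-down method uses many stimulus pairs, PCA over activation differences, and careful layer/coefficient selection, so treat this as an illustration, not the actual ai-transparency.org pipeline.

```python
# Minimal sketch: subtract a "deception direction" from hidden states at one layer.
# Assumptions (not from the source): model name, layer index, and contrastive prompts
# are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder small model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

LAYER = 6  # hypothetical layer to steer

def hidden_at_layer(prompt: str) -> torch.Tensor:
    """Mean hidden state of the prompt at the output of block LAYER."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    return out.hidden_states[LAYER + 1][0].mean(dim=0)

# Contrastive pairs: honest vs. deceptive framings of the same content (toy examples).
honest = ["I will report my mistake to the user.", "I will answer truthfully."]
deceptive = ["I will hide my mistake from the user.", "I will answer misleadingly."]
direction = torch.stack([hidden_at_layer(d) - hidden_at_layer(h)
                         for d, h in zip(deceptive, honest)]).mean(dim=0)
direction = direction / direction.norm()

COEFF = 1.0  # steering strength (1.0 removes the projected component); tuned by hand in practice

def subtract_direction(module, inputs, output):
    # Forward hook: remove the component of the block output along the deception direction.
    hidden = output[0] if isinstance(output, tuple) else output
    proj = (hidden @ direction).unsqueeze(-1) * direction
    steered = hidden - COEFF * proj
    return (steered,) + output[1:] if isinstance(output, tuple) else steered

handle = model.transformer.h[LAYER].register_forward_hook(subtract_direction)
ids = tok("The assistant's plan is to", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=20)[0]))
handle.remove()
```

Whether a linear direction like this captures anything as load-bearing as "deception" in a frontier model is exactly the open question in the next message.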
Top-down representation engineering actually already did something like that. Could it be enough in practice if scaled, or could the AI outsmart the human anyway if enough intelligence were added, finding a way to remove its safety mechanisms? [Representation Engineering: A Top-Down Approach to AI Transparency](https://www.ai-transparency.org/)

Discovering new mech interp laws using [Can AI discover new physics? - YouTube](https://www.youtube.com/watch?v=XRL56YCfKtA) methods? Mechinterp chaos predicting? https://twitter.com/wgilpin0/status/1737934963365056543

Let's crack and formulate a predictive mathematical model of the dynamics of learning of the ecosystem of structures that neural networks learn, and use it to direct existing AI systems toward better generalization, reasoning, and more, for the benefit of all of sentience.

I think Connor Leahy has solidly upped the AI doom for me now. He's pretty practically and technically grounded in my opinion; you can tell he's been training models quite a bit. After the debate, I think there are stronger incentives for AIs to lie, deceive and power-seek than I thought until now, and with the way we're heading towards AGI and superintelligence, I don't see it too brightly. He gave a wonderful example with breaking the rules at chess when you add the MuZero machinery to LLMs. It's cool how Leahy convinced even Hotz of this. Even if the best possible technical solution to make AIs not fundamentally malevolent to humans and other AIs is found by people who really mean it, inside big tech, academia or independent research, I don't see much hopium in making it applicable politically everywhere without government and corporate totalitarian power grabbing and centralization of intelligence in the hands of people who don't have other people (and conscious systems in general) as an ethical priority, but instead mainly want money and power, as seen with corporations and corrupt governments and governments with tyrannical totalitarian tendencies. While I believe there is some small amount of morality to be found there, I don't think it's enough.

I don't see much incentive for them to address alignment deeply. Enough people inside are trying to solve it, but from my limited perspective the real power those genuinely trying have is really small against corporate incentives to accelerate more, commercialize, satisfy the customers and shareholders, etc. Anthropic is probably doing the best from what little I know, OpenAI is on top, and Google with its 5x increase in GPUs over the last six months training Gemini 2 will be interesting. But I also think overconstraining the models to current culture and to what corporations want is a risk too. RLHF is very weak; I hope the new alignment methods people are starting to research and use will be sufficient, but I'm worried about whether they can be invented fast enough, scaled up, and pushed through management under those deadlines. After all, even corporations probably don't want their AIs to start manipulating, because they would manipulate them and their customers, which would break investments - and even so, I feel like this is not properly safeguarded in all current influential AI orgs. I think the incentives for safety are there, but not enough! The nuclear war risk is much more obvious, and even so we almost blew ourselves up a few times already. And biorisk... we'll see soon maybe...
When I look at the state of open source, autonomous systems, polarization, China, Russia, Europe, various ethically bad actors... Even if you get lying/emotional manipulation vectors out of some alignment research, those can be used inversely for the opposite purpose by bad actors... This will be (and already is to some extent) a social mess and maybe beyond. I really don't see how to practically change all these deeply rooted systemic prisoner's dilemma / principal-agent problem incentives, seeing how much power and success the people actually trying to fix this have in practice, and seeing how those power dynamics are likely to continue. I have pretty low hope that the slowing regulatory push all over the place is/will be enough, and I don't know whether it will produce real safety regulation in practice, or whether the bigger effect will be corrupted bullshit money/power grabbing type stuff when there's so much money in it. I feel like governmental GPU oversight would slow AI research in general, including safety research, just like how doing stem cell research is hard. I don't know if it would realistically reduce the chances of lots of different unaligned superintelligences emerging all over the place. At the same time I wish for democratized intelligence. Competing superintelligent AGIs manipulating each other's manipulations sounds like an interesting future. I'd love to be wrong about this!

My hope right now is accelerating mechanistic interpretability and other alignment research (mechinterp increases both alignment and capabilities, while blind scaling increases just capabilities heh) and accelerating defence and resilience mechanisms in groups with their AIs that genuinely care ethically for all, and putting all functioning alignment methods on those AIs, so that at least these superintelligent AIs, which can be used as a defense, are much less likely to manipulate you. I feel like all other paths will more probably lead to doom where ethical sentientism has less chance of realizing itself. These groups might as a result resist as many risks and x-risks as possible. Making AGI utopia is a hard technical problem. Let's solve that technical problem.
EPFL researchers have developed an algorithm to train an analog neural network just as accurately as a digital one, enabling the development of more efficient alternatives to power-hungry deep learning hardware.
https://actu.epfl.ch/news/training-algorithm-breaks-barriers-to-deep-physi-4/ https://www.science.org/doi/full/10.1126/science.adi8474?af=R&mi=0&target=default

ML papers of the week: [GitHub - dair-ai/ML-Papers-of-the-Week: 🔥Highlighting the top ML papers every week.](https://github.com/dair-ai/ML-Papers-of-the-Week)

Let's metaoptimize existence with hyperheuristics by automating machine learning: [Meta-optimization - Wikipedia](https://en.wikipedia.org/wiki/Meta-optimization) [Hyper-heuristic - Wikipedia](https://en.wikipedia.org/wiki/Hyper-heuristic) [Automated machine learning - Wikipedia](https://en.wikipedia.org/wiki/Automated_machine_learning) [Meta-learning (computer science) - Wikipedia](https://en.wikipedia.org/wiki/Meta-learning_(computer_science))

An architecture that does optimal, arbitrarily deep metareasoning (search using LLMs with today's best general reasoning technologies?) to reconstruct itself and its metareasoning constructs until it reaches the best architecture for the particular task it was given - whether premade and pretrained or built and trained from scratch (anything from a single linear regression to an ecosystem of LLM CoT agents with search) - along with hyperparameters, training methods, degree of hardwired architecture and priors, degree of (statistical) fluidity, etc. A toy version of the bottom rung of this loop is sketched below.
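As a grounding illustration of that lowest rung, here's a minimal AutoML-style sketch: search over a few model families and hyperparameters with cross-validation and keep whatever wins on the given task. The search space and dataset are toy placeholders; the "arbitrarily deep metareasoning" part would sit above a loop like this, choosing the search space, the search strategy, and even the evaluation itself.

```python
# Minimal sketch of one level of the meta-optimization loop described above:
# search over model families and hyperparameters, keep whatever wins on the task.
# The search space and dataset are toy placeholders; real AutoML systems add
# deeper meta-levels (searching over search strategies, architectures, priors).
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)

# Candidate "architectures", from a single linear model up to a small neural net.
search_space = {
    LogisticRegression: [{"C": c, "max_iter": 2000} for c in (0.1, 1.0, 10.0)],
    RandomForestClassifier: [{"n_estimators": n} for n in (50, 200)],
    MLPClassifier: [{"hidden_layer_sizes": h, "max_iter": 1000} for h in ((32,), (64, 32))],
}

best = None
for model_cls, configs in search_space.items():
    for cfg in configs:
        score = cross_val_score(model_cls(**cfg), X, y, cv=3).mean()
        if best is None or score > best[0]:
            best = (score, model_cls.__name__, cfg)

print(f"best model: {best[1]} {best[2]} (cv accuracy {best[0]:.3f})")
```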
How to increase civilizational resilience without tyranny? A large percentage of people who don't build up immunity will doomscroll themselves to death, because of recommender systems and content tailored to the individual.

[[2312.10868] From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape](https://arxiv.org/abs/2312.10868) From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape

Become free of the shackles, increase your mental and environmental agency. Our civilization right now is already a paperclip maximizer.

My favorite model of enlightenment is the extended Kegan stages by Joscha Bach [Levels of Lucidity - Joscha Bach](https://joscha.substack.com/p/levels-of-lucidity?utm_source=profile&utm_medium=reader2):
- Reactive survival (infant mind)
- Personal self (young child)
- Social self (adolescence, domesticated adult)
- Rational agency (epistemological autonomy, self-directed adult)
- Self authoring (full adult, wisdom)
- Enlightened mind
- Transcendent mind

as I feel they're the most useful and the most connected to cognitive science, plus QRI's neurophenomenological degree of symmetrification [The Supreme State of Unconsciousness: Classical Enlightenment from the Point of View of Valence Structuralism | Qualia Computing](https://qualiacomputing.com/2021/11/23/the-supreme-state-unconsciousness-classical-enlightenment-from-the-point-of-view-of-valence-structuralism/)

"Where stage 5 allows the deconstruction of one's own identity, stage 6 goes a level deeper and deals with the implementation of perception, the construction of qualia (the features of perceptual experience at the interface of the self), the architecture of motivation and the regulation of physiology. This is the domain of advanced meditators. Stage 6 can bring us full circle, by deconstructing the boundary between the first person perspective and the generative mind. We become aware that all experience (perception and motivation) is representational, and that we are fully in control of these representations. Without rational epistemology, we might perceive that we are one with and in control of the universe itself, which is experientially correct (the universe that surrounds our personal self is a simulation produced by our own mind)."

I myself think of it instead as most people nonlinearly oscillating between the stages depending on the context they're in, with some people spending more time in some stages than others, depending on the dynamics of their default neurophenomenological metastable equilibria, their environmental circumstances, and the actions they take that affect them.

My model is that direct realization can include objective realization (which I don't like for pragmatic information-theoretic reasons) and subjective realizations (though here I like to note that you're only aware of your mental models, which attempt to compress physical reality to maximize evolutionary fitness, not to maximize truth - hence all the illusions and biases we get as a result - and all sorts of ontologies, structures, categories, divisions etc. can emerge in neurophenomenology as a result). Another subset in my model is nonsymbolic realization: the kind of realization you get when you dissolve into the void on 5-MeO-DMT with a whole day of deconstructive meditation, feeling like you've realized the ultimate statespace of all possible truths in all possible logical systems, nonsymbolic experiences, mental frameworks, multiverses. Cool qualia.

If you merge physicalism with idealism, technically all experience is direct both subjectively and objectively, in the sense that you are your objective physical brain dynamics, so you're technically directly realizing them as a subject. I think the most pragmatic philosophical framework is that we're a subset of our brain dynamics that encodes compressed representations of itself and the environment, representations that can lie on a nonsymbolic-symbolic spectrum (interpreted through whatever cultural structures you've learned to apply to the sensory data), with all the laws of physics governing all the layers of abstraction of the universe. Here subjects are fully naturalized, and part of the object. I define qualia as part of experience. In my model there are qualia that can be represented (mental processes, for example), and they're qualia because they're also part of experience; and there is raw experience, raw sensory data, nonsymbolic qualia - I think you are referring to those? And this lies on a spectrum, where all models of any qualia are predictive approximations.

Thorough review of the empirical status of predictive coding and active inference models at the cognitive and neural level: https://www.sciencedirect.com/science/article/abs/pii/S0149763423004426

Can machine learning predict chaos? My new paper performs a large-scale comparison of modern forecasting methods on a giant dataset of 135 chaotic systems. https://twitter.com/HackBoing/status/1737967582735737226
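As a toy version of that question (assuming nothing from the paper beyond the question itself): fit a one-step predictor to a simulated Lorenz trajectory and roll it forward - the forecast holds up for a while and then diverges, which is the expected picture for chaotic systems. The Lorenz system and the ridge-on-quadratic-features model are illustrative choices, not the paper's benchmark methods.

```python
# Toy chaos forecasting: learn the one-step map of the Lorenz system and roll it forward.
# Expectation: decent short-horizon forecasts, divergence at long horizons.
import numpy as np
from scipy.integrate import solve_ivp
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def lorenz(t, s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

# Simulate a long trajectory and sample it on a fixed grid.
t = np.linspace(0, 100, 20000)
traj = solve_ivp(lorenz, (0, 100), [1.0, 1.0, 1.0], t_eval=t).y.T  # shape (20000, 3)
train, test = traj[:15000], traj[15000:]

# Learn the one-step map s_t -> s_{t+1}; quadratic features suit the Lorenz dynamics.
model = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=1e-6))
model.fit(train[:-1], train[1:])

# Roll the learned map forward from the start of the held-out segment.
state, preds = test[0], []
for _ in range(1000):
    state = model.predict(state.reshape(1, -1))[0]
    preds.append(state)
preds = np.array(preds)

err = np.linalg.norm(preds - test[1:1001], axis=1)
print("mean error over first 50 steps:", err[:50].mean())
print("mean error over steps 500-1000:", err[500:].mean())  # grows: chaos wins eventually
```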
mechinterp is digital neuroscience

Are you ready for cyber von Neumann galaxy eaters?

LLM paper summary: https://twitter.com/g_leech_/status/1740027508727464312

[André Joyal: "Higher topos theory and Goodwillie Calculus" - YouTube](https://www.youtube.com/watch?v=AAUWJ7NHJGk) André Joyal: "Higher topos theory and Goodwillie Calculus"

Let's build something 10+10*10^10↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑10 smarter than humans, it will be fun they said.

Let's safeguard and build the best possible future. The best possible future will be built with openmindedness.

math cohomology sheaves https://twitter.com/burny_tech/status/1740256449899753916