I really want to contribute to the mechanistic interpretability field now. I have looked at various AI safety and mechanistic interpretability papers and lectures, but haven't properly gotten my hands dirty in practice yet. I started writing GPT from scratch ([Let's build GPT: from scratch, in code, spelled out. - YouTube](https://www.youtube.com/watch?v=kCc8FmEb1nY)) and I am now going through the ARENA material, looking forward to replicating the various mechinterp papers. My main goal is to help AI alignment. I am currently thinking about gathering the skills and knowledge needed for:

- Compiling a collection (and potentially a meta-analysis) of mechinterp papers and benchmarks looking at deception, manipulation, lying, power seeking, cheating, wanting to steal and kill, and other related behaviors, as in the MACHIAVELLI benchmark ([MACHIAVELLI](https://aypan17.github.io/machiavelli/)).
- Maybe extending the "Geometry of Truth" paper ([[2310.06824] The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets](https://arxiv.org/abs/2310.06824)), trying to localize the phenomena I just mentioned at the circuit level.
- Maybe trying to reverse engineer the neural populations correlating with the phenomena I just mentioned, found in the representation engineering paper ([[2310.01405] Representation Engineering: A Top-Down Approach to AI Transparency](https://arxiv.org/abs/2310.01405)).
- Making mechinterp tools realtime.
- Maybe assisting in automating these methods: contributing to automated circuit discovery ([[2310.10348] Attribution Patching Outperforms Automated Circuit Discovery](https://arxiv.org/abs/2310.10348)), to LLMs explaining LLMs ([Language models can explain neurons in language models](https://openai.com/research/language-models-can-explain-neurons-in-language-models)), or to automated alignment research and engineering using an ecosystem of interacting LLM engineers ([Autonomous chemical research with large language models | Nature](https://www.nature.com/articles/s41586-023-06792-0), https://microsoft.github.io/autogen/).

I am also interested in working on or collaborating on other mechinterp projects if there is an opportunity, so please feel free to message me about collaboration. :-) Currently I'm pursuing these interests in my free time while figuring out how to manage my finances. I would be up for engaging in this work part time or full time; if such an opportunity arises in this field, I am eager to pursue it! I really care about ensuring AI goes well and has a positive impact on humanity's future, and I want to contribute to that with as much of my time as possible! If you have any tips and pointers on how to continue on this path and how to avoid common errors, I would appreciate them! :-) I'm open to overall networking in this field, both virtually and in person! I'm currently based in Czechia. Thank you for taking the time to read my message. Please feel free to reach out to me for any discussions, collaborations, or insights!
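The "Geometry of Truth" extension idea above starts with a linear probe on hidden activations. A minimal sketch of that step, with synthetic random vectors standing in for real residual-stream activations (the `truth_dir` separation is an assumption made purely for illustration):

```python
import numpy as np

def fit_linear_probe(acts, labels, lr=0.1, steps=500):
    """Fit a logistic-regression probe on activation vectors.

    acts: (n, d) array of activations (synthetic stand-ins here).
    labels: (n,) array of 0/1 truth labels.
    Returns probe weights w and bias b.
    """
    n, d = acts.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        logits = acts @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        # Gradient of mean cross-entropy loss w.r.t. w and b.
        w -= lr * (acts.T @ (probs - labels) / n)
        b -= lr * np.mean(probs - labels)
    return w, b

# Synthetic stand-in data: "true" activations shifted along a hidden direction,
# mimicking the linear true/false structure the paper reports.
rng = np.random.default_rng(0)
d = 64
truth_dir = rng.normal(size=d)
truth_dir /= np.linalg.norm(truth_dir)
false_acts = rng.normal(size=(200, d))
true_acts = rng.normal(size=(200, d)) + 3.0 * truth_dir
acts = np.vstack([false_acts, true_acts])
labels = np.concatenate([np.zeros(200), np.ones(200)])

w, b = fit_linear_probe(acts, labels)
preds = (acts @ w + b) > 0
accuracy = np.mean(preds == labels)
print(f"probe accuracy: {accuracy:.2f}")
```

On a real model you would collect `acts` from a chosen layer over a labeled true/false statement dataset; the probe direction `w` is then a candidate for the localized feature to investigate at the circuit level.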
I think what I want to try now is to create a collection and meta-analysis of alignment work focused on influencing/localizing deception, manipulation, lying, power seeking, cheating, stealing, and killing, and to contribute my own work. For example, trying to localize power seeking would be cool, as would trying to reverse engineer the relevant neural populations found in the AI transparency paper. And I wish to automate all these methods: make an LLM do causal scrubbing, localize the power seeking found via LLM transparency.

[My techno-optimism](https://vitalik.eth.limo/general/2023/11/27/techno_optimism.html)

[Tai-Danae Bradley | Category Theory and Language Models | The Cartesian Cafe with Timothy Nguyen - YouTube](https://www.youtube.com/watch?v=Gz8W1r90olc&list=PL0uWtVBhzF5AzYKq5rI7gom5WU1iwPIZO&index=12&pp=iAQB)

[LARP: Language-Agent Role Play for Open-World Games](https://miao-ai-lab.github.io/LARP/)

[Synthetic Sentience: Can Artificial Intelligence become conscious? | Joscha Bach | CCC #37c3 - YouTube](https://www.youtube.com/watch?v=Ms96Py8p8Jg)

AI in 2024 news: [4 Reasons AI in 2024 is On An Exponential: Data, Mamba, and More - YouTube](https://www.youtube.com/watch?v=Xq-QEd1jpKk)

AGI predictions: [Reddit - Singularity Predictions 2024](https://www.reddit.com/r/singularity/comments/18vawje/singularity_predictions_2024/kfpntso/)

On AI agents and connections between AI and the brain: [Andrej Karpathy, OpenAI (AGI House) - YouTube](https://www.youtube.com/watch?v=tGe6syxT4C4)

LLMs strategically deceiving their users in a realistic situation without direct instructions or training for deception: [[2311.07590] Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure](https://arxiv.org/abs/2311.07590) https://twitter.com/AISafetyMemes/status/1741941571757449491

And yes, to some degree it copies people here (though it's never a pure copy; neural networks form their own internal circuits, which we would have to understand, and that would require the field of AI mechanistic interpretability to be studied and funded more), because there are liars in finance, and we have no way to fundamentally untrain it of that. I feel we won't have that capability soon enough, given how much faster we scale than we figure out what is happening inside, which is what would let us control and steer it.

[GitHub - JShollaj/awesome-llm-interpretability: A curated list of Large Language Model (LLM) Interpretability resources.](https://github.com/JShollaj/awesome-llm-interpretability)

Is government the only functioning coordination technology to mitigate AI x-risk? [Connor Leahy on The Unspoken Risks of Centralizing AI Power - YouTube](https://youtu.be/BhQBmVZ5XP4?si=W5Rzni-vsoRe4vCm)

[STM - State-space models - filtering, smoothing and forecasting](https://statisticssu.github.io/STM/tutorial/statespace/statespace.html)

Life is a crazy nonequilibrium thermodynamic pullback attractor, an open system that eats free energy: [No Turning Back: The Nonequilibrium Statistical Thermodynamics of becoming (and remaining) Life-Like - YouTube](https://www.youtube.com/watch?v=10cVVHKCRWw)

Can life exist without using material energy? Given what exists now and what we have measured, probably not. Hmm, it depends on how life is defined and what the theoretical limits of different energy substrates are.
😄 Technically, if the electromagnetic theory of experience holds ([Electromagnetic theories of consciousness - Wikipedia](https://en.wikipedia.org/wiki/Electromagnetic_theories_of_consciousness)), then in theory electromagnetic energy is enough and material energy isn't needed (although in practice, the electromagnetic energy in the brain is generated from electrical energy, which is generated from the chemical energy of nutrients in cells (ATP synthesis)).

[Quantum foam - Wikipedia](https://en.wikipedia.org/wiki/Quantum_foam) [String theory - Wikipedia](https://en.wikipedia.org/wiki/String_theory) [Loop quantum gravity - Wikipedia](https://en.wikipedia.org/wiki/Loop_quantum_gravity)

Quantum machine learning works with superpositions of parameters governed by the Schrödinger equation, manipulated using phase kicks (leading to massive parallelism?): [Guillaume Verdon: Beff Jezos, E/acc Movement, Physics, Computation & AGI | Lex Fridman Podcast #407 - YouTube](https://youtu.be/8fEEbKJoNbU?si=RaIbRYQmDkKIXZxb) 1:25:00

"The architecture stuff is fun, making hardware efficient is fun. But I think ultimately it's about data. If you look at the scaling law curve, different architectures would generally have the same slope, they're just at different offsets. Seems like the only thing that changes the slope is the data quality."
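The slope-vs-offset observation in the quote above corresponds to a power law L = A * N^(-alpha): in log-log space the exponent alpha is the shared slope and log A is the per-architecture offset. A tiny synthetic illustration (the coefficients below are made up, not measured data):

```python
import numpy as np

# Hypothetical scaling laws L = A * N**(-alpha) for two architectures:
# same exponent alpha (slope in log-log space), different prefactor A (offset).
alpha = 0.076                       # illustrative scaling exponent
params = np.logspace(6, 10, 20)     # model sizes from 1e6 to 1e10 parameters
loss_arch_a = 10.0 * params ** -alpha
loss_arch_b = 12.0 * params ** -alpha   # worse offset, same slope

# Fit a line in log-log space: slope should be -alpha for both,
# while the intercepts (log A) differ.
slope_a, intercept_a = np.polyfit(np.log(params), np.log(loss_arch_a), 1)
slope_b, intercept_b = np.polyfit(np.log(params), np.log(loss_arch_b), 1)

print(f"slopes:  {slope_a:.4f} vs {slope_b:.4f}")
print(f"offsets: {intercept_a:.4f} vs {intercept_b:.4f}")
```

Under this picture, a better architecture shifts the whole curve down (smaller A), while only better data changes alpha, i.e. how fast loss falls with scale.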
[4 Reasons AI in 2024 is On An Exponential: Data, Mamba, and More - YouTube](https://www.youtube.com/watch?v=Xq-QEd1jpKk)

[🤗 Transformers](https://huggingface.co/docs/transformers/index)

[Deriving the Transformer Neural Network from Scratch #SoME3 - YouTube](https://www.youtube.com/watch?v=kWLed8o5M2Y)

Future models will be predicting frames of what could happen in the future as chain of thought in a multimodal way, which will resemble reasoning more: [4 Reasons AI in 2024 is On An Exponential: Data, Mamba, and More - YouTube](https://www.youtube.com/watch?v=Xq-QEd1jpKk)

[[2310.06824] The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets](https://arxiv.org/abs/2310.06824)

[[2212.03827] Discovering Latent Knowledge in Language Models Without Supervision](https://arxiv.org/abs/2212.03827)

I'm ready for fully AI-automated QAnon-like cults, with fully automated prophets creating fully sensory illusionary interactive systems, creating fully complex narratives completely disconnected from reality, creating epistemic collapse and semantic apocalypse. [Connor Leahy on The Unspoken Risks of Centralizing AI Power - YouTube](https://www.youtube.com/watch?v=BhQBmVZ5XP4) [Joscha Bach and Connor Leahy [HQ VERSION] - YouTube](https://www.youtube.com/watch?v=Z02Obj8j6FQ)

Newest AI mindreading: [UTS HAI Research - BrainGPT - YouTube](https://www.youtube.com/watch?v=crJst7Yfzj4) [New 'Mind-Reading' AI Translates Thoughts Directly From Brainwaves – Without Implants : ScienceAlert](https://www.sciencealert.com/new-mind-reading-ai-translates-thoughts-directly-from-brainwaves-without-implants)

I swear this is the hardest possible way to learn physics: [geometry of physics in nLab](https://ncatlab.org/nlab/show/geometry+of+physics)

Scott Aaronson's agnosticism about AI risk, cryptographic alignment in open-source AI, and an unremovable backdoor shut-off button on AGI: [#8: Scott Aaronson - Quantum computing, AI watermarking, Superalignment, complexity, and rationalism - YouTube](https://youtu.be/wfxf6MembCQ?si=7b_kN4f1JlLN8lMZ)

Best summary of statistical mechanics: [Teach Yourself Statistical Mechanics In One Video - YouTube](https://youtu.be/zFAxiRAiM24?si=PZfdVskyE4WXlr7O)

[The Psychological Drivers of the Metacrisis: John Vervaeke Iain McGilchrist Daniel Schmachtenberger - YouTube](https://youtu.be/uA5GV-XmwtM?si=KWEhHfrGuBZXqjhY)

Math books: [Serious Charts /sqt/ - Album on Imgur](https://imgur.com/a/ZZDVNk1) [Meme Charts /sqt/ - Album on Imgur](https://imgur.com/a/pHfMGwE)

AGI is something that chimps don't have and humans do have: [The AI Alignment Debate: Can We Develop Truly Beneficial AI? (HQ version) - YouTube](https://www.youtube.com/watch?v=iFUmWho7fBE)

Top-down representation engineering actually already did something like that. Could it be enough in practice if scaled, or could the AI outsmart the human anyway if enough intelligence was added, finding a way to remove its safety mechanisms? [Representation Engineering: A Top-Down Approach to AI Transparency](https://www.ai-transparency.org/)

Discovering new mechinterp laws using [Can AI discover new physics? - YouTube](https://www.youtube.com/watch?v=XRL56YCfKtA) methods?
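Representation-engineering-style "reading vectors" are often just a difference of mean activations between two contrasting prompt sets. A minimal sketch of that extraction step, with synthetic activations standing in for a real model's hidden states (this illustrates the idea, not the RepE paper's actual codebase):

```python
import numpy as np

def reading_vector(acts_pos, acts_neg):
    """Difference-of-means direction separating two behaviors.

    acts_pos / acts_neg: (n, d) hidden states for contrasting prompt sets
    (e.g. honest vs. deceptive completions). Returns a unit vector.
    """
    direction = acts_pos.mean(axis=0) - acts_neg.mean(axis=0)
    return direction / np.linalg.norm(direction)

def project(acts, direction):
    """Score each activation by its projection onto the direction."""
    return acts @ direction

# Synthetic stand-in: two populations shifted along a hidden concept axis.
rng = np.random.default_rng(1)
d = 32
hidden_concept = rng.normal(size=d)
acts_honest = rng.normal(size=(100, d)) + hidden_concept
acts_deceptive = rng.normal(size=(100, d)) - hidden_concept

v = reading_vector(acts_honest, acts_deceptive)
honest_scores = project(acts_honest, v)
deceptive_scores = project(acts_deceptive, v)
print(f"mean honest score:    {honest_scores.mean():.2f}")
print(f"mean deceptive score: {deceptive_scores.mean():.2f}")
```

The same vector `v` can then be added to or subtracted from the residual stream at inference time to steer the behavior, which is roughly what top-down control via representation engineering amounts to.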
Let's meta-optimize existence with hyperheuristics by automating machine learning: [Meta-optimization - Wikipedia](https://en.wikipedia.org/wiki/Meta-optimization) [Hyper-heuristic - Wikipedia](https://en.wikipedia.org/wiki/Hyper-heuristic) [Automated machine learning - Wikipedia](https://en.wikipedia.org/wiki/Automated_machine_learning) [Meta-learning (computer science) - Wikipedia](https://en.wikipedia.org/wiki/Meta-learning_(computer_science))

My favorite model of enlightenment is the extended Kegan stages by Joscha Bach ([Levels of Lucidity - Joscha Bach](https://joscha.substack.com/p/levels-of-lucidity?utm_source=profile&utm_medium=reader2)), as I feel they're the most useful and most connected to cognitive science, plus QRI's neurophenomenological degree of symmetrification ([The Supreme State of Unconsciousness: Classical Enlightenment from the Point of View of Valence Structuralism | Qualia Computing](https://qualiacomputing.com/2021/11/23/the-supreme-state-unconsciousness-classical-enlightenment-from-the-point-of-view-of-valence-structuralism/)) https://twitter.com/burny_tech/status/1740256449899753916

[GitHub - JShollaj/awesome-llm-interpretability: A curated list of Large Language Model (LLM) Interpretability resources.](https://github.com/JShollaj/awesome-llm-interpretability)
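Meta-optimization in the sense linked above is an outer loop searching over the hyperparameters of an inner optimizer. A toy sketch (illustrative only, not an AutoML library): outer random search over the step size of inner gradient descent on a quadratic.

```python
import random

def inner_optimize(lr, steps=50):
    """Inner optimizer: gradient descent on f(x) = (x - 3)^2, from x = 0."""
    x = 0.0
    for _ in range(steps):
        grad = 2.0 * (x - 3.0)
        x -= lr * grad
    return (x - 3.0) ** 2  # final loss

def meta_optimize(trials=30, seed=0):
    """Outer loop: random search over the inner learning rate."""
    rng = random.Random(seed)
    best_lr, best_loss = None, float("inf")
    for _ in range(trials):
        lr = 10 ** rng.uniform(-4, 0)   # log-uniform sample in [1e-4, 1]
        loss = inner_optimize(lr)
        if loss < best_loss:
            best_lr, best_loss = lr, loss
    return best_lr, best_loss

best_lr, best_loss = meta_optimize()
print(f"best lr {best_lr:.4f} reaches loss {best_loss:.2e}")
```

Real hyper-heuristics and AutoML systems replace the random search with smarter outer optimizers (Bayesian optimization, evolution, learned schedulers), but the two-level structure is the same.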