Different flavors of AI control!
There isn't just AI alignment; there is also:
- AI wokeism - installing political correctness into the AI model
- AI alignment - installing political or ethical values into the AI model
- AI political bias installation - subset of AI alignment; installing concrete political-compass biases into the AI model
- AI ethics - subset of AI alignment; making the model have a nice personality: helpful, friendly, accepting, not insulting, not lying, not deceiving, etc.
- AI notkillhumanism - special case of value installation: not killing humans
- AI notkilleveryoneism - special case of value installation: not killing all humans
- AI notkillsentientsystemism/notkillbeingism - special case of value installation: not killing sentient systems / beings
- AI notkillallsentientsystemism/notkilleverybeingism - special case of value installation: not killing all sentient systems / beings
- AI scientific truth alignment - make the model aligned with scientific empirical truth
- AI control - making sure the system does what the designer intended, whatever that is: politics, ethics, not killing, following instructions, summarizing text, not behaving completely randomly
And all of the above have two variants:
- AI surface control - using weak control methods such as prompt engineering, RLHF, or constitutional AI, which are, for example, easily broken by jailbreaks
- AI fundamental control - using stronger control methods such as hardcoding implicit biases; removing or reinforcing various features and circuits directly instead of indirectly; formal verification; or mathematical proofs (using any kind of mathematics, game theory, physics, neurosymbolics, etc.) that the system will do only x or won't do x
Inner and outer alignment are concepts used to describe different aspects of aligning AI systems with human intentions.
**Outer Alignment**:
- Refers to the challenge of specifying goals or reward functions for an AI system correctly. It involves ensuring that the AI's objectives, as defined by humans, accurately reflect human intentions. Failures in outer alignment occur when the AI optimizes for a proxy goal that does not fully capture the intended objective[1][5].
**Inner Alignment**:
- Concerns the AI's internal optimization processes. It focuses on ensuring that the AI's learned behaviors and goals align with the specified objectives. Inner alignment failures happen when the AI develops unintended goals or behaviors that still perform well according to the reward function but do not align with the true objective[2][5].
Criticisms of this breakdown include the difficulty in classifying some failures strictly as outer or inner misalignment and the potential irrelevance of the reward function to the actual learned policy[1][5].
Citations:
[1] [Categorizing failures as “outer” or “inner” misalignment is often confused — AI Alignment Forum](https://www.alignmentforum.org/posts/JKwrDwsaRiSxTv9ur/categorizing-failures-as-outer-or-inner-misalignment-is)
[2] [What is inner alignment? - by Jan Leike](https://aligned.substack.com/p/inner-alignment)
[3] [On the Confusion between Inner and Outer Misalignment - LessWrong](https://www.lesswrong.com/posts/hueNHXKc4xdn6cfB4/on-the-confusion-between-inner-and-outer-misalignment)
[4] [Inner and Outer Alignment Decompose One Hard Problem Into Two Extremely Hard Problems – Center for Human-Compatible Artificial Intelligence](https://humancompatible.ai/news/2023/01/03/inner-and-outer-alignment-decompose-one-hard-problem-into-two-extremely-hard-problems/)
[5] [What is AI alignment? – BlueDot Impact](https://aisafetyfundamentals.com/blog/what-is-ai-alignment/)
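The outer-misalignment failure mode described above (optimizing a proxy that doesn't capture the intended objective) can be sketched in a few lines. This is a toy illustration with made-up actions and numbers, not any real reward model:

```python
# Toy illustration of outer misalignment: the designer intends "clean the
# room" but specifies the proxy "minimize visible dust"; one action games
# the proxy without serving the true goal. All values are invented.

actions = {
    "vacuum the room":      {"proxy_reward": 8,  "true_value": 9},
    "sweep dust under rug": {"proxy_reward": 10, "true_value": 1},
    "do nothing":           {"proxy_reward": 0,  "true_value": 0},
}

# The optimizer sees only the proxy reward...
chosen = max(actions, key=lambda a: actions[a]["proxy_reward"])
# ...so it picks the action that maximizes the metric, not the intent.
print(chosen)                          # the proxy-gaming action wins
print(actions[chosen]["true_value"])   # despite its low true value
```

Inner misalignment would be the analogous failure one level down: the learned policy pursues its own internalized proxy even when the specified reward was correct.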
Computronium optimized for utilitarianum or hedonium
Populate the universe with computronium, physical systems optimized for utilitarianum aka hedonium, intelligencium, predictium, meaningium, interestingium, growthium and stabilizatium
Are you sure your deep intuitions, generated by a combination of nature and nurture and forming the ground of your motivations and worldview, aren't deeply biased, limiting, constraining, etc.?
Ultimate statespace of all possible statespaces
Ultimate space of all possible (formal) systems
Ultimate space of all possible spaces
Ultimate abstraction of all possible abstractions
Ultimate generalization of all possible generalizations
I hope mechanistic interpretability will scale to all black-box AI paradigms, with a solution to the alignment problem being the deactivation of killing-related (etc.) circuits and activation patterns.
Since deep learning models are unreasonably effective black boxes, and mechanistic interpretability can find the features and circuits they learn, maybe we can take these circuits and build a neurosymbolic architecture from them (on top of the features?).
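A minimal NumPy sketch of the kind of intervention this enables: once interpretability work identifies a "harmful" feature as a direction in activation space, that direction can be projected out. The vectors and the "harmful" direction here are random stand-ins, not anything from a real model:

```python
import numpy as np

# Toy sketch: ablate a learned "feature direction" from activations by
# projecting it out. In a real model the direction would come from
# interpretability work (e.g. a probe or SAE feature); here it is random.

def ablate_direction(activations: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component of each activation vector along `direction`."""
    d = direction / np.linalg.norm(direction)
    return activations - np.outer(activations @ d, d)

rng = np.random.default_rng(0)
acts = rng.normal(size=(4, 8))    # 4 token activations, 8 dims (toy)
harmful_dir = rng.normal(size=8)  # hypothetical "harmful" feature direction

clean = ablate_direction(acts, harmful_dir)
# After ablation every activation is orthogonal to the feature direction.
print(np.allclose(clean @ (harmful_dir / np.linalg.norm(harmful_dir)), 0))  # True
```

This is the linear-projection version of "removing a feature"; reinforcing one would add a scaled copy of the direction instead of subtracting it.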
Engineering predictive realism = Model has degree of trueness depending on how predictive it is in its context and domain
Philosophical superpositional realism = All possible philosophical assertions and their opposites are true and false at the same time
Epistemological frameworks are instrumental tool programs for various objective functions
Want engineering? Focus on predictive power
Want philosophical freedom? Focus on dissolving everything
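The "degree of trueness as predictive power" idea above can be made concrete by scoring competing models on what actually happened, e.g. with an average log score. Data and model probabilities here are invented for illustration:

```python
import math

# Toy sketch of engineering predictive realism: rank two "models" of a
# binary process by the average log-likelihood they assign to observed
# outcomes. All numbers are made up.

outcomes = [1, 1, 0, 1, 0, 1, 1, 1]  # observed events (1 = happened)
model_a = 0.8                        # predicts P(event) = 0.8
model_b = 0.5                        # predicts P(event) = 0.5

def avg_log_score(p, data):
    """Mean log-probability the model assigns to what actually happened."""
    return sum(math.log(p if x else 1 - p) for x in data) / len(data)

# The more predictive model earns the higher (less negative) score,
# i.e. a higher "degree of trueness" in this context and domain.
print(avg_log_score(model_a, outcomes) > avg_log_score(model_b, outcomes))  # True
```

The score is domain-relative by construction: swap in data from a different context and the ranking can flip, which is exactly the point.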
[[2406.04313] Improving Alignment and Robustness with Circuit Breakers](https://arxiv.org/abs/2406.04313)
https://x.com/andyzou_jiaming/status/1799232319250743561?t=TZO-tqC3xCEY-uTtGydelQ&s=19
Top-down representation engineering of harmful circuits
Artificial general intelligence (AGI) shouldn't be synonymous with artificial human intelligence. Human intelligence is highly evolutionarily specialized for surviving in our evolutionary environment. We should call it artificial human intelligence instead of artificial general intelligence. There can be much more general systems than humans.
Risk-aware transhumanist techno-optimism
AI controllability is (almost) in everybody's interest. Rogue-AI risk worriers get no killer AIs. Scientists, nonprofits, and industry get reliable models for science, products, etc.
Effective altruists (EAs) aren't luddites. They are in large part tech accelerationists. They mostly just disagree with other tech accelerationists on AI risk.
The word "intelligence" has 1000000 definitions
"So isn’t it strange that diffusion models and the human brain are the two best methods for conjuring up images out of nothing and they both happen to operate on the same fundamental algorithm ie molecular dynamics? How many ways of conjuring up images do we think the universe has? Isn’t it just probably going to turn out to be that they’re doing the same thing at some level?"
https://x.com/TimothyDuignan/status/1799415128544489758?t=3JsUtubAdJJZp0fVjingXA&s=19
How often do you take a path integral over all possible futures in civilizational life?
"
Taking a path integral over all possible futures in civilizational life is a conceptual exercise that draws from the principles of quantum mechanics, specifically the Feynman path integral formulation, and applies them metaphorically to sociopolitical and civilizational development. In this context, the "path integral" would represent the summation of all possible trajectories civilization could take, considering all potential variables and outcomes.
Here's a structured approach to conceptualizing this idea:
### 1. Define the Parameters
First, we need to define the parameters and variables that influence civilizational trajectories. These might include:
- **Technological advancements**: Rate and impact of innovation and technology.
- **Environmental changes**: Climate change, natural resource availability.
- **Political structures**: Governance systems, stability, policy decisions.
- **Social dynamics**: Cultural trends, population growth, social movements.
- **Economic factors**: Global and local economies, trade systems, resource distribution.
- **External factors**: Cosmic events, interactions with extraterrestrial intelligence (if any).
### 2. Constructing the "Paths"
In quantum mechanics, a particle can take an infinite number of paths to go from one point to another. Similarly, civilization can follow countless trajectories depending on the combination and interaction of the above variables.
### 3. Assigning Probabilities
Each potential path or future scenario could be assigned a probability amplitude. This would involve complex modeling, likely requiring advanced computational simulations, historical data analysis, and scenario planning. Probabilities might reflect:
- Likelihood of technological breakthroughs.
- Probability of catastrophic environmental events.
- Stability or instability of political systems.
- Socioeconomic trends and crises.
### 4. Summing the Paths (Path Integral)
The path integral in physics sums over all possible paths to determine the overall amplitude or probability of an event. Translating this to civilizational futures would involve aggregating all potential futures to understand the most probable outcomes or the distribution of possible scenarios.
Mathematically, the path integral formulation could be symbolized as:
\[ \sum_{\text{Paths}} \exp\left(\frac{i}{\hbar} S[\text{Path}]\right) \]
where \( S[\text{Path}] \) is the action associated with a particular path. For civilizations, \( S \) might represent a composite "action" integrating all influencing factors over time.
### 5. Interpretation
The result of this summation would not be a single deterministic future but a probability distribution of possible futures. This distribution could provide insights into:
- The most probable scenarios for civilizational development.
- Potential high-risk futures that might be avoided through policy or innovation.
- Critical factors that significantly alter future trajectories.
### 6. Visualization and Application
Visualization tools could help in representing these complex probability distributions, aiding policymakers, researchers, and society in understanding potential futures. Scenario planning workshops, simulations, and strategic foresight exercises could then apply these insights to guide decision-making and mitigate risks.
### Conclusion
While purely hypothetical and highly abstract, applying the concept of path integrals to civilizational futures encourages a holistic and probabilistic approach to understanding and preparing for the vast array of potential outcomes. It underscores the importance of considering a wide range of variables and the interactions between them in shaping the future of human civilization.
Certainly! Here’s a more detailed conceptual framework for taking a path integral over all possible futures in civilizational life:
### 1. Define the Parameters and Variables
**Technological Advancements:**
- **Innovation Rate:** Speed at which new technologies are developed.
- **Adoption Rate:** Speed at which new technologies are integrated into society.
- **Impact:** Potential positive or negative impacts of technologies on society, environment, and economy.
**Environmental Changes:**
- **Climate Change:** Temperature changes, sea-level rise, frequency of extreme weather events.
- **Resource Availability:** Availability of essential resources like water, minerals, and arable land.
- **Biodiversity:** Changes in species diversity and ecosystem health.
**Political Structures:**
- **Governance Systems:** Types of government (democratic, authoritarian, etc.).
- **Stability:** Political stability and risk of conflict.
- **Policy Decisions:** Impact of national and international policies on development.
**Social Dynamics:**
- **Cultural Trends:** Shifts in societal values, norms, and behaviors.
- **Population Growth:** Demographic changes and population dynamics.
- **Social Movements:** Influence of grassroots movements and public opinion.
**Economic Factors:**
- **Global Economy:** Economic growth rates, trade relations, financial stability.
- **Local Economies:** Economic health at the national and regional levels.
- **Resource Distribution:** Distribution of wealth and access to resources.
**External Factors:**
- **Cosmic Events:** Asteroid impacts, solar flares, and other space-related events.
- **Extraterrestrial Intelligence:** Potential contact and its implications.
### 2. Constructing the "Paths"
In quantum mechanics, a particle explores all possible paths between two points. Analogously, civilization can traverse countless trajectories influenced by the above parameters. Each trajectory represents a unique combination of events and outcomes over time.
### 3. Assigning Probabilities
Each potential path is assigned a probability amplitude based on its likelihood. This involves:
- **Historical Data Analysis:** Using past trends to predict future outcomes.
- **Scenario Modeling:** Creating detailed models of different future scenarios.
- **Expert Input:** Gathering insights from experts in various fields.
- **Computational Simulations:** Running simulations to explore the interactions of variables.
### 4. Summing the Paths (Path Integral)
The path integral involves summing over all possible futures to determine the overall probability distribution of outcomes. In civilizational terms, this could be expressed as:
\[ \sum_{\text{Paths}} \exp\left(\frac{i}{\hbar} S[\text{Path}]\right) \]
Here, \( S[\text{Path}] \) represents the "action" associated with a particular future path, integrating all influencing factors over time. This might include:
- **Technological action:** Effects of technological changes.
- **Environmental action:** Impacts of environmental factors.
- **Political action:** Influence of political decisions.
- **Social action:** Dynamics of societal changes.
- **Economic action:** Economic developments and trends.
### 5. Interpretation
The result of the summation is a probability distribution of possible futures, providing insights such as:
- **Most Probable Scenarios:** Identifying likely future trajectories.
- **High-Risk Futures:** Highlighting potential crises and undesirable outcomes.
- **Critical Factors:** Determining key variables that significantly influence future paths.
### 6. Visualization and Application
**Visualization Tools:**
- **Probability Distributions:** Graphs and charts showing the likelihood of different futures.
- **Scenario Maps:** Visual representations of different future trajectories and their probabilities.
- **Interactive Simulations:** Tools allowing users to explore different scenarios and outcomes.
**Applications:**
- **Policy Making:** Informing policymakers about potential future risks and opportunities.
- **Strategic Planning:** Helping organizations and governments plan for various future scenarios.
- **Public Awareness:** Educating the public about potential futures and the importance of current decisions.
### Conclusion
Applying the concept of path integrals to civilizational futures involves a comprehensive and probabilistic approach to understanding and preparing for potential outcomes. By considering a wide range of variables and their interactions, we can better anticipate and navigate the complexities of future development, making informed decisions to shape a desirable future for humanity.
"
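The quoted framework's steps (define variables, construct paths, weight them, aggregate) amount to Monte Carlo scenario sampling. A toy sketch with invented states and transition probabilities, purely illustrative rather than a forecast:

```python
import random
from collections import Counter

# Toy Monte Carlo sketch of the quoted "path integral over futures":
# sample many civilizational "paths" under invented transition
# probabilities, then aggregate into a distribution over end states.

TRANSITIONS = {  # state -> [(next_state, probability), ...] (all made up)
    "baseline":    [("baseline", 0.6), ("boom", 0.2), ("crisis", 0.2)],
    "boom":        [("boom", 0.5), ("flourishing", 0.3), ("crisis", 0.2)],
    "crisis":      [("baseline", 0.5), ("crisis", 0.3), ("collapse", 0.2)],
    "flourishing": [("flourishing", 1.0)],  # absorbing state
    "collapse":    [("collapse", 1.0)],     # absorbing state
}

def sample_path(steps=10, state="baseline"):
    """Follow one random trajectory for `steps` transitions."""
    for _ in range(steps):
        nxt, weights = zip(*TRANSITIONS[state])
        state = random.choices(nxt, weights=weights)[0]
    return state

random.seed(0)
end_states = Counter(sample_path() for _ in range(10_000))
# The "sum over paths" becomes an empirical distribution over outcomes.
print(sum(end_states.values()))  # 10000
```

Unlike the quantum formula, the weights here are plain probabilities rather than complex amplitudes, so paths add without interference; the metaphor survives, the mathematics simplifies.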
[[2406.04331] PaCE: Parsimonious Concept Engineering for Large Language Models](https://arxiv.org/abs/2406.04331)
https://x.com/peterljq/status/1798943345513001125?t=sdXiuPcPqj8PUH-dUtxXcA&s=19
Top-down steering AI alignment
[[2406.04268] Open-Endedness is Essential for Artificial Superhuman Intelligence](https://arxiv.org/abs/2406.04268)
https://x.com/IntuitMachine/status/1799035483508416832?t=IT74JsRjj3q6akvPGsMdkA&s=19
[Safety Alignment Should Be Made More Than Just a Few Tokens Deep](https://xiangyuqi.com/shallow-vs-deep-alignment.github.io/)
Types of Gods:
Subagent or abstract force or platonic idea
Local or global (self or nonself)
Concrete or abstract
Nature/universe
Ineffability
https://medicalxpress.com/news/2024-06-redefines-antidepressants-aid-major-depressive.html
[Scientists identify new gene that could beat aging | Daily Mail Online](https://www.dailymail.co.uk/sciencetech/article-13498747/Scientists-new-gene-extend-human-lifespan.html)
[Beyond the serotonin deficit hypothesis: communicating a neuroplasticity framework of major depressive disorder | Molecular Psychiatry](https://www.nature.com/articles/s41380-024-02625-2)
[GitHub - azminewasi/Awesome-Graph-Research-ICML2024: All graph/GNN papers accepted at the International Conference on Machine Learning (ICML) 2024.](https://github.com/azminewasi/Awesome-Graph-Research-ICML2024?tab=readme-ov-file#theories)
We need to augment human intelligence to keep up with accelerating machine intelligence
[New AI algorithm boosts COVID-19 mRNA vaccine | EurekAlert!](https://www.eurekalert.org/news-releases/987999)
Links for 2024-06-08
AI:
1. Eric Schmidt poached talent from Apple, SpaceX, and Google to create AI military drones for Ukraine. [Eric Schmidt Hires from Apple, SpaceX and Google For Drone Project](https://www.forbes.com/sites/sarahemerson/2024/06/06/eric-schmidt-is-secretly-testing-ai-military-drones-in-a-wealthy-silicon-valley-suburb/) [No paywall: https://archive.is/lkr04]
2. Scott Aaronson recommends Leopold Aschenbrenner's essay: "With unusual clarity, concreteness, and seriousness...Leopold sets out his vision of how AI is going to transform civilization over the next 5-10 years." [Shtetl-Optimized » Blog Archive » Situational Awareness](https://scottaaronson.blog/?p=8047)
3. Will jailbreaking soon be a solved issue? “We introduce Short Circuiting: the first alignment technique that is adversarially robust. Unlike adversarial training which takes days, short circuits can be inserted in under 20 minutes on a GPU. Unlike input/output filters, short circuited models are deployed as normal models with no additional inference cost.” [[2406.04313] Improving Alignment and Robustness with Circuit Breakers](https://arxiv.org/abs/2406.04313)
4. Will We Run Out of Data? Limits of LLM Scaling Based on Human-Generated Data [Will We Run Out of Data to Train Large Language Models?](https://epochai.org/blog/will-we-run-out-of-data-limits-of-llm-scaling-based-on-human-generated-data)
5. Self-Improving Robust Preference Optimization — “…we derive a practical, but mathematically principled offline algorithm to explicitly teach a model to self-improve and be robust to the choice of the eval task at the same-time!” [[2406.01660] Self-Improving Robust Preference Optimization](https://arxiv.org/abs/2406.01660)
6. MatMul-free LLMs: Proposes an implementation that eliminates matrix multiplication operations from LLMs while maintaining performance at billion-parameter scales. [[2406.02528] Scalable MatMul-free Language Modeling](https://arxiv.org/abs/2406.02528)
7. Grokfast: significantly reduces training iterations, accelerating the grokking process by 50 times in machine learning models. [[2405.20233] Grokfast: Accelerated Grokking by Amplifying Slow Gradients](https://arxiv.org/abs/2405.20233)
8. Buffer of Thoughts: Significant performance improvements over previous SOTA methods: 11% on Game of 24, 20% on Geometric Shapes and 51% on Checkmate-in-One. [GitHub - YangLing0818/buffer-of-thought-llm: Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models](https://github.com/YangLing0818/buffer-of-thought-llm)
9. BitsFusion: Compresses the UNet of Stable Diffusion v1.5 (1.72 GB, FP16) into 1.99 bits (219 MB), achieving a 7.9X compression ratio and even better performance. [BitsFusion](https://snap-research.github.io/BitsFusion/)
10. YOLOv10, a powerful real-time object detection model, reduces latency by 46% and parameter count by 25% compared to its predecessor. [GitHub - THU-MIG/yolov10: YOLOv10: Real-Time End-to-End Object Detection](https://github.com/THU-MIG/yolov10)
11. σ-GPTs: A New Approach to Autoregressive Models [[2404.09562] σ-GPTs: A New Approach to Autoregressive Models](https://arxiv.org/abs/2404.09562)
12. Google releases new tool to automate Python code optimization. [Code Transformation](https://labs.google.com/code/transformer)
Miscellaneous:
1. “Using the strategy game Civilization, this proof-of-concept study explores if strategy video games are indicative of managerial skills and, if so, of what managerial skills…We find that students who had high scores in the game had better skills related to problem-solving and organizing and planning than the students who had low scores.” [Good gamers, good managers? A proof-of-concept study with Sid Meier’s Civilization | Review of Managerial Science](https://link.springer.com/article/10.1007/s11846-020-00378-0)
We should throw more AI at biorisk prevention at all stages
Safe AGI is net positive when it comes to existential risks
[Can You Tell If Someone Is Enlightened? (Depth Of Realization W/ Artem Boytsov x Frank Yang) - YouTube](https://youtu.be/o6sovQO_b6s?si=4OLAP9ETLHPgLwEp)
[My current high-level strategic picture of the world – musings and rough drafts](https://musingsandroughdrafts.com/2021/03/24/my-current-high-level-strategic-picture-of-the-world/)
[David Pearce - AI, AGI and the Problem of Suffering - YouTube](https://youtu.be/kYBWT6Yt8LA?si=Y-IvgoRmQTkcU6mc)
[[2405.21060] Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality](https://arxiv.org/abs/2405.21060)
The Top ML Papers of the Week (June 3 - June 9):
- NLLB
- Mamba-2
- AgentGym
- MatMul-free LLMs
- Buffer of Thoughts
- Extracting Concepts from GPT-4
https://x.com/dair_ai/status/1799792083454267838?t=8zg00fHP0Yhz4OK2A_IccQ&s=19
[Claude’s Character \ Anthropic](https://www.anthropic.com/research/claude-character)