Shard theory

## Tags
- Part of: [[Psychology]] [[Reinforcement learning]] [[AI safety]] [[Mathematical theory of artificial intelligence]]
- Related:
- Includes:
- Additional:

## Definitions
- Research program aimed at explaining the systematic relationships between the reinforcement schedules and learned values of reinforcement-learning agents

## Main resources
- [Shard Theory: An Overview - LessWrong](https://www.lesswrong.com/posts/xqkGmfikqapbJ2YMj/shard-theory-an-overview)
- <iframe src="https://www.lesswrong.com/posts/xqkGmfikqapbJ2YMj/shard-theory-an-overview" allow="fullscreen" allowfullscreen="" style="height:100%;width:100%; aspect-ratio: 16 / 5; "></iframe>

## Landscapes
- [Shard Theory - LessWrong](https://www.lesswrong.com/s/nyEFg3AuJpdAozmoX)

## Contents
- [Understanding and controlling a maze-solving policy network - LessWrong](https://www.lesswrong.com/posts/cAC4AXiNC5ig6jQnc/understanding-and-controlling-a-maze-solving-policy-network)
- [Steering GPT-2-XL by adding an activation vector - LessWrong](https://www.lesswrong.com/posts/5spBue2z2tw4JuDCx/steering-gpt-2-xl-by-adding-an-activation-vector)