## Tags
- Part of: [[Machine learning]] [[Artificial Intelligence]] [[AI safety]] [[Capability]]
- Related:
- Includes:
- Additional:

## Definitions
- Process supervision: train reward models to provide feedback on each individual step in a [[Chain of thought]], rather than only on the final answer.

## Main resources
- [Improving mathematical reasoning with process supervision](https://openai.com/research/improving-mathematical-reasoning-with-process-supervision)
	- <iframe src="https://openai.com/research/improving-mathematical-reasoning-with-process-supervision" allow="fullscreen" allowfullscreen="" style="height:100%;width:100%; aspect-ratio: 16 / 5; "></iframe>
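
## Example
- A minimal sketch of the idea in the definition above, not an implementation from the linked resource. It assumes a step-level scorer that returns a correctness probability for each step. `toy_step_scorer` is a hypothetical stand-in for a trained reward model (it only checks simple `a + b = c` arithmetic), and multiplying the step scores into one solution score is just one possible aggregation choice.

```python
from math import prod
from typing import Callable, List, Optional

# A step scorer maps one reasoning step to a correctness probability in [0, 1].
StepScorer = Callable[[str], float]


def toy_step_scorer(step: str) -> float:
    """Hypothetical stand-in for a trained process reward model.

    Only understands steps of the form "a + b = c" and returns a high
    score when the arithmetic holds, a low score otherwise.
    """
    left, right = step.split("=")
    a, b = left.split("+")
    return 0.9 if int(a) + int(b) == int(right) else 0.1


def step_feedback(steps: List[str], scorer: StepScorer) -> List[float]:
    """Score every step of the chain individually (process supervision)."""
    return [scorer(step) for step in steps]


def solution_score(steps: List[str], scorer: StepScorer) -> float:
    """Aggregate step scores into one number (assumed choice: the product)."""
    return prod(step_feedback(steps, scorer))


def first_suspect_step(
    steps: List[str], scorer: StepScorer, threshold: float = 0.5
) -> Optional[int]:
    """Return the index of the first step scored below the threshold, if any."""
    for index, score in enumerate(step_feedback(steps, scorer)):
        if score < threshold:
            return index
    return None


chain = ["2 + 3 = 5", "5 + 4 = 10", "10 + 1 = 11"]
print(step_feedback(chain, toy_step_scorer))
print(first_suspect_step(chain, toy_step_scorer))
```

- The per-step scores localize the error: the second step is flagged even though the third step is internally consistent. A scorer that only looks at the final answer could not say where the chain went wrong.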