## Tags
- Part of: [[Machine learning]] [[Artificial Intelligence]] [[AI safety]] [[Capability]]
- Related:
- Includes:
- Additional:
## Definitions
- Process supervision: train a reward model to give feedback on each individual step of a [[Chain of thought]], rather than only on the final answer (outcome supervision).
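
A minimal sketch of what step-level feedback looks like in code, assuming a hypothetical `scorer` callable that stands in for a trained reward model. The names `score_steps`, `solution_score`, `first_flagged_step`, and `toy_scorer` are illustrative and not taken from the linked resource. Taking the product of per-step scores is one possible way to get a single score for a whole solution, not the only one.

```python
from math import prod
from typing import Callable, List, Optional

# Hypothetical interface: (problem, previous steps, current step) -> score in [0, 1].
# In practice this would be a trained reward model; here it is just a callable.
StepScorer = Callable[[str, List[str], str], float]


def score_steps(problem: str, steps: List[str], scorer: StepScorer) -> List[float]:
    """Score every step of a chain of thought, given the steps before it."""
    return [scorer(problem, steps[:i], step) for i, step in enumerate(steps)]


def solution_score(step_scores: List[float]) -> float:
    """Collapse per-step scores into one number (assumption: product)."""
    return prod(step_scores)


def first_flagged_step(step_scores: List[float], threshold: float = 0.5) -> Optional[int]:
    """Return the index of the first step scored below the threshold, if any."""
    for i, score in enumerate(step_scores):
        if score < threshold:
            return i
    return None


def toy_scorer(problem: str, previous_steps: List[str], step: str) -> float:
    """Toy stand-in for a reward model: penalises one known-bad step."""
    return 0.1 if "2 + 2 = 5" in step else 0.9


if __name__ == "__main__":
    steps = ["Let x = 2 + 2.", "2 + 2 = 5, so x = 5.", "Therefore x + 1 = 6."]
    scores = score_steps("Compute (2 + 2) + 1.", steps, toy_scorer)
    print(scores, solution_score(scores), first_flagged_step(scores))
```

Because the feedback is per step, the sketch can point at where the reasoning went wrong (`first_flagged_step`), which a single outcome-level score cannot do.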
## Main resources
- [Improving mathematical reasoning with process supervision](https://openai.com/research/improving-mathematical-reasoning-with-process-supervision)