What strikes me is the finding that controllability decreases with longer reasoning: it suggests CoT monitoring gets more reliable in exactly the complex, multi-step tasks where scheming would be hardest to catch from outputs alone. The open question is whether this holds as models get better at instruction following generally.
The interesting part isn't that they can't control it; it's that the reasoning trace is honest precisely because it isn't controlled. A model that could perfectly curate its chain of thought on demand would be harder to audit, not easier. The "problem" is actually the safety property.