Ask HN: How do companies like OpenAI, Perplexity fine tune rich output?

4 points | by agaase19 12 hours ago

2 comments

pizza 12 hours ago

Anything with a linter means, at minimum, free verifiable rewards for RL (though whether something parses versus whether it looks good is another story). On top of that, they have more data than anyone, and it seems reasonable that stronger models could learn 'more' from a given example or set of examples.
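
To make the "verifiable reward" idea concrete, here is a minimal sketch, assuming the reward is nothing more than "does the generated output parse?". The function name `parse_reward` and the two supported kinds are hypothetical, chosen for illustration; this is not a description of how any particular lab does it, and a real pipeline would presumably use format-specific linters and add a separate signal for whether the output looks good.

```python
import ast
import json


def parse_reward(output: str, kind: str) -> float:
    """Hypothetical binary reward: 1.0 if `output` parses as `kind`, else 0.0."""
    if kind not in ("json", "python"):
        raise ValueError(f"unsupported kind: {kind}")
    try:
        if kind == "json":
            json.loads(output)   # raises json.JSONDecodeError on invalid JSON
        else:
            ast.parse(output)    # raises SyntaxError on invalid Python source
    except (json.JSONDecodeError, SyntaxError):
        return 0.0               # the checker rejected it: no reward
    return 1.0                   # it parses; says nothing about quality


# Score a couple of sampled model outputs (made-up strings).
samples = ['{"title": "Report", "rows": [1, 2, 3]}', '{"title": "Report", ']
rewards = [parse_reward(s, "json") for s in samples]
```

The point is that the check is automatic and needs no human labels, so it can be run on every sample during RL. It only verifies that the output is well-formed, which is the "parses versus looks good" gap mentioned above.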

agaase19 10 hours ago

Can you elaborate on "linter means and verifiable rewards for RL"? Is this something others would find extremely difficult to do?