2 comments

  • pizza 12 hours ago

    Anything with a linter gives you, at minimum, free verifiable rewards for RL (though whether something parses versus whether it looks good is another story). On top of that, they have more data than anyone, and it seems reasonable that stronger models could learn 'more' from a given example or set of examples.
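
    To make the "free verifiable reward" idea concrete, here is a rough sketch, assuming Python's built-in `ast.parse` stands in for a linter. The `parse_reward` name and the 0/1 scoring are made up for illustration, not any lab's actual setup:

```python
import ast


def parse_reward(source: str) -> float:
    """Hypothetical reward: 1.0 if `source` parses as Python, else 0.0."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        # The checker rejected the output, so it earns no reward.
        return 0.0
    return 1.0


# Two model outputs for the same prompt; the second is missing a colon.
candidates = [
    "def add(a, b):\n    return a + b\n",
    "def add(a, b)\n    return a + b\n",
]
rewards = [parse_reward(c) for c in candidates]
```

    The check is automatic and needs no human labels, which is the "free" part. It also shows the caveat: a function that parses but does something useless still scores 1.0.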

    • agaase19 10 hours ago

      Can you elaborate on "a linter means free verifiable rewards for RL"? Is this something others would find extremely difficult to do?