How to Make a Good Terminal Bench Task

(twitter.com)

3 points | by neversupervised 7 hours ago

2 comments

  • neversupervised 7 hours ago

    I've been a contributor to and reviewer for Terminal Bench since last August, and this post covers what I've learned designing and reviewing tasks. The guidance applies broadly to anyone building an agentic benchmark. I would love feedback from the HN community.