VibeBench: Measuring 1k Engineers' Opinions of New Models

(vibebench.standardagents.ai)

12 points | by jpschroeder 2 days ago

4 comments

  • mhi3 2 days ago

    "Published benchmarks are gamed, optimized, and overfit, and no longer yield a useful signal."

    Is this true?

    But I love this concept!

    • jpschroeder 2 days ago

      Oh very true. Benchmaxxing itself is basically gaming them.

  • ramon156 a day ago

    Love the idea!

    Page is incredibly slow on mobile, probably the avatars

  • memoryleakgame a day ago

    800 commits in a year...