Ask HN: What are some good benchmarks for different agent harnesses?

3 points | by Bnjoroge 11 hours ago

1 comment

  • drewbitt 5 hours ago

    These all track harnesses:

    https://www.vals.ai/benchmarks/vibe-code

    https://www.vals.ai/benchmarks/swebench

    https://www.vals.ai/benchmarks/terminal-bench-2-1 (Vals' customized version of Terminal-Bench 2.0)

    https://artificialanalysis.ai/agents/coding-agents