13 comments

  • jlukecarlson an hour ago

    I appreciate the details shared in this paper, but it'd be great if they open-sourced their implementation!

  • mlop99 6 hours ago

    Curious whether the behaviour-driven testing could be done by another LLM agent (or a group of agents), with one LLM agent testing another. Could that lead to a self-improving loop?

  • 4 hours ago

    [deleted]

  • 4 hours ago

    [deleted]

  • shailendra145 6 hours ago

    A powerful move beyond benchmarks — this paper redefines LLM evaluation through realistic, behavior-driven testing.

  • papz2k 6 hours ago

    Very interesting work.

  • raj_maddipati 4 hours ago

    Excellent work

  • harshv_03 4 hours ago

    Interesting

  • ankush9812 6 hours ago

    Nice Work

  • ashyash518 7 hours ago

    Nice work

  • saurabh_xen 7 hours ago

    Great work

  • quanta9 7 hours ago

    interesting

  • cs_exps 5 hours ago

    [dead]