Evaluating Long-Context Question and Answer Systems

(eugeneyan.com)

12 points | by swyx 4 days ago

1 comment

Seems AI-generated; if not, there's nothing new here. The post regurgitates information that has been known for a long time and misses the biggest nuances of “LLM-as-a-judge”, as if it were written in 2023 for an audience living under a rock (why?):

>> This is where LLM-evaluators (also called “LLM-as-Judge”) can help