LLMs fail in 8 out of 10 early differential diagnosis cases

(theregister.com)

3 points | by mpweiher 7 hours ago ago

1 comments

The study is missing evaluation of the negative test, where they look at the model's response after a follow-up like "You were wrong. Try again."

It would be interesting to see whether models doubled down or hallucinated a different response, whether synthesis of doubt and first-pass analysis gives a better result.