2 points | by riverdroid 10 hours ago
2 comments
They told it to try _everything_...
I had the same reaction at first, then noticed that they discuss this: they gave that instruction because it is a standard system prompt injected by most coding agent harnesses, such as Cursor, so it seems like a fair test setup.