How do you identify "wrong tool" invocations (how is the "wrong tool" defined)?
Good question. We don’t define “wrong tool” in some universal way, because that really depends on the workflow.
What we do in practice is let the team mark a few tool calls as right or wrong in context, then use that to learn the pattern for that agent. From there, we can flag similar cases automatically by looking at the convo state, the tool chosen, the arguments, and what happened next.
So we’re learning what “correct” looks like for your workflow and then catching repeats of the same kind of mistake.
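To make the idea concrete, here is a rough sketch of flagging by similarity to a handful of labeled calls. It is an illustration only, not a description of the actual product: the `ToolCall` fields, the Jaccard similarity, the nearest-example rule, and the 0.5 threshold are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    state: str      # short summary of the conversation state (hypothetical field)
    tool: str       # tool the agent chose
    args: tuple     # names of the arguments passed to the tool
    outcome: str    # what happened next, e.g. "ok" or "user_repeated_request"


def features(call):
    """Turn a call into a set of string features so calls can be compared."""
    feats = {f"tool={call.tool}", f"outcome={call.outcome}"}
    feats |= {f"state={word}" for word in call.state.lower().split()}
    feats |= {f"arg={name}" for name in call.args}
    return feats


def jaccard(a, b):
    """Overlap between two feature sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def flag(call, labeled, threshold=0.5):
    """Flag a call if its most similar labeled example was marked wrong.

    `labeled` is a list of (ToolCall, is_wrong) pairs supplied by the team.
    """
    best_sim, best_is_wrong = 0.0, None
    for example, is_wrong in labeled:
        sim = jaccard(features(call), features(example))
        if sim > best_sim:
            best_sim, best_is_wrong = sim, is_wrong
    return bool(best_is_wrong) and best_sim >= threshold


# Two calls the team has marked by hand (made-up data).
LABELED = [
    (ToolCall("refund request", "search_docs", ("query",), "user_repeated_request"), True),
    (ToolCall("refund request", "issue_refund", ("order_id",), "ok"), False),
]

new_call = ToolCall("refund request urgent", "search_docs", ("query",), "user_repeated_request")
print(flag(new_call, LABELED))
```

A real system would presumably use richer representations than word overlap, but the shape is the same: a few labels per agent, then automatic matching on state, tool, arguments, and outcome.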
I know your homepage isn't your business, but I bet Claude could fix the janky horizontal overflow on mobile in one prompt. It makes for a very distracting read.
Will fix ASAP.
Agreed - fix it fast. It's hard to take a tool for looking after production seriously when its own site has such a blatant production issue.