The lack of built-in retries is a huge pain, but the bigger risk is a "successful" retry that just outputs another hallucination.
How are you defining a "success" signal for these tasks? Is it just a 200 OK, or are you planning a fidelity audit for each item in the queue to trigger those retries?
You can handle this in a few ways depending on the task. Even adding "double check your answer before answering" to the prompt will make the agent take another turn to double-check its work. You can also do this with a fresh task/prompt.
Ideally, if you are able to use code to validate (either with a test or eval) that works best.
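As a rough sketch of what that looks like, here's a validate-and-retry loop in Python. Everything here is hypothetical: `ask_agent` stands in for whatever call you make to the agent, and the JSON check is a placeholder for your real test or eval.

```python
import json


def validate(output: str) -> bool:
    """Placeholder code-based check: require valid JSON with a
    non-empty "answer" field. Swap in a real test or eval here."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return bool(data.get("answer"))


def run_with_retries(ask_agent, prompt: str, max_retries: int = 3) -> str:
    """ask_agent is a stand-in for your agent call (hypothetical).
    On a failed check, retry with a fresh prompt that asks the
    agent to double-check its work."""
    for _ in range(max_retries):
        output = ask_agent(prompt)
        if validate(output):
            return output
        prompt = f"{prompt}\n\nDouble-check your answer before answering."
    raise RuntimeError("No validated answer after retries")


# Simulated agent: first reply fails validation, second passes.
replies = iter(["not json", '{"answer": "42"}'])
result = run_with_retries(lambda p: next(replies), "What is 6 * 7?")
```

The key point is that "success" comes from the validator, not from the call returning at all, so a retry that produces another hallucination still counts as a failure.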