Why the simplest desktop agent abstraction wins

(bytebot.ai)

1 points | by atupem 15 hours ago ago

2 comments

What are the biggest issues that the agent faces at the moment? I still find these general purpose agents frustrating to use at times because people position it as if it could do anything and then when you give it a reasonably complex task it breaks down.

I guess if someone figured out way to minimize the impact of an error, like a way for it to gracefully handle it without it feeling like too much work, that would fix most of the problems.

[-]

atupem 13 hours ago

Lots of interesting issues:

- The agent has a tool to set it's task to 'completed', 'failed', or 'needs_help', with the last one being a option for human in the loop scenarios. Sometimes the agent gets lazy and says it needs help prematurely.

- Additionally, the agent can create subtasks for itself, either to run immediately, or to schedule in the future. Here it again can call that tool a bit too eagerly, filling duplicate subtasks for a task that involves repetitive work.

- Properly handling super long running tasks, that run for 1+ hours. The context window eventually hits it's limit (this will be addressed this week)

Aside from those top of mind issues, there's a whole bunch of scaffolding issues - filesystem permissions, prompt injection security, i/o support, token cost - lot's to improve!

We're still super early, but already these agents are showing flashes of brilliance, and we're gaining more and more conviction that this is the right form factor