Best Practices for Building Agentic AI Systems

(userjot.com)

142 points | by vinhnx 10 hours ago

63 comments

  • playcache an hour ago

    Should anyone be interested, I am working on AgentUp:

    https://github.com/RedDotRocket/AgentUp

    I believe it's quite unique as far as agents go. The runtime is config driven, so you get caching, state management, security (OAuth2, JWT), retry handlers, and push notifications / webhooks (for long-running tasks), and tool and MCP capabilities are granted based on scope allocation (file:read, api:write, map:generate, etc.). This means you can get a fully baked agent built in minutes, using just the CLI.

    From there, when you need to customize and write your own business logic, you can code whatever you want, inherit all of AgentUp's middleware, and have it as a plugin to the core. This is pretty neat, as it means your plugins can be pinned as dependencies.

    Plugin Example: https://github.com/RedDotRocket/AgentUp-systools

    You then end up with a portable agent, where anyone can clone the repo, `agentup run`, and, like Docker, it pulls in everything it needs and starts serving.

    It's currently aligned with the A2A specification, so it will talk to agents built with Pydantic, LangChain, and the Google Agent SDK.

    It's early days, but it's getting traction.

    Before this, I created sigstore, and I have been building open source for many years.

    The docs also give a good overview: https://docs.agentup.dev

  • imsh4yy 7 hours ago

    Author of this post here.

    For context, I'm a solo developer building UserJot. I've recently been looking into integrating AI deeper into the product, but I've wanted to go a lot further than just wrapping a single API call and calling it a day.

    So this blog post is mostly my experience trying to reverse engineer other AI agents and experimenting with different approaches for a bit.

    Happy to answer any questions.

    • itsalotoffun 7 hours ago

      When you discuss caching, are you talking about caching the LLM response on your side (what I presume) or actual prompt caching (using the provider cache[0])? Curious why you'd invalidate static content?

      [0]: https://docs.anthropic.com/en/docs/build-with-claude/prompt-...

      • imsh4yy 7 hours ago

        I think I need to make this a bit more clear. I was mostly referring to caching the tools (sub-agents) if they are pure functions. But that may be a bit too specific for the sake of this post.

        i.e. you have a query that reads data that doesn't change often, so you can cache the result.
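
        To make that concrete, here's a rough sketch of the kind of wrapper I mean (the names are made up, and it assumes the tool really is a pure function of a deterministic input):

          // Sketch: memoize a pure-function tool behind a TTL cache.
          type CacheEntry<T> = { value: T; expiresAt: number };

          function cachedTool<I, O>(
            fn: (input: I) => Promise<O>,
            ttlMs: number,
          ): (input: I) => Promise<O> {
            const cache = new Map<string, CacheEntry<O>>();
            return async (input: I) => {
              // Deterministic inputs (enums, booleans, numbers) serialize
              // to stable cache keys; free-text inputs mostly won't.
              const key = JSON.stringify(input);
              const hit = cache.get(key);
              if (hit && hit.expiresAt > Date.now()) return hit.value;
              const value = await fn(input);
              cache.set(key, { value, expiresAt: Date.now() + ttlMs });
              return value;
            };
          }

          // e.g. a read-only query over data that rarely changes
          // (queryOpenTickets is a hypothetical tool function)
          declare function queryOpenTickets(input: { days: number }): Promise<string>;
          const cachedOpenTickets = cachedTool(queryOpenTickets, 5 * 60_000);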

        • adastra22 6 hours ago

          It seems very doubtful to me that every query would be literally the same (e.g. the same hash) if these are plain-text descriptions of the subtask.

          • cma an hour ago

            The task can be something like summarize each source file. Many files might not change every time.

          • imsh4yy 6 hours ago

            I mean, that depends on how you define the "input" for the tool. Some inputs can be very deterministic: an enum, a boolean, a number, etc.

    • solidasparagus 7 hours ago

      Nice post! Can you share a bit more about the variety of tasks you've used agents for? "Agents" can mean so many different things depending on who you're talking to. A lot of the examples seem like read-only/analysis tasks. Did you also work on tasks where the agent took actions and changed state? If so, did you find any differences in the patterns that worked for those agents?

      • imsh4yy 7 hours ago

        Sure! So there are both read-only and write-only agents that I'm working on. Basically there's a main agent (main LLM) that is responsible for the overall flow (currently testing GPT-5 Mini for this) and then there are the sub-agents, like I mentioned, that are defined as tools.

        Hopefully this isn't against the terms here, but I posted a screenshot here of how I'm trying to build this into the changelog editor to allow users to basically go:

        https://x.com/ImSh4yy/status/1951012330487079342

        1. What tickets did we recently close?

        2. Nice, write a changelog entry for that.

        3. Add me as author, tags, and title.

        4. Schedule this changelog for Monday morning.

        Of course, this sounds very trivial on the surface, but it starts to get more complex when you think about how to do find and replace in the text, how to fetch tickets and analyze them, how to write the changelog entry, etc.

        Hope this helps.

    • itsalotoffun 7 hours ago

      Also, regarding your agents (primary and sub):

      - Did you build your own or are you farming out to say Opencode?

      - If you built your own, did you roll from scratch or use a framework? Any comments either way on this?

      - How "agentic" (or constrained as the case may be) are your agents in terms of the tools you've provided them?

      • imsh4yy 6 hours ago

        Not sure if I understand the question, but I'll do my best to answer.

        I guess "agent"/"agentic" are too broad as terms. All of this is really an LLM that has a set of tools, which may or may not be other LLMs. You don't really need a framework as long as you can make HTTP calls to OpenRouter or some other provider and handle tool calling.

        I'm using the AI SDK as it plays very nicely with TypeScript and gives you a lot of interesting features, like handling server-side/client-side tool calling and synchronization.

        My current setup has a mix of tools, some of which are pure functions (i.e. database queries), some of which handle server-side mutations (i.e. scheduling a changelog), and some of which are supposed to run locally on the client (i.e. updating TipTap editor).

        Again, hopefully this somewhat answers the question, but happy to provide more details if needed.
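
        To give a feel for that mix, here's a rough sketch using the AI SDK (the tool names, schemas, and the db/scheduler helpers are all made up, and the exact tool API differs a bit between SDK versions):

          import { streamText, tool } from 'ai';
          import { openai } from '@ai-sdk/openai';
          import { z } from 'zod';

          // Stand-ins for application code and conversation state.
          declare const db: { ticketsClosedSince(days: number): Promise<unknown> };
          declare const scheduler: { schedule(id: string, at: string): Promise<unknown> };
          declare const messages: { role: 'user' | 'assistant'; content: string }[];

          const result = streamText({
            model: openai('gpt-4o-mini'), // illustrative model choice
            messages,
            tools: {
              // pure function: a read-only database query
              listClosedTickets: tool({
                description: 'List recently closed tickets',
                parameters: z.object({ days: z.number() }),
                execute: async ({ days }) => db.ticketsClosedSince(days),
              }),
              // server-side mutation
              scheduleChangelog: tool({
                description: 'Schedule a changelog entry for publication',
                parameters: z.object({ id: z.string(), publishAt: z.string() }),
                execute: async ({ id, publishAt }) => scheduler.schedule(id, publishAt),
              }),
              // client-side tool: no execute() here, so the call is forwarded
              // to the browser, which applies the change to the TipTap editor
              updateEditor: tool({
                description: 'Apply a find-and-replace edit in the editor',
                parameters: z.object({ find: z.string(), replace: z.string() }),
              }),
            },
          });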

    • pamelafox 7 hours ago

      When you describe subagents, are those single-tool agents, or are they multi-tool agents with their own ability to reflect and iterate? (i.e. how many actual LLM calls does a subagent make?)

      • imsh4yy 7 hours ago

        So I have a main agent that is responsible for steering the overall flow, and then there are the sub-agents that, as I mentioned, are stateless functions called by the main agent.

        Now these could be anything really: API calls, pure computation, or even LLM calls.
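
        As a minimal sketch of that "stateless function" shape when the sub-agent happens to be an LLM call (the model and prompt are illustrative):

          import { generateText } from 'ai';
          import { openai } from '@ai-sdk/openai';

          // Stateless sub-agent: task in, result out. No conversation
          // history, no shared memory between calls.
          async function summarizeTickets(tickets: string[]): Promise<string> {
            const { text } = await generateText({
              model: openai('gpt-4o-mini'), // illustrative
              system: 'Summarize these tickets as concise release-note bullets.',
              prompt: tickets.join('\n---\n'),
            });
            return text;
          }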

    • energy123 6 hours ago

      How tightly scaffolded/harnessed/constrained is your primary agent for a given task? Are you telling it what reasoning strategy to use?

    • sgt101 5 hours ago

      Could you produce some evidence that what you are advising is actually useful?

  • JSR_FDED 8 hours ago

    My favorite post in a long time. Super straightforward, confirms my own early experiences but the author has gone further than I have and I can already see how his hard-won insight is going to save me time and money. One change I’m going to make immediately is to use cheaper/faster/simpler models for 3/4 of my tasks. This will also set things up nicely for having some tasks run on local models in the future.

  • mrishabh09 29 minutes ago

    I have been following the growth of agentic AI. While experimenting, I registered a few domains that I am not going to use:

    - pyagents.com

    - agentkafe.com

    I have listed them on Afternic, but open to suggestions/feedback on whether such names are even useful for projects.

  • adastra22 6 hours ago

    I recently posted here about how I'm seeing success with subagent-based autonomous dev (not "vibe coding", as I actually review every line before I commit, but the same general idea). Different application, but I can confirm every one of the best practices described in this article, as I came to the same conclusions myself.

    https://news.ycombinator.com/item?id=44893025

  • bashtoni 6 hours ago

    Am I the only one who cannot stand this terrible AI generated writing style?

    These awful three-sentence abominations:

    "Each subagent runs in complete isolation. The primary agent handles all the orchestration. Simple." "No conversation. No “remember what we talked about.” Just task in, result out." "No ambiguity. No interpretation. Just data."

    AI is good at some things, but copywriting certainly isn't one of them. Whatever the author put into the model to get this output would have been better than what the AI shat out.

    • itsalotoffun 5 hours ago

      I'm genuinely curious: is it a) the writing style you can't stand, or b) the fact that this piece tripped your "this is written by AI" detector, and it's AI-written stuff you can't stand? And what's the % split between the two?

      (I find there's a growing push-back against being fed AI-anything, so when it's suspected, it seems to generate outsized reactions.)

      • JimDabell 3 hours ago

        I’m pro-AI in general, but I hate the AI writing style that has gotten especially bad lately. It’s down to two things, neither of which is anti-AI sentiment.

        Firstly, I find the tone of voice immensely irritating. It sounds like a mixture of LinkedIn broetry, a TEDx talk, and marketing speak. That’s irritating when a human does it, but it’s especially bad when AI applies it in cases where it’s jarringly wrong for the topic at hand.

        I recently saw this example:

        > This isn’t just “nicer syntax” — it’s a fundamental shift in how your software thinks.

        https://news.ycombinator.com/item?id=44873145

        It was talking about datetime representation in software development but it has the tone of voice of somebody earnestly gesticulating on stage while explaining how they are going to solve world hunger. This is like the uncanny valley except instead of it making me uneasy it just pisses me off.

        Secondly, it’s so incredibly overused. You’ll see “it’s not X—it’s Y” three times in three consecutive paragraphs. It’s irritating the first time, so when I see it throughout a whole article, I get an exceptionally low opinion of whoever published it.

        Having said that, this article wasn’t particularly bad.

      • bashtoni 5 hours ago

        The saccharine writing style would be bad in isolation, but bearable. The overexposure to it is what leads me to dislike it so much, I think.

        The fact that it's written by AI does add a layer of frustration, because you know someone wrote something more human and more real, but all you get to see is what the model made of it after digestion.

  • nojs 6 hours ago

    "Subagent orchestration" is also a really quick win in Claude. You can just say "spawn a subagent to do each task in X, give it Y context".

    This lets you a) run things in parallel if you want, but also b) keep the main agent's context clean, and therefore run much larger, longer running tasks without the "race against the context clock" issue.

    • imsh4yy 6 hours ago

      I assume you're talking about Claude Code, right? If so, I very much agree with this. A lot of this was actually inspired by how easy it was to do in Claude Code.

      I first experimented with allowing the main agent to have a "conversation" with sub-agents. For example, I created a database of messages between the main agent and the sub-agents and allowed both to append to it. This kinda worked for a few messages but kept getting stuck on mid-tier models, such as GPT-5 mini.

      But from my understanding, their implementation is also similar to the stateless functions I described (happy to be proven wrong). Sub-agents don't communicate back much aside from the final result, and they don't have a conversation history.

      The live updates you see are mostly the application layer updating the UI, which initially confused me.

      • AndyNemmity 6 hours ago

        Love how you experimented, you are a creative thinker.

        • imsh4yy 6 hours ago

          Haha, thank you! I just like to build stuff.

      • adastra22 6 hours ago

        I am doing similar experimentation with Claude Code. I believe you are correct. The primary agent only sees the generated report, nothing more.

  • kami23 8 hours ago

    These are the same categories of coordination I've been looking at all day, trying to find the sweet spot in how complex the orchestration can be. I tried to add some context after the agents got into a cycle of editing the code that another was testing, and stuff like that.

    I know my next step should be to give the agents a good git pattern, but I'm having so much fun just finding the team organization that delivers better stuff. I have them consult with each other on tech choices, and they have picked what I would have picked.

    The consensus protocol for choices is one I really liked, and that will maybe do more self-correction.

    I've been asking them to illustrate their flow of work and to ask for decisions; I need to go back and see if that's the case. It would probably be made easier if I get my git experiment flow down.

    The future is tooling for these. If we can team them up enough that we get consensus approaching something 'safe', we can give them tools to run in dry/live mode, have a human validate the process for a time, and then, once there's enough feedback, move on to the next thing needing fixing.

    I have a lot of apps with Cobra CLI tooling that resides next to the server code. Being able to pump our docs into an MCP server for tool execution is giving me so many ideas.

  • gregorriegler 5 hours ago

    Love it.

    The recovery part was inspiring.

    I've written a similar thing - a pattern language of sorts - and I call the primary agent the "Orchestrator".

    Actually, I might steal the Consensus and MapReduce patterns.

    https://gregorriegler.com/2025/07/12/augmented-coding-patter...

    • sgt101 5 hours ago

      Like the OP article, no data, no evidence - random ideas presented as advice.

      I mean, could you even do some anec-data-style before-and-after comparisons? Like, do 5 problems with no explicit request for tests and 5 with one, and show that you get 0 out of 5 without the request and 5 out of 5 with it?

      People like me might carp on about how that's not good enough or something, but to be honest I would feel quite uneasy and unjustified complaining unless I had done similar work... Without this sort of attempt to justify your thoughts, though, I feel like a school teacher intervening in a discussion where some kid is advocating the use of heroin to her peers on the basis that Aunt Agnes says it feels good.

  • tempusalaria 3 hours ago

    I understand that calling it ‘agentic’ is nice for marketing, but most of what is described in this blog post is not related to agents. The design patterns you describe are explicitly non-agentic. Many of the use cases described are better handled by a single LLM call rather than an agent.

    Finally, saying that agents can have predictable behavior is wrong (except on simple tasks where you shouldn’t be using an agent anyway). Agents loop and compound their input, making them highly non-deterministic even for the same prompt.

  • dlivingston 6 hours ago

    As someone totally outside of this space, how do I build an agent? Are there any languages or libraries that are sort of the de facto standard?

    • AndyNemmity 6 hours ago

      Trivial with Claude Code: it's an .md file in the agents directory. It has a format; you follow it and have the AI create your first agent, with your ideas on what the agent should do and concern itself with.

    • adastra22 6 hours ago

      You write it in plain English text and put it in .claude/agents. That is all.
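
      For instance, a sketch of what .claude/agents/code-reviewer.md might look like (the frontmatter fields follow the Claude Code subagent format; the prompt itself is made up):

        ---
        name: code-reviewer
        description: Reviews diffs for bugs, risky patterns, and style issues.
        tools: Read, Grep, Glob
        ---

        You are a code reviewer. Given a diff, report concrete bugs,
        risky patterns, and style issues, citing the file and line for
        each finding. Return a short prioritized list, nothing else.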

  • faangguyindia 3 hours ago

    We do lots of fully automated code reviews on the code of 5,000+ companies in Asia.

    Let's say my "code reviewer" is the main agent and we are just looking for stuff. I still use the "code reviewer" to make ripgrep queries across thousands of repositories, so I have "ripgrep" and "git" as simple functions in the main agent.

    I doubt I'll benefit from making a subagent for this.
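
    For what it's worth, a plain ripgrep tool function for a main agent can be as small as this sketch (the flags and helper names are illustrative):

      import { execFile } from 'node:child_process';
      import { promisify } from 'node:util';

      const run = promisify(execFile);

      // Simple tool function: the main agent supplies the pattern and repo path.
      async function ripgrep(pattern: string, repoDir: string): Promise<string> {
        try {
          const { stdout } = await run('rg', ['-n', '--max-count', '50', pattern, repoDir]);
          return stdout;
        } catch (err: any) {
          if (err.code === 1) return ''; // rg exits 1 when there are no matches
          throw err;
        }
      }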

  • AndyNemmity 6 hours ago

    I agreed with much of this, but then I started looking into the enterprise AI systems that large companies are building, and they use agent control via software.

    So I tried it. It's much better.

    Software is just better at handling discrete tasks that you understand, like mapping agent pathing. There's no point giving that to an AI to do.

    The Coordinator ("Main Agent") should just call the software to manage the agents.

    It works really well in comparison.

    You can have the software call Claude Code via the command line, sending the prompt in. You have it create fully detailed logs of what it's doing, what's done, and what it created.

    Maybe I'll change my mind, everything is moving so fast, and we're all in the dark searching around for ideas, but so far it's been working pretty well.

    You do lose the in-the-middle visibility needed to stop it.

    I also have it evaluate the outputs to determine how well the agents followed their instructions. That seems key to understanding whether more context adds value when comparing agents.

    • imsh4yy 6 hours ago

      Any way I can learn more about this?

  • faangguyindia 3 hours ago

    >The Context Explosion: Passing entire conversation history to every agent. Tokens aren’t free.

    I often debated whether to run sub-agents with "little context". Then I realized I can just cache the big prompt that goes with the main agent, so I get no benefit from running sub-agents with reduced context.
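
    For reference, with Anthropic's TypeScript SDK that looks roughly like this sketch (the big prompt and task are placeholders, and the model name is illustrative):

      import Anthropic from '@anthropic-ai/sdk';

      declare const BIG_MAIN_AGENT_PROMPT: string; // the stable system prompt
      declare const task: string;                  // the short, changing part

      const anthropic = new Anthropic();

      // Mark the large, stable prompt for caching; only the short
      // task-specific user message varies between calls.
      const response = await anthropic.messages.create({
        model: 'claude-3-5-sonnet-latest', // illustrative
        max_tokens: 1024,
        system: [
          {
            type: 'text',
            text: BIG_MAIN_AGENT_PROMPT,
            cache_control: { type: 'ephemeral' },
          },
        ],
        messages: [{ role: 'user', content: task }],
      });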

    • t0mas88 3 hours ago

      Context engineering doesn't seem to be a "more is better" exercise, especially not with the more basic models. So it's not just about fewer tokens; the quality of the response also increases if you provide only the relevant context.

  • testycool 4 hours ago

    Wow, this is exactly the type of write-up I needed today.

    I'm working on a project where I've found that a few orchestrated agents may be the way to go, but I've been having some hiccups that I'd been intending to deal with. But this article covers many of my concerns at a high level.

  • dsrtslnd23 6 hours ago

    What do you guys use to actually implement this? I used AWS Lambda functions for what are called 'subagents' here and do the main orchestration via long-running Step Functions. This is more structured, but it also allows me to use the common patterns mentioned in the article (e.g. parallel vs serialized). I noticed, however, that it can get quite complex, and I am debating whether to just implement everything as a single FastAPI app. I do want to keep it scalable, though.

    • t0mas88 3 hours ago

      A single app with Spring AI as the low-level framework, to make things like swapping models easy and to handle tool calls. It's easier to add tests and to debug compared to Lambda and Step Functions.

  • jasonriddle 6 hours ago

    When you say "same output" in

    > Every subagent call should be like calling a pure function. Same input, same output. No shared memory. No conversation history. No state.

    How are you setting temperature, top k, top p, etc?

    • imsh4yy 6 hours ago

      So far I've been hardcoding these into the API calls.

      • jasonriddle 6 hours ago

          Sure, but to clarify: you are probably setting temperature close to 0 in order to get output that's as consistent as possible for a given input? Have you made any changes to top k and/or top p that you have found make agent output more consistent/deterministic?

        • imsh4yy 6 hours ago

          Yes, temp is close to 0 for most models. For top k and top p, I've been using the default values set in OpenRouter.
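
          Concretely, the hardcoding looks something like this sketch (an OpenAI-compatible call to OpenRouter; the model slug and values are illustrative, and top k / top p are left at the defaults simply by not setting them):

            declare const task: string; // placeholder input

            const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
              method: 'POST',
              headers: {
                Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
                'Content-Type': 'application/json',
              },
              body: JSON.stringify({
                model: 'openai/gpt-5-mini', // illustrative slug
                temperature: 0, // near-deterministic output for tool-like calls
                messages: [{ role: 'user', content: task }],
              }),
            });
            const data = await res.json();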

  • canterburry 6 hours ago

    "The “Smart Agent” Trap: I tried making agents that could “figure out” what to do. They couldn’t. Be explicit."

    So what about this solution is actually agentic?

    Overall, it sounds like you sat down and did a proper business process analysis and automated it.

    Your subagents for sure have no autonomy and are just execution steps in a classic workflow, except you happen to be calling an LLM.

    Does the orchestrating agent adapt the process between invocations depending on the data and does it do so in any way more complex than a simple if then branch?

    • jondwillis 6 hours ago

      Provide a tool schema that requires deep analysis to fill out correctly. Citations and scores for everything. Examples of high quality citations. Tools that fail or produce low quality results should return instructions about how to recover or interpret the result.

      Have agents with different research tools try to corroborate and peer review output from competing agents. This is just one of many collaborative or competitive patterns you can model.

      Yeah, it can get quite a bit more dynamic than an if statement if you apply some creativity and clarity and conviction.
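
      As a sketch of that "the schema forces analysis" idea with zod (the field names are made up):

        import { z } from 'zod';

        // A result schema the model must fill out: every claim needs at
        // least one citation with a real quote plus a confidence score,
        // so lazy, unsupported output fails validation.
        const Finding = z.object({
          claim: z.string(),
          citations: z.array(z.object({
            source: z.string().url(),
            quote: z.string().min(20), // must quote actual supporting text
          })).min(1),
          confidence: z.number().min(0).max(1),
        });

        const ResearchResult = z.object({
          findings: z.array(Finding).min(1),
          recoveryNotes: z.array(z.string()), // how to interpret weak results
        });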

    • imsh4yy 6 hours ago

      You're right that this isn't the "autonomous agent" fantasy that keeps getting hyped.

      The agentic part here is more modest but real. The primary agent does make runtime decisions about task decomposition based on the data and calls the subagents (tools) to do the actual work.

      So yeah, it's closer to "intelligent workflow orchestration." That's probably a more honest description.

    • tempusalaria 3 hours ago

      Yes this write-up is not about agents.

      In fact it’s a great illustration of why the hype around agents is misplaced!

  • rkwz 7 hours ago

    I found this post very helpful for getting started with agentic systems. What other posts do others recommend?

  • patrickhogan1 6 hours ago

    Do you believe that creating sub-agents is a violation of the bitter lesson, or is it simply a way to add more context?

    • AndyNemmity 6 hours ago

      Sub agents are about providing less context. Not more.

      Sub agents are about providing targeted, specific information for the agent's task, instead of having context around a billion other irrelevant topics.

      The Database agent does not care at all about the instructions for your Agent Creator Agent. That is called negative context, or poison context, or whatever you want to call it.

      So it's about targeted, specific, narrow instructions for a set of tasks.

    • adastra22 6 hours ago

      What bitter lesson?

  • tlarkworthy 6 hours ago

    These subagents look like tools

    • imsh4yy 6 hours ago

      Yes they are tools.

  • itsalotoffun 8 hours ago

    Super practical, no-bullshit write up clearly coming from the trenches. Worth the read.

    • Ros23 8 hours ago

      "no-bullshit write up" about Agentic AI ... LOL

      • insin 6 hours ago

        I've never seen so many different names at once for "LLM chat completion API call"

  • Der_Einzige 8 hours ago

    Structured generation being the magic that makes agents good is true. The fact that the author puts this front and center implies to me that they actually do build working AI agents.