Launch HN: Voker (YC S24) – Analytics for AI Agents

(voker.ai)

20 points | by ttpost an hour ago

8 comments

  • Damianf19 28 minutes ago

    What's the data model that lets you compare agents that differ a lot in tools/policies? Curious if you normalize on the "what did the user actually accomplish" layer or on raw token/turn metrics, because the two paint completely different pictures of "is this agent working." We struggle with this on the eval side of our own product (email pipeline outcomes, not agents, but same shape).

    • alrudolph 2 minutes ago

      For "is this agent working," we're focusing on the user outcome. Raw usage, number of turns, and function calls are useful operationally, but we think of those as observability rather than the core evaluation target. We do show some of these stats in our conversation view, but we don't aggregate them to compare agents. Longer term we'll look to add more of these features so we can compare quality vs. cost, for example.
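
      A minimal sketch of what that outcome layer might look like (illustrative field names, not our actual schema) - the point being that you evaluate on the user's goal and its resolution, and keep raw usage alongside as observability fields:

        # Hypothetical outcome-layer record - illustrative only, not Voker's
        # actual schema. Evaluate on the user's goal and its resolution;
        # raw usage stays as observability fields, not the comparison target.
        from dataclasses import dataclass

        @dataclass
        class ConversationOutcome:
            conversation_id: str
            agent_id: str
            intent: str               # classified user goal, e.g. "cancel_order"
            resolved: bool            # did the user accomplish that goal?
            handed_off: bool = False  # escalated to a human?
            # Raw usage: operational context only.
            turns: int = 0
            tool_calls: int = 0
            tokens_in: int = 0
            tokens_out: int = 0

        def resolution_rate(outcomes: list[ConversationOutcome]) -> float:
            """Outcome-layer metric that stays comparable across agents with
            very different tools/policies, unlike raw turn/token counts."""
            if not outcomes:
                return 0.0
            return sum(o.resolved for o in outcomes) / len(outcomes)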

  • akslp2080 an hour ago

    How is it different from Langfuse? Sorry if I'm off track here, but Langfuse also provides detailed tracing of agentic behavior and decisions.

    • ttpost 40 minutes ago

      We get this question a lot! We work hand-in-hand with obs tools like Langfuse. Langfuse is great for debugging technical issues on individual traces, like timing conditions that resulted in failed API calls.

      Voker focuses on product, business, and user outcomes - like which intents users bring to your agent that you might not expect. We're built for the whole product team, whereas Langfuse focuses specifically on engineers.

      One way to think about it: a PM notices in Voker that a new intent category is coming up frequently and the agent isn't handling it well. The PM can dig into the data with visualizations or our conversation reconstructions. Once they confirm it's a real issue worth addressing, they can link their investigation to the AI engineer - who can use Voker AND Langfuse to debug and implement a fix/improvement.
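
      To make that concrete, here's a rough sketch of the kind of rollup behind that view (hypothetical, not our actual implementation) - per-intent volume plus resolution rate, which together surface intents that are frequent but poorly handled:

        # Rough sketch (hypothetical, not Voker's implementation): roll up
        # classified conversations into per-intent volume and resolution rate.
        from collections import Counter

        # (intent, resolved) pairs, e.g. emitted by an intent classifier.
        conversations = [
            ("cancel_order", True), ("cancel_order", False),
            ("change_address", False), ("change_address", False),
            ("track_package", True),
        ]

        volume = Counter(intent for intent, _ in conversations)
        resolved = Counter(intent for intent, ok in conversations if ok)

        for intent, n in volume.most_common():
            rate = resolved[intent] / n
            flag = "  <-- frequent but poorly handled" if rate < 0.5 else ""
            print(f"{intent}: {n} convos, {rate:.0%} resolved{flag}")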

  • Ozzie_osman 42 minutes ago

    If the team is here, would love to understand how it compares to something like Amplitude's agent analytics (https://amplitude.com/ai-agents).

    • ttpost 36 minutes ago

      Yeah, the wording here is confusing. TL;DR: Amplitude is analytics for your web/product data; Voker is analytics for your agent data.

      We call Amplitude's feature an "AI Analyst". Essentially, Amplitude is layering an LLM copilot on top of their own product - so you don't have to click the buttons or write reports to get insights.

      We're an analytics platform built for tracking your agents. Different products solving different problems.

      Not sure if this helps, but essentially Amplitude could use Voker to track how well their AI Analyst agent product is actually working!

  • ggamecrazy 27 minutes ago

    > High interaction volume (1k+ chat sessions per month)

    I don't mean to be that typical HN commenter, but you did lose me a bit there.

    I know a lot of people are just getting started with agents, but even for a lot of scrappy startups, usage is a lot higher than that!

    May I suggest focusing on explaining everything from how you can add value when usage is super low to how you control costs when usage gets super high?

    I can validate that it's a real problem - one that large companies have solved but that startups have to hand-roll themselves (via Airflow, queues, etc.). Unfortunately, it's also one where I'm not sure many stakeholders understand the benefits (yet!). I think the value has to be shown a bit more clearly here, sadly.

    Congrats on the launch!

    • ttpost 12 minutes ago

      Thanks for the honest insight! Very helpful to hear that you feel the problem is real - that's what's most important to us. We'll keep working on getting the solution and messaging right!

      - We say 1K+ for two reasons: 1. Below that, it's still feasible (although tedious) to put the full burden of analyzing agent performance and usage on your engineers equipped with obs tools and logs - not ideal, but it's what most AI teams we see do. 2. You're spot on - it actually surprised us too how few companies (even the really big public ones that promised to build 200+ agents back in 2024) still have barely one or two agents in prod, with only hundreds of convos. 1K convos was our best first guess at the cutoff where the manual work of digging through traces stops scaling and the insights start to justify a dedicated tool. We're definitely planning to tune that number as time goes on!

      - We definitely don't have pricing figured out yet; we plan to keep iterating on event volumes etc. to make sure our product gives clear positive ROI for the teams using it at every level. In general, we look at other analytics products as our early barometer. Yes, the higher your event volume, the higher the cost - but in my experience (back in ecomm using Heap analytics), it was so incredibly worth it to pay more as our business and data volumes scaled (we ran on data). I think this is the challenge with all analytics products: they're not useful if you only have 5 site visitors. We see the progression as obs/logs -> evals -> analytics as your usage scales.

      Curious to hear: what's the session volume of the agent products you run? Every datapoint helps us tune our tiers, pricing, and most importantly our product!