Hi HN, Nikhil here, CTO at Flexprice.
To add some context to our journey: before coming to HN, we launched on Product Hunt and spoke with over 100 companies to understand how they approach billing for AI and API-driven products. We’ve seen two main approaches:
1. *Homegrown billing stacks:* Many teams build their own systems in-house, but these quickly become a headache—especially as more products move to pure usage-based pricing or hybrid models with fixed fees. These systems are complex, require significant engineering resources to maintain, and ensuring accuracy is a constant challenge.
2. *Building on top of Stripe/Chargebee:* While these platforms are evolving to support usage-based billing alongside subscriptions, they still lack robust workflows for credits—which are becoming the new currency in usage-based models.
With Flexprice, our goal is to make it easy for developers to get started, integrate quickly, and address the less obvious pain points of usage metering and credits management. We’re focused on supporting all the pricing combinations we see emerging among AI-first companies, and on building an eventually consistent engine that can handle the realities of real-world usage data.
AI pricing models are evolving fast, and we’re eager to hear feedback from the HN community on where the pain points really are, so we can target them directly.
We've been building a GenAI application that started with fairly simple credit-based workflows. But just a month after launch, the workflows have grown more complex, and we're now rolling out four different pricing plans with volume-based tiers.
I was exploring how best to build a flexible credit billing system when I came across Flexprice — and I have to say, it's a great product.
Good to see you here on HN. Wishing you all the success!
Cheers.
Appreciate that, and glad Flexprice showed up at the right time. We’ve seen the same pattern: credit systems always look simple with one plan, but as soon as you introduce volume tiers, expirations, or promo logic, it turns into fragile code fast.
We’re building Flexprice specifically to avoid that constant rewrite cycle, so it's good to know it resonates. Happy to chat if you run into edge cases while scaling those new plans.
Good luck with your GenAI launch!
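As a concrete example of where that logic starts compounding, here's a minimal sketch of graduated volume tiers (illustrative numbers and tier boundaries, not any real plan's pricing):

```python
# Graduated tiers: each tier prices only the units that fall inside it.
# The tier table below is made up for illustration.
TIERS = [  # (upper bound in units, price per unit)
    (1_000, 0.010),
    (10_000, 0.008),
    (float("inf"), 0.005),
]

def tiered_cost(units: int) -> float:
    cost, lower = 0.0, 0
    for upper, price in TIERS:
        if units <= lower:
            break  # no usage reaches this tier
        in_tier = min(units, upper) - lower
        cost += in_tier * price
        lower = upper
    return round(cost, 4)
```

Even this toy version hides edge cases (tier boundaries, mid-cycle plan changes, credits applied per tier), which is usually where homegrown versions start to fray.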
Curious to hear how others have handled scale challenges in billing infrastructure:
If you're running usage-based billing for AI, infra, or API-heavy platforms: how do you deal with high-throughput event ingestion (say, 10k+ events/sec) without dropping events or messing up customer metering?
We’ve seen setups struggle hard with:
- Event ordering guarantees
- Idempotency at scale
- Handling retries without double-counting
Would love to hear what infra patterns, queues, or storage choices worked (or failed) for you.
Great question!
Our approach focuses on:
- Fire-and-forget ingestion with in-memory queues so events don’t block product requests
- Strict idempotency tokens tied to every event, enforced at the API layer
- Lightweight retry logic that prevents double-counting while still guaranteeing delivery under transient failures
Storage-wise, we’ve leaned on a mix of time-series DBs for raw events and pre-aggregated summaries for billing views.
Would love to swap notes on failure patterns or queue setups if you’ve dealt with similar scale.
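The fire-and-forget plus idempotency-token pattern above can be sketched roughly like this (illustrative Python, not our production pipeline; queue size and field names are assumptions):

```python
import queue
import threading
import uuid

event_queue: "queue.Queue" = queue.Queue(maxsize=10_000)
seen_keys: set = set()          # dedup state; in practice this lives in a store
seen_lock = threading.Lock()
processed: list = []            # stand-in for the metering/storage layer

def ingest(event: dict) -> bool:
    """Fire-and-forget: enqueue and return immediately, never block the request."""
    event.setdefault("idempotency_key", str(uuid.uuid4()))
    try:
        event_queue.put_nowait(event)
        return True
    except queue.Full:
        return False  # caller can retry later; the product path stays fast

def worker() -> None:
    while True:
        event = event_queue.get()
        if event is None:  # sentinel to stop the worker
            break
        key = event["idempotency_key"]
        with seen_lock:
            if key in seen_keys:  # duplicate delivery: drop, don't double-count
                continue
            seen_keys.add(key)
        processed.append(event)

t = threading.Thread(target=worker, daemon=True)
t.start()

# A retried event carries the same key and is counted exactly once.
ingest({"idempotency_key": "evt-1", "tokens": 120})
ingest({"idempotency_key": "evt-1", "tokens": 120})  # client retry
ingest({"idempotency_key": "evt-2", "tokens": 45})
event_queue.put(None)
t.join()
```

The in-memory set is the hand-wavy part: at real scale the dedup state needs a shared, expiring store, but the shape of the pipeline is the same.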
I run a startup, trainmyllm.ai, that fine-tunes LLM models for enterprises and provides inference. Would love to do a POC with you guys; this would solve all my billing-related problems. Cheers
Would love to set that up. We’ve seen LLM infra teams hit billing issues fast: usage spikes, token metering, credit handling. It gets messy to build in-house.
I’ll DM you, we can run a quick POC and see if it fits your setup. Appreciate the interest.
Impressive, Koshima! Billing at scale, especially with high-frequency event ingestion and credits, is notoriously tricky, so it's great to see an open-source approach to solving these pain points. The real-time debugger and usage analytics API sound particularly useful. Curious how you handle deduplication and idempotency under heavy concurrency. Looking forward to trying this out!
Appreciate it. We enforce idempotency at the event level using client-provided deduplication keys, so even with high concurrency or retries, the billing pipeline stays consistent.
For internal retries, we batch in-memory and attach unique IDs before dispatch to avoid double-counting.
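That "attach IDs before dispatch" rule is the crux, and can be sketched like this (hypothetical code; the dispatch and send names are made up for illustration):

```python
import uuid
from typing import Callable

received: dict = {}  # stand-in for the downstream store, keyed by event ID

def dispatch(batch: list) -> None:
    for event in batch:
        received.setdefault(event["id"], event)  # idempotent write

def send_with_retry(events: list, send: Callable, attempts: int = 3) -> None:
    # Attach unique IDs *before* the first attempt, never per retry,
    # so a resent batch carries the same IDs and dedupes downstream.
    batch = [{**e, "id": str(uuid.uuid4())} for e in events]
    for _ in range(attempts):
        try:
            send(batch)
            return
        except ConnectionError:
            continue  # same batch, same IDs: safe to resend

# Simulate a transient failure after a partial delivery.
calls = {"n": 0}
def flaky_send(batch):
    calls["n"] += 1
    dispatch(batch)          # events land before the failure surfaces
    if calls["n"] == 1:
        raise ConnectionError("transient network blip")

send_with_retry([{"tokens": 10}, {"tokens": 20}], flaky_send)
```

The failure case matters: the first attempt delivers and then errors, the retry resends everything, and the store still ends up with each event exactly once.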
Great work! The time-bucketed usage summaries and support for hybrid pricing models are super compelling. How are you handling billing-grade precision for things like token-based metering—especially when upstream services (like OpenAI) may have delays or partial responses? Also curious how you model retroactive adjustments without compromising invoice integrity.
Appreciate it. For token-based metering, we lean on event-level tracking with strict timestamps and unique identifiers to maintain billing precision, even when upstream responses are delayed or partial. If OpenAI or similar services provide incomplete data, we flag those events for retry or exclusion to avoid corrupting aggregates.
Retroactive adjustments are handled by versioning usage records, so instead of overwriting historical events, we apply corrections as delta events. Invoices pull from the latest state, but we preserve full audit logs underneath to avoid integrity gaps.
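The append-only delta approach can be illustrated with a toy ledger (the class and field names here are illustrative, not our actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class UsageLedger:
    events: list = field(default_factory=list)  # immutable audit log

    def record(self, customer: str, units: int, kind: str = "usage") -> None:
        self.events.append({"customer": customer, "units": units, "kind": kind})

    def correct(self, customer: str, delta: int) -> None:
        # A retroactive adjustment is just another event; history is never rewritten.
        self.record(customer, delta, kind="correction")

    def billable_units(self, customer: str) -> int:
        # Invoices read the latest state: the sum of the full event log.
        return sum(e["units"] for e in self.events if e["customer"] == customer)

ledger = UsageLedger()
ledger.record("acme", 1000)
ledger.record("acme", 500)
ledger.correct("acme", -200)  # e.g. upstream reported tokens it never served
```

The invoice sees the corrected total, while the original events and the correction itself both survive in the log for audit.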
Being on the developer side, the realtime dashboard in the admin UI is really useful for end-to-end testing.
So many of the eventually consistent stores reflect writes so late that it's a pain to work with their SDKs.
Out of curiosity, what kind of metrics or observability do we get for events that got dropped due to some issue?
This looks similar to some of the billing tools from YC. Are you guys doing anything extra?
One thing we’re handling differently is entitlements. Most billing tools stop at metering and invoicing, but they don’t track what features or limits a customer can actually access based on their plan, usage, or credits. We’re building that into the system so your app doesn’t need to maintain extra state for feature flags or usage limits.
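In rough terms, an entitlement check collapses plan, usage, and credits into one question the app can ask. A minimal sketch (the plan shapes and field names are assumptions for illustration, not our real schema):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entitlement:
    feature: str
    limit: Optional[int] = None  # None means unlimited

# Toy plan catalog, purely illustrative.
PLANS = {
    "starter": [Entitlement("api_calls", limit=1_000)],
    "pro": [Entitlement("api_calls", limit=100_000),
            Entitlement("fine_tuning")],
}

def can_access(plan: str, feature: str, used: int = 0, credits: int = 0) -> bool:
    """One call replaces app-side feature flags and limit counters."""
    for ent in PLANS.get(plan, []):
        if ent.feature != feature:
            continue
        if ent.limit is None:
            return True  # feature is simply included in the plan
        # Credits can extend a metered limit past the plan allowance.
        return used < ent.limit + credits
    return False  # feature not in the plan at all
```

The point is that the answer depends on live billing state (usage and credit balance), which is exactly why it's awkward to maintain as separate state in the application.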