Congrats to the team. 23 people building something used by 25% of Fortune 500 is remarkable execution.
I'm curious about one thing though: the blog post mentions integration at the 'model and inference layer,' not just bolting it onto Frontier as a feature. That's a deeper integration than most acquisitions.
In practice, does that mean the security testing becomes invisible for OpenAI-hosted models? Because if so, that's great for OpenAI customers but creates an interesting gap for teams building on Claude, Gemini, or local models. They lose their independent testing tool.
The other thing I keep thinking about: promptfoo solves pre-deployment testing really well. But once agents are running in production making tool calls, the attack surface changes. A prompt that passes red-teaming can still be exploited through indirect injection via tool outputs.
Has anyone been working on runtime monitoring for agents? Not just testing before you ship, but watching what agents actually do during execution?
The one I'd ask if I were reading this: what happens to Promptfoo open source? We're going to keep maintaining it. The repo will stay public under the same license, we will continue to support multiple providers, and we'll keep reviewing PRs and cutting releases.
We started Promptfoo because there was no good way to test AI systems before shipping them. That turned into evals, then red teaming, then a broader security platform. We're joining OpenAI because this work has more impact closer to the model and infrastructure layers.
Thanks for your work and I really really hope the open source project stays maintained. I’ve been using it from the very beginning and it’s been great!
Congrats to the team. 23 people building something used by 25% of Fortune 500 is remarkable execution.
I'm curious about one thing though: the blog post mentions integration at the 'model and inference layer,' not just bolting it onto Frontier as a feature. That's a deeper integration than most acquisitions.
In practice, does that mean the security testing becomes invisible for OpenAI-hosted models? Because if so, that's great for OpenAI customers but creates an interesting gap for teams building on Claude, Gemini, or local models. They lose their independent testing tool.
The other thing I keep thinking about: promptfoo solves pre-deployment testing really well. But once agents are running in production making tool calls, the attack surface changes. A prompt that passes red-teaming can still be exploited through indirect injection via tool outputs.
Has anyone been working on runtime monitoring for agents? Not just testing before you ship, but watching what agents actually do during execution?
Hey HN - Michael here, co-founder of Promptfoo.
Happy to answer questions.
The one I'd ask if I were reading this: what happens to Promptfoo open source? We're going to keep maintaining it. The repo will stay public under the same license, we will continue to support multiple providers, and we'll keep reviewing PRs and cutting releases.
We started Promptfoo because there was no good way to test AI systems before shipping them. That turned into evals, then red teaming, then a broader security platform. We're joining OpenAI because this work has more impact closer to the model and infrastructure layers.
Ask me anything.
Congrats!
What convinced you this was the right moment and the right company to join?
You went from founding to acquisition in roughly a year and change.
Did something about the AI security landscape cause this offer to make sense? Or was it the impact you could have inside OpenAI? Or something else?
You guys absolutely rock. You've built an industry standard with Promptfoo. Keeping it open source was the right choice.
Thanks for your work and I really really hope the open source project stays maintained. I’ve been using it from the very beginning and it’s been great!
congrats on the deal.
[dead]