Some of what OP is saying generalizes to the concept of being "too early" - if you are early, your engineering / innovation spend is used to discover that at-the-time reasonable ideas don't work, or don't work with the current appetite, whereas later entrants can skip this exploration and start with a simple copycat.
My (business-school) partner reminds me that first movers are seldom winners.
That perfectly summarizes my experience. I've been "too early" far too frequently.
Before ElevenLabs, I built an AI TTS website that got 6.5 million monthly users at peak [1]. PewDiePie and various musicians were using it. It didn't have zero-shot or fine-tuning, so it got wiped out pretty easily when ElevenLabs arrived.
Before Image-to-Video models got good and popular, I built a ridiculous 3D nonlinear video editor [2] for crazy people that might want to use mocap gear and timelines to control AI animation. You couldn't control the starting frame, which sucked, but you could control the precise animation minus hallucination artifacts. Luma Labs Dream Machine came out just a few weeks after our launch and utterly wiped the floor with our entire approach.
I was late to build an aggregator, but I'm a filmmaker and I'm stubborn and passionate. I'm now trying to undercut the website aggregators with a fair source desktop "bring your own keys" system [3]. Hopefully I'm "just in time" for these systems to become desktop, with spatially controllable blocking, and with "world model" integration (nobody else has that yet). It's also Rust and when I port the UX to Bevy, it's gonna sing.
[1] https://news.ycombinator.com/item?id=29688048
[2] https://vimeo.com/966897398/6dd268409c
[3] https://github.com/storytold/artcraft
Hey, if you are ever looking for a job at Krea, just let me know!
This could be stated much more succinctly using Jobs to be Done (which is referenced in the first few paragraphs):
Your customers don't want to do stuff with AI.
They want to do stuff faster, better, cheaper, and more easily. (JtbD claims you need to be at least 15% better or 15% cheaper than the competition -- so if we're talking "AI", the classical ML or manual human alternative)
If the LLM you're trying to package can't actually solve the problem, obviously no one will buy it because _using AI_ OBVIOUSLY isn't anyone's _job-to-be-done_
No AI models in 2026 even "understand the whole codebase" lol what is the author even talking about
The flip side of this is that if model capabilities are extremely strong such that they are able to saturate the benchmarks, the differentiation and defensibility of a wrapper solution built on top are significantly reduced.
IANAL but e.g. Claude Cowork is already good enough that it's hard to see how the legal tech startups are going to differentiate except around access controls, visual presentation of workflows, etc. And that's in a heavily enterprise/compliance-aware/security-focused context.
Don't get me wrong, that's still a big "except" - big enough for massive companies to be built. Personally the anxiety of being so close to being squashed by the foundation models would make me unhappy as an entrepreneur but looking at the market it seems like many people have a higher risk tolerance.
I keep saying (need to coin a name for this at some point): LLMs, by their general-purpose nature, subsume software products.
Whatever domain-specific capability some software product[0] has, if it's useful to users now, it's more useful if turned into a tool an outside LLM can wield[1]. Users don't care about software products - on the contrary, the product is what stands between the user and what they actually want. If they can afford to delegate using the product to someone else, they do - whether it's to a friend, an external contractor, or an employee hired for that purpose.
This is the value offering LLMs provide to the user: general delegation. If an LLM can operate some software for you, it frees you to focus on problems you need solved. If it can operate multiple software tools, the benefit to you grows superlinearly, as the LLM can use multiple tools to solve problems not addressed individually by any of them. Problems there are no dedicated tools for at all.
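One rough way to picture the "superlinear" part is to count tool combinations. This is only a back-of-the-envelope sketch under a strong simplifying assumption (mine, not a measurement): every subset of two or more tools counts as a potentially useful combination, so the multi-tool number is an upper bound. The function name `tool_combinations` is made up for illustration.

```python
from math import comb

def tool_combinations(n_tools: int) -> tuple[int, int]:
    """Return (single-tool uses, combinations of two or more tools).

    Assumption for illustration: any subset of 2+ tools is a candidate
    workflow, so the second number is an upper bound, not a real count.
    """
    single = n_tools
    multi = sum(comb(n_tools, k) for k in range(2, n_tools + 1))
    return single, multi

for n in (1, 2, 5, 10):
    single, multi = tool_combinations(n)
    print(f"{n} tools: {single} single-tool uses, {multi} multi-tool combinations")
```

Single-tool uses grow linearly with the number of tools, while the candidate combinations grow as 2^n - n - 1. Most of those combinations will be useless in practice, but it only takes a small fraction of them being useful for the total to outgrow the sum of the individual tools.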
This is a big problem for the software industry as it is, because we're relying on the concept of the software product as a monetizable unit - some UI layer that defines what can and cannot be done, that we can charge for, and then double-dip on with upsells and dark patterns, as UIs are the perfect marketing platforms. General-purpose LLMs sitting on the outside break all that by erasing the "product" boundary - and what's worse (for the industry; it's great for me as the user!), as the multi-modal capabilities get better, there's nothing one can do to stop it - even if you purposefully block and obscure any (classically) machine-friendly endpoints, the LLM will just take the hard way and operate the UI the same way a human does.
I see no way this won't upend the entire industry in the next couple of years.
--
[0] - This includes both products you buy and products you rent, a.k.a. SaaS.
[1] - As opposed to "inside LLMs", a.k.a. the AI-in-product integrations everyone's doing these days in a desperate attempt to stay relevant. Outside vs. inside LLM is the difference between your personal assistant and the assistant at some company's reception desk.
> If MMF doesn’t exist today, building a startup around it means betting on model improvements that are on someone else’s roadmap. You don’t control when or whether the capability arrives.
I love this. I think there's a tendency to extrapolate past performance gains into the future, even though the primary driver of those gains (scaling) has proven to be dead. Continued improvements seem to be coming from rapid breakthroughs in RL training methodologies and, to a lesser degree, architectures.
People should see this as a significant shift. With scaling, the path forward was more certain than what we're seeing now. That means you probably shouldn't build in anticipation of future capabilities, because it's uncertain when they will arrive.
This maps to what we've seen building AI at work.
When we started building a voice agent for inbound calls, the models were close but not quite there. We spent months compensating for gaps: latency, barge-in handling, understanding messy phone audio. A lot of that was engineering around model limitations.
Then the models got better. Fast. Latency dropped. Understanding improved. Suddenly the human-in-the-loop wasn't compensating, it was enhancing.
The shift was noticeable. We went from "how do we work around this limitation" to "how do we build the best experience on top of this capability." That's MMF in practice.
The timing question is real though. We started building before MMF fully existed for our use case. Some of that early work was throwaway. Some of it became the foundation. Hard to know in advance which is which.
The danger is that we bridge that gap with backend complexity. I spent weeks over-engineering a chain of evaluators and retries to get reliable outputs from cheaper models, thinking I was optimizing margins.
Then a smarter model dropped that handled the nuance zero-shot. That sophisticated orchestration layer immediately became technical debt: slower and harder to maintain than just swapping the API endpoint.
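For anyone who hasn't built one of these, here is a minimal sketch of the pattern, not the actual code from that project. Everything in it is a hypothetical stand-in: `Model` is just "a function from prompt to text", `make_flaky_model` and `strong_model` fake the API calls, and `with_evaluator_retries` is the evaluate-and-retry layer. The point it tries to show is that if the product only depends on the `Model` signature, the whole layer can be deleted by passing a different function.

```python
from typing import Callable

# Hypothetical signature: any function that takes a prompt and returns text.
Model = Callable[[str], str]

def with_evaluator_retries(
    model: Model,
    is_acceptable: Callable[[str], bool],
    max_attempts: int = 3,
) -> Model:
    """Wrap a model so each output is checked and retried on failure."""
    def wrapped(prompt: str) -> str:
        last = ""
        for _ in range(max_attempts):
            last = model(prompt)
            if is_acceptable(last):
                return last
        return last  # out of attempts: return the final output as-is
    return wrapped

# Stand-ins for real API calls (assumptions for illustration only).
def make_flaky_model() -> Model:
    calls = {"n": 0}
    def model(prompt: str) -> str:
        calls["n"] += 1
        # Fails the check on the first two calls, passes from the third on.
        return "ok: " + prompt if calls["n"] >= 3 else "garbled"
    return model

def strong_model(prompt: str) -> str:
    return "ok: " + prompt

def answer(model: Model, prompt: str) -> str:
    """Product code depends only on the Model signature."""
    return model(prompt)

cheap = with_evaluator_retries(make_flaky_model(), lambda out: out.startswith("ok"))
print(answer(cheap, "summarize the call"))         # three calls under the hood
print(answer(strong_model, "summarize the call"))  # one call, no wrapper
```

In a real system the wrapper also carries prompts for the evaluators, logging, and cost tracking, which is where the maintenance burden comes from. Keeping it behind the same signature as a plain model call at least makes it cheap to throw away.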
My 5 cents: whatever we do now to "steer" the model to do the job will all get sucked into the model itself. Skate to where the puck is going, as they say, and relentlessly focus on user experience and the overall product. That's how you get something like Granola.
The thing you are referring to as "model" is also called "technology", which has always come in waves over decades and centuries. Each wave opened new markets and new needs. So, in the "team, product, market" concept, the "product" already includes the technology stack. The model is just another piece in the stack.
Product-market fit has a prerequisite that most AI founders ignore. Before the market can pull your product, the model must be capable of doing the job. That's Model-Market Fit. When MMF Unlocks, Markets Explode (legal, coding...)