
ListenHub
Mia: We keep hearing this narrative that AI is getting cheaper at this incredible rate. I’ve seen charts showing LLM costs dropping by a factor of ten every year. You'd think AI companies would be swimming in profits by now.
Mars: Right, that’s the playbook VCs were sold on. Charge a flat $20 a month, break even in year one, and then when the models get 10x cheaper, boom, 90% margins in year two. Yacht shopping in year three.
Mia: But that's not what's happening. We're seeing companies that were pioneers in this space, like Windsurf, getting sold for parts. Even Anthropic had to roll back its unlimited tier for Claude. The margins are getting worse, not better. Something doesn't add up.
Mars: Exactly. The core of the problem is a fundamental misunderstanding of user demand. Yes, GPT-3.5 is ten times cheaper than it used to be. But you know what? It's also about as desirable as a flip phone at an iPhone launch.
Mia: Ha, that's a great way to put it. So, you're saying the cost reduction is real, but it's for a product nobody wants anymore?
Mars: Precisely. The moment a new, state-of-the-art model is released, 99% of the demand immediately shifts to it. When GPT-4 launched, everyone jumped on it, even though the older GPT-3.5 was 26 times cheaper. Same thing happened when Claude 3 Opus came out. Demand isn't for a language model, it's for the *best* language model. And the best one always costs about the same, because that's the bleeding edge of what's possible today.
Mia: So this is what the article calls cognitive greed. When we're using these tools for serious work—coding, writing, thinking—we're not going to settle. We want the best brain we can get our hands on.
Mars: Of course. Nobody opens their AI assistant and thinks, "You know what, let me use the crappy version to save my boss a few cents." Your time is valuable. You're trying to accomplish something. You're always going to max out on quality. It's like being offered a rusty wrench or a brand new one for the same job. The choice is obvious.
Mia: I see. So the dream of profiting from last year's cheaper models is just that: a dream. Companies are stuck on a treadmill, forced to offer the latest, most expensive model just to stay relevant. But that can't be the whole story, because that would just mean they break even forever, right?
Mars: Oh, it gets so much worse. This is where the real economic breakdown happens. Even if the price *per token* for these new frontier models stays relatively stable, something else has gone completely nuclear: the number of tokens we use per task.
Mia: You mean the answers are just getting longer?
Mars: It's more than that. The entire capability has changed. A few years ago, you'd ask ChatGPT a one-sentence question, and it would give you a one-sentence reply. A few thousand tokens, maybe. Now, you can ask it to do deep research, and it will spend three minutes planning, twenty minutes reading sources, and another five minutes writing a report for you. What used to be a 1,000-token task is now a 100,000-token task.
Mia: And that brings us to the monster truck analogy. We've built a more fuel-efficient engine, but we've used it to power a giant truck that guzzles way more gas overall.
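The monster-truck arithmetic can be sketched with a quick calculation. The per-token prices below are illustrative assumptions, not figures from the episode; the point is that even a 2x cheaper model loses badly to a 100x jump in tokens per task:

```python
# Illustrative numbers only: a per-token price cut swamped by token growth.
old_price_per_mtok = 20.0   # $ per million tokens, older model (assumed)
new_price_per_mtok = 10.0   # $ per million tokens, frontier model (assumed)

old_task_tokens = 1_000      # a short Q&A exchange
new_task_tokens = 100_000    # an agentic "deep research" task

old_cost = old_task_tokens / 1e6 * old_price_per_mtok   # $0.02
new_cost = new_task_tokens / 1e6 * new_price_per_mtok   # $1.00

print(f"old task: ${old_cost:.2f}")
print(f"new task: ${new_cost:.2f}")
print(f"cost multiplier: {new_cost / old_cost:.0f}x")
```

Even with the engine twice as efficient, each trip costs fifty times more.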
Mars: That's the perfect summary. And the Claude Code experiment is the ultimate case study. They were incredibly smart about it. They charged $200 a month, ten times the typical price. They built a system to automatically switch to cheaper models for less intensive tasks. They even tried to offload some processing to the user's own computer.
Mia: So they tried every trick in the book to manage costs. And it still didn't work?
Mars: It got obliterated. Token consumption went supernova. One user reportedly burned through ten billion tokens in a month. That's the equivalent of 12,500 copies of War and Peace.
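A back-of-envelope check of that claim, using a common word count for the novel and a rough English tokens-per-word ratio (both assumptions, not sourced from the episode):

```python
# Sanity-check the "12,500 copies of War and Peace" comparison.
tokens_used = 10_000_000_000       # one user's reported monthly consumption
words_in_war_and_peace = 587_000   # commonly cited English word count (assumed)
tokens_per_word = 1.35             # rough rule of thumb for English text (assumed)

tokens_per_copy = words_in_war_and_peace * tokens_per_word
copies = tokens_used / tokens_per_copy
print(f"about {copies:,.0f} copies")
```

The result lands in the same ballpark as the 12,500 figure, so the comparison holds up.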
Mia: Ten billion? How is that even physically possible for one person? You can't type that fast.
Mars: That’s the key insight. The user wasn't just chatting with it. Once the AI is capable of running continuous tasks for 10 or 20 minutes, people discover the for loop. They stop being a user and become an API orchestrator. They set the AI on a task, tell it to check its own work, refactor the code, optimize it, and repeat the cycle... all on Anthropic's dime. The AI becomes an agent working 24/7. Consumption decouples from human time, and the math just breaks.
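The loop Mars describes could look something like this sketch. Both `call_model` and `run_checks` are placeholders standing in for an LLM API and the agent's own test suite; no real SDK is assumed:

```python
# A minimal sketch of the "for loop" pattern: an agent that re-invokes the
# model until its own checks pass, burning tokens on every iteration.

def call_model(prompt: str) -> str:
    """Placeholder for any LLM API call; each call consumes paid tokens."""
    return f"revised code for: {prompt[:40]}"

def run_checks(code: str) -> bool:
    """Placeholder for the tests/linters the agent runs on its own output."""
    return False  # pretend the checks keep failing, so the loop keeps spending

code = call_model("write the initial implementation")
model_calls = 1
MAX_ITERATIONS = 20  # without a cap, this runs 24/7 on a flat-rate plan

for _ in range(MAX_ITERATIONS):
    if run_checks(code):
        break
    code = call_model(f"review, refactor, and optimize:\n{code}")
    model_calls += 1  # in reality each round costs tens of thousands of tokens

print(f"model calls made: {model_calls}")
```

Nothing here requires a human in the loop, which is exactly why consumption decouples from human time.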
Mia: So the very improvement in AI capability is what makes the flat-rate subscription model completely unsustainable. It's a self-destructing business model.
Mars: It is. This is the token short squeeze that is forcing these companies to face reality. There is no flat subscription price that works in this new world.
Mia: Okay, so if every company knows this, why do they keep doing it? Why does everyone offer a $20-a-month unlimited plan when they know it's a ticking time bomb? It feels like a classic prisoner's dilemma.
Mars: It is the textbook definition of it. Everyone knows that usage-based pricing would create a sustainable industry. But if you're the one company that does it, your competitor, funded by a mountain of VC cash, will offer a flat rate and steal all your customers.
Mia: So you have a choice: charge for usage and die alone, or charge a flat rate and win for now... then die later with everyone else.
Mars: And in a land grab for market share, everyone chooses to defect. They all offer the flat rate, subsidize the power users, post those beautiful hockey-stick growth charts for their investors, and push the problem down the road. It's growth today, profits tomorrow, bankruptcy eventually, but that's the next CEO's problem.
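The pricing game the two describe has the textbook dilemma structure. The payoff numbers below are purely illustrative scores, not figures from the episode; they just reproduce why "flat rate" dominates:

```python
# Toy payoff table: "usage" pricing is cooperate, "flat" rate is defect.
# Entries are (my payoff, rival payoff); numbers are illustrative only.
PAYOFFS = {
    ("usage", "usage"): (3, 3),  # sustainable economics for everyone
    ("usage", "flat"):  (0, 5),  # the flat-rate rival steals your customers
    ("flat",  "usage"): (5, 0),  # you steal theirs
    ("flat",  "flat"):  (1, 1),  # everyone subsidizes power users and bleeds
}

def best_response(rival_choice: str) -> str:
    """Pick the pricing model that maximizes our payoff given the rival's choice."""
    return max(("usage", "flat"), key=lambda mine: PAYOFFS[(mine, rival_choice)][0])

# Flat rate is the best response no matter what the rival does, so everyone defects.
print(best_response("usage"), best_response("flat"))
```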
Mia: So they're all just hoping the music doesn't stop, that the VC checks keep coming to paper over the terrible unit economics.
Mars: Pretty much. But the music always stops eventually. Just ask Jasper. So the real question becomes, is there any way to actually escape this death spiral? Or is every AI company just a dead company walking?
Mia: Well, the article does propose three potential pathways out. The first is the most obvious: just do usage-based pricing from day one.
Mars: It sounds great in theory. Honest economics. No subsidies. But show me a single consumer AI company that's exploding with growth using that model. Consumers hate metered billing. They'd rather overpay for a predictable flat rate than get a surprise bill. Every successful consumer subscription—Netflix, Spotify, even ChatGPT's own basic plan—is flat rate. The moment you add a meter, consumer growth just dies.
Mia: Okay, so that path is tough for consumer-facing companies. What's the second option? It sounds like it's all about creating insane switching costs.
Mars: Right. This is the enterprise strategy. Forget the individual consumer; go after the big fish. The article mentions Devin's partnerships with Goldman Sachs and Citi. Getting those contracts is hell. It takes months of sales cycles, compliance reviews, security audits... but once you're in, you are *impossible* to churn.
Mia: Because the bureaucracy of switching to a new vendor is just too painful?
Mars: Exactly. The CFO would rather die than go through another six-month vendor evaluation. The company is so deeply integrated into their workflow that the cost of leaving is astronomical. This is how companies like Salesforce and Oracle make 80-90% margins. When your customers can't easily leave, they're not very sensitive to price.
Mia: That makes sense. It’s a moat built of red tape. And the third strategy? Vertical integration.
Mars: This is Replit's game. It’s a really clever strategy. You basically use the AI as a loss leader. You might lose money on every token the AI code generator uses, but that code has to run somewhere. It needs hosting, a database, deployment monitoring, logging...
Mia: Ah, so it's like the classic "give away the razor to sell the blades" model. You give away the AI for cheap to drive consumption of all your other, profitable infrastructure services.
Mars: You got it. You're not really in the business of selling AI inference. You're selling everything else in the developer stack. Let OpenAI and Anthropic have their race to the bottom on inference costs. You'll own the entire ecosystem around it. It's a genius move because code generation naturally creates demand for hosting.
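The razor-and-blades arithmetic behind this strategy can be sketched in a few lines. All dollar figures are assumptions for illustration, not numbers from the episode or from Replit:

```python
# Illustrative loss-leader economics for the vertical-integration play.
inference_revenue = 20.0   # $/user/month charged for the AI feature (assumed)
inference_cost    = 35.0   # $/user/month in model costs: a loss on its own (assumed)

hosting_revenue   = 50.0   # $/user/month for hosting, DB, monitoring (assumed)
hosting_cost      = 10.0   # infrastructure margins are high (assumed)

inference_profit = inference_revenue - inference_cost   # -$15
blended_profit = inference_profit + (hosting_revenue - hosting_cost)

print(f"inference alone: ${inference_profit:+.0f}/user/month")
print(f"blended:         ${blended_profit:+.0f}/user/month")
```

The AI feature loses money in isolation, but the demand it generates for the rest of the stack more than covers the loss.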
Mia: So we're seeing a few potential futures here. It's not just about one model or one price. It's about fundamentally different business strategies.
Mars: Correct. We've discussed the core problem: the death spiral caused by users demanding the most expensive frontier models, while the capabilities of those models lead to an explosion in token use. This combination makes flat-rate subscriptions fundamentally broken. The prisoner's dilemma then forces everyone into this unsustainable model to grab market share, funded by VCs. The only real ways out seem to be these more sophisticated strategies: brave the consumer backlash with usage-based pricing, lock in enterprise customers with insane switching costs, or vertically integrate and use AI as a loss leader. This is the token short squeeze that's crushing so many of these companies.
Mia: It seems the future of AI's economic viability isn't about the models getting cheaper after all. It's about fundamentally reimagining how value is captured in a world where digital consumption can accelerate so relentlessly. This challenge is forcing innovators to find entirely new economic paradigms that can align AI's incredible, ever-increasing power with something that's actually sustainable. It makes you question the very idea of "unlimited" in a world of finite resources. Are we just witnessing the painful growing pains of a technology whose true cost we're only now beginning to understand? It feels like a necessary, if brutal, shift from naive optimism to strategic pragmatism.