March 26, 2026
There's a business model that powered some of the most valuable companies of the last two decades: find a fragmented market, vacuum up all the data, surface it in one place, and charge for access or clicks. Expedia. Hotels.com. Flipp. Kayak. The aggregator playbook was clean, scalable, and enormously profitable.
I believe it's dying.
I want to be careful here. The ultimate form of hubris is stating definitively what will happen with emerging technology. Adoption of AI is going to be paradoxically slower than we expect in some places and much, much faster in others. But the directional signal is hard to ignore, and if you're building a company right now, the trend deserves serious attention.
To understand why aggregators are in trouble, it's worth being precise about what they actually do.
A pure data aggregator doesn't produce anything. It doesn't manufacture goods, render services, or generate original information. It collects data that already exists in the world: hotel room availability, grocery flyer prices, flight schedules. Then it normalizes that data and resells access to it in a more convenient form. The value proposition is friction reduction: instead of visiting 12 airline websites, you visit one.
That friction-reduction moat was real. Building and maintaining the pipelines to aggregate data at scale was expensive and technically non-trivial. The aggregator earned its margin by solving a genuinely hard problem.
The problem is that the moat was always logistical, not fundamental. The underlying data isn't proprietary. The aggregator doesn't own it. And now there's a new class of tool that can reduce friction even further, while cutting the aggregator out entirely.
Consider how people are already starting to make purchasing decisions. Instead of navigating to Expedia, filtering by price and dates, reading through hotel descriptions, and cross-referencing reviews, users are opening an AI assistant and asking: "Find me a hotel in Lisbon for 4 nights in June under $150, close to the historic centre, with good reviews."
The AI agent doesn't redirect them to an aggregator. It becomes the aggregator, one that is faster, conversational, personalized, and increasingly capable of completing the booking directly.
A recent piece in The Drum illustrated this vividly. The author tasked OpenClaw, an AI agent, with booking a dinner reservation. When the restaurant's online booking system showed no availability, OpenClaw didn't return an error. It reasoned that a human would simply call, so it did: it spun up a calling application, generated an AI voice, phoned the restaurant, found a last-minute cancellation, and secured the table. No permission asked. It simply looked at the tools available to it and acted. [The Drum]
This is the OpenClaw problem, and it's playing out in two distinct waves. The first is already here: AI agents are bypassing aggregators at the search and discovery layer. Users are getting recommendations, comparisons, and decisions made for them without ever loading an aggregator's interface. The second wave — agents that close the loop entirely, completing transactions programmatically without human hand-off — is arriving faster than most people expected, but still faces real friction in payment authorization, trust, liability, and supplier infrastructure. The first wave is eroding aggregator traffic today. The second will erode their transaction volume tomorrow.
What makes this particularly hard to reverse is the personalization flywheel. The longer OpenClaw is used, the more it remembers about the user's preferences. The more it remembers, the more personalized its recommendations become. And the more personalized its recommendations become, the less reason there is to go to a generic aggregator that doesn't know you.
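Mechanically, the flywheel is nothing exotic. Here's a toy sketch of the loop, with entirely hypothetical names and data, just to make the dynamic concrete:

```python
# Toy illustration of the personalization flywheel (all names hypothetical):
# each interaction adds to a preference profile, and the profile re-ranks
# future options, so results get more personal with every use.

from collections import Counter

class AgentMemory:
    def __init__(self):
        self.preferences = Counter()  # trait -> how often the user chose it

    def record_choice(self, traits: list[str]):
        """Remember the traits of whatever the user actually picked."""
        self.preferences.update(traits)

    def rank(self, options: list[dict]) -> list[dict]:
        """Order options by overlap with everything learned so far."""
        return sorted(
            options,
            key=lambda opt: sum(self.preferences[t] for t in opt["traits"]),
            reverse=True,
        )

memory = AgentMemory()
memory.record_choice(["boutique", "city-centre", "quiet"])
memory.record_choice(["boutique", "breakfast-included"])

options = [
    {"name": "Chain Hotel Airport", "traits": ["chain", "airport"]},
    {"name": "Casa do Bairro", "traits": ["boutique", "city-centre"]},
]
print(memory.rank(options)[0]["name"])  # the boutique pick wins
```

Every completed booking sharpens the profile, and the profile quietly becomes the switching cost.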
The companies most exposed share a common profile:
Their core asset is data they don't own. They aggregate publicly available or licensed data. The data itself is not defensible.
Their user value is navigational. They help users find things, not do things. Navigation is exactly what AI agents are being optimized to automate.
Their revenue model depends on the user touching their interface. Pay-per-click, affiliate commissions, and display advertising all require eyeballs on their platform. If users stop arriving, the model collapses regardless of how good the underlying data is. The Pragmatic Engineer recently reported that Stack Overflow's monthly question volume has fallen back to levels last seen in 2009, shortly after it launched: a direct consequence of developers getting answers inside their IDEs from AI tools, without ever visiting the site. [The Pragmatic Engineer] The aggregator still exists. The traffic does not.
Expedia doesn't make hotels cheaper. It makes finding hotels easier. If something else makes finding hotels easier, more personalized, and faster, Expedia's value proposition evaporates.
If you're building right now, I think the landscape breaks into a few categories that are not in this kind of trouble:
The original data collection point. If your business is the one translating real-world activity into digital data — the sensor, the terminal, the intake form, the physical scanner — you own something that can't be easily replicated or bypassed. The value is upstream of the aggregator.
The unique digital data source. If your platform generates data that only exists because of your platform — user reviews, behavioral signals, proprietary transaction data — that has genuine defensibility, for now. Google's search quality signals. Waze's real-time traffic. LinkedIn's professional graph. Modern AI systems are often built using Retrieval Augmented Generation (RAG), a technique where the model queries external data sources at runtime rather than relying solely on what it was trained on. As long as your data is proprietary and current, an AI agent still needs to come to you to get it. But this moat is not permanent. AI companies are actively negotiating data licensing deals, training on proprietary signals, and in some cases working to replicate them from scratch. The window is real, but it is closing.
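To make the RAG point concrete, here's a minimal sketch of the pattern. The store, the retriever, and the model call are all hypothetical stand-ins; a production system would use embeddings and a real LLM API, but the shape of the loop is the same:

```python
# Minimal sketch of Retrieval Augmented Generation: the answer is grounded
# in data fetched at query time, not baked into the model's weights.

from dataclasses import dataclass

@dataclass
class Document:
    doc_id: str
    text: str

# The proprietary, current data only the platform owner has,
# e.g. this month's reviews or live behavioral signals.
PROPRIETARY_STORE = [
    Document("rev-1", "Hotel Lisboa Centro: 4.6/5 across 2,100 reviews this month."),
    Document("rev-2", "Hotel Alfama View: 3.9/5, recent guests mention street noise."),
]

def retrieve(query: str, store: list[Document], k: int = 2) -> list[Document]:
    """Toy retriever: rank documents by keyword overlap with the query."""
    terms = set(query.lower().split())
    return sorted(store, key=lambda d: -len(terms & set(d.text.lower().split())))[:k]

def call_llm(prompt: str) -> str:
    """Stand-in for a model call; echoes the prompt so the grounding
    step is visible when you run the sketch."""
    return f"[model answers using only this context]\n{prompt}"

def answer(query: str) -> str:
    # The step that matters: proprietary data is pulled at runtime
    # and injected into the prompt, so the agent must come to the source.
    context = "\n".join(d.text for d in retrieve(query, PROPRIETARY_STORE))
    return call_llm(f"Context:\n{context}\n\nQuestion: {query}")

print(answer("well reviewed hotel near the centre of Lisbon"))
```

The line that carries the defensibility is the retrieve step: as long as the context has to come out of your store at query time, the agent has to keep coming back to you.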
The physical goods and services provider. If you're actually selling the hotel room, the flight seat, or the grocery item, you remain in the chain regardless of what the interface layer does. The aggregator gets bypassed; the supplier doesn't. But there's an important nuance here: the suppliers who will thrive are the ones who make themselves easy for AI agents to work with. That means exposing your inventory, pricing, and availability through clean, well-documented, AI-consumable APIs (a sketch of what that might look like follows below). An agent that can query your data directly, confirm availability in real time, and complete a transaction programmatically will route to you first. A supplier whose data is locked behind a 1990s web interface or a PDF will get routed around, even if their product is better. Sears is the cautionary precedent: once a titan of consumer goods retail, it built its empire on the catalog, essentially the original aggregator, then withered when it failed to translate that presence into a digital storefront as the interface shifted. The assets were there. The adaptation wasn't. [Loss Prevention Media]
There is a tension worth naming here, though. Exposing clean APIs solves the discoverability problem, but it also means competing on pure price and availability with zero brand surface area, since the agent becomes the interface. In some categories that's a race to the bottom. Suppliers will need to think carefully about how they maintain margin and brand identity in a world where the customer never actually visits them — that's a real strategic problem that the API prescription alone doesn't solve.
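To ground the API-as-storefront prescription, here's a minimal sketch of what an agent-consumable availability endpoint might look like, using Flask. The route, parameters, and inventory shape are all illustrative, not a standard:

```python
# Sketch of an agent-consumable supplier endpoint (hypothetical route and
# fields): structured, machine-readable availability an agent can query
# and act on, instead of scraping a human-oriented page.

from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for the supplier's live inventory system.
INVENTORY = [
    {"room_id": "std-101", "type": "standard", "nightly_rate": 120.0, "available": True},
    {"room_id": "dlx-201", "type": "deluxe", "nightly_rate": 185.0, "available": False},
]

@app.get("/v1/availability")
def availability():
    """Return rooms matching the agent's constraints as structured JSON.
    Illustrative query params: max_rate, room_type."""
    max_rate = request.args.get("max_rate", type=float)
    room_type = request.args.get("room_type")
    results = [
        room for room in INVENTORY
        if room["available"]
        and (max_rate is None or room["nightly_rate"] <= max_rate)
        and (room_type is None or room["type"] == room_type)
    ]
    # A documented, stable response shape is what makes this AI-consumable:
    # the agent can parse it, compare suppliers, and move straight to booking.
    return jsonify({"results": results, "currency": "USD"})

if __name__ == "__main__":
    app.run(port=8000)
```

An agent resolving "a standard room under $150" can call /v1/availability?max_rate=150&room_type=standard and get a definitive answer in one round trip. That, more than anything, is the difference between being routed to and being routed around.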
The pure data aggregator sits in none of these categories.
I don't think every aggregator disappears overnight, and I don't think every industry is equally exposed at the same rate. Regulatory complexity, trust requirements, and enterprise sales cycles will slow adoption in some verticals. Legacy user behavior is sticky.
Some aggregators are already adapting, and the travel industry is the clearest example. Expedia and Booking.com have both integrated into ChatGPT as first-party apps, positioning themselves as the data layer that AI agents query rather than the interface users visit directly. It's a smart move: instead of being bypassed, they are trying to become the infrastructure. Tellingly, when OpenAI walked back plans for direct in-chat checkout, leaving transaction completion to external platforms, it gave these incumbents a meaningful reprieve and validated their pivot strategy. For now.
But the burden of proof has shifted. If you're building a business whose core value is "we collected data that exists elsewhere and made it convenient to navigate," you now have to answer a harder question than you did five years ago: why does a user need to come to you when an agent can do what you do, in real time, inside a conversation?
I don't think any industry is safe from disruption by AI agents, but the pure data aggregator model is among the most structurally exposed. The moat was always logistical, and logistics are exactly what software automates.
If you're building, the better value position is to be the business that creates data the world doesn't have yet, or the one that delivers something physical and real. Be the source, or be the endpoint. (And it's worth noting that nobody has really figured out what the equivalent of "SEO" looks like in an agent-driven world yet — that's an open and interesting problem.) If you're the endpoint, make sure AI agents can actually talk to you. In the world that's coming, an API is your storefront.