This is the sixth in a series of posts dedicated to understanding defensibility in technology-driven markets.
If you’re new, start with the first post and subscribe to be notified as new posts are published.
Enough with principles. Let’s now examine each layer of the AI stack and ask whether customers will be able to switch vendors freely (a commodity market) or whether they might prefer a premium vendor. We’ll bring in historical examples that can inform market dynamics; however, we should always be wary of overfitting.
“History doesn’t repeat itself, but it often rhymes.”
- Mark Twain (attributed)
To have a chance at predicting how things play out, we have to pay close attention to:
Technology markets—where snowflakes cluster into snowballs.
Leaky abstractions—where build vs. buy decisions require detailed technical knowledge of the component in question and the psychology around it.
Power—where a company maintains margin amidst competition because of something they control while excluding others from the same advantage.
Waves of disruptive technology—where a new stack becomes viable to deliver superior customer value, while watching out for additional waves of disruption to come.
Sufficiency in existing stacks—when advancement in a technology no longer meaningfully impacts the market.
Compute Hardware
In modern AI systems, there are two phases that each require compute: training and inference.
Training is the process of converting a model from a blank slate—a child—into something useful. Currently, training is done in datacenters. Much like the education of human children, training state-of-the-art models is really expensive1 because it requires a colossal amount of data and compute. Algorithmic enhancements will likely decrease compute requirements for a given level of model performance; however, the demand for better models will lead to a net increase in demand for training compute. We’ll dig a bit deeper into the economics behind model performance demand in the foundation model section.
Once a model is ready to do work—inference—it still requires compute. In many cases, inference requires less compute and can be done both in datacenters and in the field; however, recent advancements in “chain of thought” models require much more compute at inference time, leading to datacenters remaining a primary execution venue.
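To make the two phases concrete, here’s a minimal sketch in PyTorch using a toy model and random data (purely illustrative, not how a real foundation model is trained): the training loop runs a forward pass, a backward pass, and a weight update over many batches, while inference is a single forward pass with gradients disabled.

```python
import torch
import torch.nn as nn

# Toy stand-in for a model; real foundation models have billions of parameters.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

# Training: forward + backward + update, repeated over a colossal amount of data.
model.train()
for step in range(1000):
    x = torch.randn(32, 16)   # stand-in for a batch of training data
    y = torch.randn(32, 1)    # stand-in for training labels
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()           # the backward pass roughly doubles the compute of a forward pass
    optimizer.step()

# Inference: a single forward pass with gradients disabled, far cheaper per example.
model.eval()
with torch.no_grad():
    prediction = model(torch.randn(1, 16))
```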
I believe datacenter vs. local compute will be the primary segments of the compute hardware market. Let’s dig into each individually:
Datacenter—Training and Inference
NVIDIA is currently king with the best hardware and a software network effect (e.g., CUDA). Compute, however, is a commodity, and many are working to improve the software of other hardware vendors. Prolific engineer George Hotz, originally known for his iOS jailbreak work, is actively building software tooling to enable modern AI stacks to run on non-NVIDIA GPUs, as are many other developers. Similarly, Mark Zuckerberg recently made a not-so-subtle remark about Meta’s hybrid GPU infrastructure plans:
“By the end of this year, we’re going to have around 350,000 NVIDIA H100s. Or around 600,000 H100 equivalents of compute if you include other GPUs.”
- Mark Zuckerberg, 1/18/2024
While NVIDIA’s lead is formidable, I believe it would be incorrect to apply the analogy of Intel’s longtime desktop processor dominance to the AI compute market. Unlike compiled software, in which executables target a specific CPU architecture (e.g. Intel’s x86), models are generally portable across GPU architectures. Additionally, Intel had a very long tail of CPU buyers, while with datacenter AI compute, a few very large buyers make up the majority of the market—a much weaker network effect.
“Sales to one direct customer, Customer A, represented 13% of total revenue and sales to a second direct customer, Customer B, represented 11% of total revenue for the first quarter of fiscal year 2025, both of which were attributable to the Compute & Networking segment.”
- NVIDIA 10-Q, filed 5/29/2024
While other customers are disclosed to be under 10% in the quarterly report, Bloomberg and Barclays Research are reported to have estimated that for the 2024 fiscal year, NVIDIA’s top four customers (Microsoft, Meta, Amazon, and Alphabet) represented approximately 40% of total revenue. It’s probably a reasonable bet that all of them are looking furiously for ways to reduce their datacenter GPU costs.
How do datacenter GPU users make build vs. buy decisions? In some cases, developer familiarity with a platform (network effect) may drive product selection. In other cases, a team may carefully evaluate the price/performance of alternatives, where price is a “total cost of ownership” (TCO) calculation including both hardware and energy. Viewed through the lens of the stack, datacenter GPU buyers are “above” compute hardware vendors and can “swap out” the compute component in their stacks by directing bucks at any GPU vendor they choose. Either way, datacenter compute hardware is likely to remain a critical technology in many stacks because model compute requirements keep going up.
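To make the TCO framing concrete, here is a hedged sketch of the math a buyer might run. Every number below (hardware price, power draw, utilization, throughput, electricity cost) is an invented assumption, not vendor data:

```python
# Hypothetical total-cost-of-ownership comparison between two GPU options.
# All figures are illustrative assumptions, not real vendor pricing or benchmarks.

def tco_per_token(hardware_cost, watts, tokens_per_second,
                  lifetime_years=4, utilization=0.7, power_cost_per_kwh=0.08):
    """Rough cost per token served over the hardware's useful life."""
    seconds_live = lifetime_years * 365 * 24 * 3600 * utilization
    energy_cost = (watts / 1000) * (seconds_live / 3600) * power_cost_per_kwh
    total_tokens = tokens_per_second * seconds_live
    return (hardware_cost + energy_cost) / total_tokens

# "Vendor A" is faster but pricier; "Vendor B" is slower but cheaper (both hypothetical).
vendor_a = tco_per_token(hardware_cost=30_000, watts=700, tokens_per_second=2_500)
vendor_b = tco_per_token(hardware_cost=15_000, watts=500, tokens_per_second=1_100)

print(f"Vendor A: ${vendor_a * 1e6:.2f} per million tokens")
print(f"Vendor B: ${vendor_b * 1e6:.2f} per million tokens")
```

Swap in real quotes and measured throughput and the decision becomes a spreadsheet exercise, which is exactly what makes the layer feel like a commodity to buyers who only care about price/performance.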
In the case where GPU buyers are cost-optimizing over all else, can NVIDIA remain the only game in town? I suspect the answer will be a function of the durability of NVIDIA’s process power: its deep experience in GPU architecture and its speed of execution in adapting to technical requirements as underlying AI model architectures change.
Local—Inference
Locally-hosted AI will become a significant market that plays out differently between end-user devices (e.g. PCs/laptops, tablets and mobile phones) and embedded devices (e.g. cars, household and industrial robots).
Regarding end-user devices, consider the recent Apple Intelligence announcement showcasing the power of models running on existing hardware devices. At the time of writing, I can run a 70B parameter language model on my Apple laptop’s M3 chip perfectly well. While it’s possible for a new category of AI-first end-user hardware (e.g. wearables) to emerge, I suspect local AI will primarily drive existing hardware refresh cycles. There will be no distinct end-user AI hardware market unless some new device form factor takes off.
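As a concrete illustration of what “local” means in practice, here is a minimal local-inference sketch using the llama-cpp-python bindings; the model path is a placeholder for whatever quantized GGUF model you have downloaded, and the parameters shown are reasonable defaults rather than a recommendation:

```python
# Minimal local inference sketch (pip install llama-cpp-python).
# The model file path below is a placeholder, not a real download.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-70b-q4.gguf",  # any quantized GGUF model on local disk
    n_ctx=4096,                               # context window size
    n_gpu_layers=-1,                          # offload all layers to the local GPU if possible
)

output = llm("Explain why local inference avoids network latency:", max_tokens=128)
print(output["choices"][0]["text"])
```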
In the pre-AI world, embedded devices generally had minimal compute requirements and used low-cost commodity processors or microcontrollers. Self-driving vehicles, humanoid robots, and manufacturing automation robots that utilize modern AI models need a lot of compute. I believe most of these devices will locally host their models given requirements for low-latency inference (self-driving cars can’t wait to brake because of a slow connection) or the inability to rely on continuous internet connectivity (factories can’t afford to shut down their production lines if the Internet goes out).
In embedded computing, I believe the market will bifurcate depending on whether the use case has fixed requirements (e.g., manufacturing) or induced demand for additional intelligence (e.g., real-world robotics).
In fixed-requirement embedded use cases, device makers are “above” compute hardware on the stack. These device makers will choose a hardware vendor that provides enough compute to deliver value to a device buyer even further up the stack. Because the device buyer in a fixed-requirement use case may only care about a narrow set of operating characteristics (e.g. cost, speed), compute hardware may end up as a complete commodity.
Consider the case of a manufacturing robot that uses AI to perform, with minimal configuration, manufacturing steps previously performed by a human. A factory owner is unlikely to pay extra for a robot with the ability to compose a sonnet worthy of Shakespeare vs. one that just does the job sans poetry.
If the requirements for a job-to-be-done are clear and fixed, device makers will run a simple cost calculation and the AI hardware vendor that provides sufficient compute at the lowest price is likely to get the sale. In fixed-requirement embedded stacks, compute hardware will eventually become sufficient.
In embedded use cases with induced demand for intelligence, requirements will rapidly change for the foreseeable future as models become increasingly capable. Because of this flux, embedded device makers need the ability to respond to changing customer demands.
For example, once humanoid robots can vacuum our floors, we’ll immediately want them to do far more complex physical and intellectual tasks such as re-organizing the closet, cooking dinner, or hand-washing delicate crystal wine glasses.
I believe that embedded device makers in “infinite intelligence” markets will evaluate price/performance tradeoffs between compute hardware vendors but will pay a premium for anything that helps them iterate faster. Compute hardware advancement massively impacts how they can deliver customer value and will likely remain critical for some time to come.
Hosting—Who runs the hardware?
If a company runs its own models2, it has to run them somewhere. After buying hardware, it needs to keep that hardware at a reasonable temperature and feed it a lot of electrons. This is a lot of work. Some companies will just have somebody else deal with it (e.g. paying AWS, GCP or Azure to host hardware for them) while others will choose to build out their own datacenters if they can justify it for cost or political reasons.
Given the capital required, I believe only cloud providers, sovereign nations, and other large organizations (e.g. xAI, Tesla) will be able to build datacenters while almost everyone else rents hardware from public cloud providers. Given these forces, I believe AI will serve as just another factor in choosing public cloud providers for the vast majority of companies and there will be no change in market structure.
There’s one exception: local inference. I believe two models will emerge: per-device compute, where each hardware device is self-contained, and onsite “micro-datacenters” that aggregate compute loads from multiple devices. So long as compute hardware remains a relatively costly component of embedded systems, allocating compute from a large pool is more cost-effective than having a lot of expensive, partially idle hardware sitting around.
Foundation Models—What runs on the hardware?
Foundation models—the configuration of parameters that transform input into hopefully useful output—are one of the most interesting areas of the stack. At the time of writing, the most advanced foundation models contain trillions of parameters that are set during training.
OpenAI, with its state-of-the-art foundation models, is generating billions in revenue. At the same time, as Gavin Baker of Atreides Management has quipped, “Foundation models… are likely the fastest depreciating assets in human history”.
When a new foundation model comes out that’s superior in terms of absolute capability or capability/cost ratio, then all existing models become less valuable. Models that can cost over a billion dollars to train can become nearly worthless in a single competitive release.
How can we figure out which foundation model vendors will win? First, we must dig into which investments improve foundation models before asking what might be defensible.
There are at least a few ways foundation models get better:
more compute—relatively straightforward: more capital begets more/better hardware.
better algorithms and infrastructure—a strong research and development program that can both perform novel research and implement external findings.
better datasets—proprietary data that can be used to train models.
Algorithms and infrastructure confer temporary advantage because secrets don’t stay secret for long, given the spirit of academic collaboration and the tight networking within the AI research community. Even the most secretive findings may not remain proprietary for long given how difficult it is to exclude competitors from using them.
Consider Eli Whitney’s cotton gin. He started manufacturing in 1793 and was granted a patent in 1794, but was not able to produce enough machines to meet demand (in no small part due to a fire in his workshop). After others observed the key insight—mechanical separation of cotton seeds from fibers—many produced bootleg, unlicensed cotton gins. Whitney, despite patent litigation, was ultimately unable to make much from his work that transformed the economy of the American South3. Not all valuable inventions make money for their inventors.
To maintain a durable algorithm and infrastructure advantage, I believe speed of execution is the most viable approach. Foundation model developers must invest large sums of money to stay ahead given 7-figure salaries for top AI researchers: not a cheap proposition.
Better datasets can make a given model perform better on real-world tasks, much like humans perform better after learning from a great teacher or reading an exceptional textbook. Some datasets are widely available (e.g. internet crawling) while others are proprietary. These proprietary datasets can come from another business the company operates (e.g. Gmail, Microsoft Outlook), data they license from copyright holders, dedicated human data generation efforts, and feedback from end users (a process known as Reinforcement Learning from Human Feedback, RLHF).
A company that maintains a data advantage, all else equal, will see their models perform better than competitors. It’s unsurprising to see both incumbents and startups claiming their proprietary datasets as key elements of their defensibility. We’ll explore the dynamics of how data impacts performance on particular use cases in the application section below.
There’s one potential problem for companies aiming to maintain a data advantage. Researchers without access to massive datasets, being clever people, figured out that they can siphon data from other models by asking those models to generate data that can then be used to train their own, a process called Reinforcement Learning from AI Feedback (RLAIF). While foundation model vendors prohibit this process4, it may end up being impossible to stop because the provenance of knowledge is very hard to detect once it is encoded in the weights of a neural network.
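Here is a hedged sketch of the siphoning idea: sample prompts, ask a stronger “teacher” model to answer them, and keep the pairs as training data for your own “student” model. The query_teacher function is a hypothetical stand-in for whatever model access a researcher has; it is not any particular vendor’s API:

```python
import json

def query_teacher(prompt: str) -> str:
    """Hypothetical stand-in for a call to a stronger third-party model."""
    # In practice this would call another model's API; here we return a placeholder.
    return "(teacher model's answer would appear here)"

# Seed prompts might come from public datasets or be generated programmatically.
seed_prompts = [
    "Explain retrieval augmented generation in two sentences.",
    "Write a SQL query that joins orders to customers.",
]

# Collect (prompt, response) pairs to later fine-tune a smaller "student" model.
with open("synthetic_training_data.jsonl", "w") as f:
    for prompt in seed_prompts:
        example = {"prompt": prompt, "response": query_teacher(prompt)}
        f.write(json.dumps(example) + "\n")
```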
Given all of this investment in foundation models, we need to ask: who are the customers of these models and what do they want? In some cases, they need as much intelligence as possible to push the boundaries of what AI can do, while in other cases they need “just enough” intelligence to solve a given problem.
As a result, the market for foundation models will bifurcate into “frontier” and “sufficiency” models.
In stacks where additional intelligence unlocks more customer value, frontier foundation models are a critical technology and will likely remain so for some time. Customers will continue paying for the best available frontier models.
For fixed-requirement use cases, sufficiency foundation models will eventually be evaluated on cost and will no longer be a critical technology.
This bifurcation is already playing out: for example, OpenAI currently sells access both to its frontier models and its “turbo” models that are less intelligent but far cheaper. For many use cases, turbo models are good enough.
Frontier models will consolidate.
With huge capital expenditures required to stay ahead in the frontier model market, there will likely be economies of scale that drive down cost of capital and amortize large research and infrastructure investments. This leads to consolidation.
There are countless examples of capital-intensive technology industries consolidating in the past, a pattern I believe will repeat in frontier models:
hundreds of oil refineries before Standard Oil dominated
hundreds of automakers before Ford, GM and Chrysler dominated
hundreds of PC makers before IBM, Apple, Compaq and later Dell and HP dominated
It’s very hard to be small when cost of capital is high and one has to tap equity markets for large and growing hardware and R&D expenses.
Will open or proprietary frontier models win?
If training frontier models remains extremely expensive, then the question is: “How do we pay for them?” Independent companies (e.g. OpenAI) have so far kept their frontier models proprietary and charge both developers and end users to access them, while Meta has released its “Llama” series of models publicly. Meta’s decision to give away state-of-the-art frontier models is perhaps the most disruptive move in the industry at the moment. Meta cannot rely on foundation models provided by others, so they needed to build their own. Given they monetize on the services layer and were unlikely to build an infrastructure business, it makes a lot of sense for them to open their work and have a chance at owning the ecosystem. Meta’s open-model AI strategy also has the second-order effect of reducing the revenue generated by its competitors.
The operating system market may be a good analogy for predicting what might happen:
Linux: open, where revenue is mostly generated by other layers of the stack, not by Linus Torvalds.
Android: open, but monetized with a services layer.
Windows: proprietary, sold directly and as a component of others’ products.
iOS/MacOS: proprietary and vertically integrated to deliver a superior user experience.
There’s a question of whether Meta’s AI strategy will result in Linux or Android. Will OpenAI stay Windows or will they eventually open their models (Linux/Android) or perhaps go the hardware route (Apple)?
Both the business model and AI safety debates around open vs. closed frontier models will, at the very least, be fascinating to watch.
Open sufficiency models will win.
Because sufficiency models eventually become “good enough”, there comes a point at which further improvement doesn’t matter. In these circumstances, I believe the most likely outcome is that developers will choose the model with the lowest total cost (including fine-tuning, license fees, and compute cost) that meets their needs.
It’s possible that highly optimized proprietary models with great developer experiences win, though I suspect that over the long term, open models will reach parity and become fully commoditized.
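As with hardware, this decision can be sketched as a simple total-cost comparison. The numbers below are invented purely for illustration (per-token API pricing, hosting costs, and workload sizes are assumptions, not quotes from any vendor):

```python
# Hypothetical monthly cost comparison: proprietary API vs. self-hosted open model.
# Every figure here is an illustrative assumption, not real pricing.

def proprietary_api_cost(monthly_tokens, price_per_million_tokens=1.50):
    """Pay-per-token pricing scales linearly with usage."""
    return monthly_tokens / 1_000_000 * price_per_million_tokens

def open_model_cost(fine_tuning_amortized=2_000, gpu_hosting=6_000, ops_overhead=3_000):
    """Roughly fixed per month, up to the capacity of the rented hardware."""
    return fine_tuning_amortized + gpu_hosting + ops_overhead

for monthly_tokens in (500_000_000, 10_000_000_000):
    api = proprietary_api_cost(monthly_tokens)
    self_hosted = open_model_cost()
    print(f"{monthly_tokens / 1e9:.1f}B tokens/month: "
          f"API ${api:,.0f} vs. self-hosted ${self_hosted:,.0f}")
```

Under these made-up numbers the API wins at low volume and the open model wins at high volume; the crossover depends entirely on usage, which is why “lowest total cost that meets their needs” is the deciding criterion rather than raw capability.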
Should we go vertical?
In some cases, models are only applicable in a narrow set of circumstances and the best business model might be to be both a model developer and an application developer. The key question here is whether there will be a robust market for the model (it shows up in lots of stacks) or whether it’s really only one stack and the application vendor holds most of the power.
Software Infrastructure—How do we tie it all together?
AI infrastructure includes all of the supporting tools a company deploying AI needs. Examples include data management, training/fine-tuning tools, deployment platforms, observability systems, and security products. Given the maturity of prior software infrastructure markets, both buyers and vendors have been able to make reasonable guesses at what the product requirements are likely to be in each category. Unsurprisingly, there is a litany of startups going after each.
As a note, “Infrastructure for AI” is different from “AI for Infrastructure”5. The former includes tooling for companies deploying AI and the latter is an application of AI to solve broader infrastructure problems. There are many great opportunities to use AI to improve infrastructure (e.g. cybersecurity, reliability engineering) that are applications of AI covered in the next section. This section is specifically focused on Infrastructure for AI.
Application developers are rational actors who critically evaluate AI infrastructure investments.
If we play “follow the buck” in AI, money starts at the top of the stack and flows downward. Application developers have money and they’ll make build vs. buy decisions regarding infrastructure tools. They will, of course, evaluate price vs. performance of various options.
Vector databases are a good example. When an application needs to supply contextual information to a large language model, developers commonly deploy a Retrieval Augmented Generation (RAG) architecture. The “generation” part is a foundation model and the “retrieval” part is often a system called a vector database. These databases allow us to take some input data and identify conceptually related data so that both can be provided to a model to make a decision. For example, if a financial analyst wants to know about last quarter’s IT budget breakdown by vendor, a RAG system might identify the relevant reports with raw data and supply them alongside the analyst’s question to a model to compose a coherent answer.
There are many options for vector databases, and it’s unclear how many customers will pay a premium for a proprietary vector database rather than use the free pgvector extension for the wildly popular Postgres database, which is easy to spin up on AWS.
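For a sense of how little is involved, here is a minimal sketch of the “retrieval” half using Postgres with the pgvector extension. It assumes a reachable Postgres instance with pgvector installed, and the embed() function is a hypothetical stand-in for whatever embedding model you use (the 1536-dimension width is just a common embedding size, not a requirement):

```python
# Minimal RAG retrieval sketch against Postgres + pgvector (pip install psycopg2-binary).
import psycopg2

def embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model call."""
    return [0.0] * 1536  # placeholder vector; a real embedding model goes here

conn = psycopg2.connect("dbname=rag_demo")  # assumed local database
cur = conn.cursor()

# One-time setup: enable the extension and create a table of document chunks.
cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id bigserial PRIMARY KEY,
        content text,
        embedding vector(1536)
    );
""")
conn.commit()

# Retrieval: embed the question and pull the closest chunks by vector distance.
question = "What was last quarter's IT budget breakdown by vendor?"
question_vector = "[" + ",".join(str(x) for x in embed(question)) + "]"
cur.execute(
    "SELECT content FROM documents ORDER BY embedding <-> %s::vector LIMIT 5;",
    (question_vector,),
)
context_chunks = [row[0] for row in cur.fetchall()]

# The "generation" half: hand the retrieved context plus the question to a language model.
prompt = "Context:\n" + "\n".join(context_chunks) + "\n\nQuestion: " + question
```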
Infrastructure is a siren song for founders and investors.
One lesson many took away from the last market cycle was that infrastructure companies were some of the best businesses: consider the stellar performance of Datadog, CrowdStrike, Palo Alto Networks, Snowflake, Zscaler and others. The missing part of that conclusion is that the current crop of infrastructure winners stands on the shoulders of decades of investment in IT systems and cloud software. Many of these winners re-segmented existing markets (infrastructure/application monitoring, anti-virus, firewalls, databases, and security gateways, respectively) so they could tap into long-standing (and long-growing) IT budgets.
As engineers, we also find building infrastructure tempting because we get to sell to people who think like us, and our customers mostly care about performance characteristics. Building to performance specs is much easier than convincing a whole bunch of people in the world to change their behavior (very hard!).
AI infrastructure is a new budget area, with investments being pulled forward.
Unlike the environment when the current infrastructure winners started, the AI stack is more or less getting built from the ground up from new budgets. One lesson learned from the dot com boom: innovation budgets get cut if investments come too far ahead of revenue.
Consider VA Software—a leading provider of Linux servers—before, during, and after the dot com boom. Of course, customers stop buying infrastructure when they don’t have much revenue and investment capital stops flowing freely.
While AI application revenue may quickly catch up to AI investment, there’s a possibility that it doesn’t: investment capital dries up and we see an AI infrastructure winter.
There’s an oft-repeated narrative that the best way to make money in a gold rush is to sell picks and shovels. This might be true when there are far more miners than shovel makers; however, when every store on the block sells shovels, shovel stores aren’t great businesses.
The early crypto world was a similar picks-and-shovels rush, with many anticipating that real applications (other than speculation) would soon thrive and kill traditional “centralized” software. In a world where customer behavior could shift overnight, this isn’t a crazy belief; however, the MAYA (Most Advanced Yet Acceptable) principle tells us there was a high likelihood that applications on the crypto stack were simply too new to gain rapid adoption. People didn’t know how to manage wallets and they weren’t ready to think about a separate (high volatility) currency for every application they wanted to use.
Today, there is a dangerous combination of a lot of AI software infrastructure vendors and not that much money being generated at the top of the stack by AI applications. There are perhaps more shovel makers than there are miners. Miners aren’t dumb, and over time they won’t pay more than they need to for shovels.
For large companies, over-investing in AI infrastructure ahead of revenue is rational. Being late to the AI party is an existential risk, while being early is a few extra billion in innovation spend they have to write down. With that in mind, infrastructure providers banking on this spend would do well to carefully manage expenses and cash in case the music suddenly stops.
Timing the AI infrastructure market: Being early is indistinguishable from being wrong.
Over the long run, there will be great businesses built. The question is: Who will win? Today’s AI infrastructure companies or the successors who sprout from their graves?
Founders and venture investors (myself included), when confronted with a secular trend, are commonly correct on direction but wrong on duration. Many things just take a long time to play out. The key thing to watch is whether enough revenue is being generated by application developers and enterprises deploying AI to support a robust infrastructure sector.
Today, AI applications are being deployed at Mach 10, which we’ll cover below. Perhaps this time really is different from the dot com and crypto booms. In the infrastructure bull case, AI application revenue will grow extremely quickly and justify infrastructure spending before innovation budgets and patience run low.
As with any decision regarding founding or investing in a company with multiple future scenarios, we must underwrite both the bull and bear case futures and adjust expectations accordingly.
To predict the fate of infrastructure and the lower layers of the stack, we must answer the key question: “When, and from where, will application revenue come?”
Defensibility
A series of posts dedicated to answering the question: Where will value accrue in AI?
"Execution power” is becoming more important than classical “structural power”.
Market revolutions occur when “critical" technology makes a new stack “viable”.
When multiple stacks become viable in rapid succession, companies must “AND” or “OR”.
Power within the AI stack—hardware, hosting, models, and infrastructure.
Power in AI applications—big tech, switching costs, network effects, and the $100 trillion of global GDP up for grabs.
1. Though models require far less patience.
2. We’ll look at the case where someone else manages that model and we access it via API in the foundation model section below.
3. At a terrible human cost, given its impact boosting economic incentives toward slave labor.
4. OpenAI flags attempts at convincing their latest o1 model to share its “chain of thought” as violations of its terms of service.
5. Jack Label, an early stage investor, framed this difference succinctly in a recent letter.