technology

The Next Trade Is Not Training

AI's second economy will be measured in daily use: inference, cost per token, latency, and margin.

March 5, 2025

Training the monster impresses; feeding it every day reveals the business.

The first phase of generative AI was dominated by training. Larger models, larger clusters, larger datasets, benchmarks, launches, parameters, laboratories, frontier narratives. This phase is theatrical because training allows spectacle. "We built the largest." "We surpassed the previous one." "We launched a model." The market likes visible greatness. But businesses do not live on training alone. They live on repeated use.

The next trade is inference. Not as a technical word for insiders, but as the daily economy of AI. Every question, every answer, every agent, every copilot, every search, every summary, every API call, every image, every automated decision consumes capacity. Training is building the factory. Inference is operating the factory every day. Operating it can be the larger business, more recurring, and far less forgiving on costs.

Nvidia, AMD, and Palantir are the obvious names. Nvidia because it remains at the center of accelerated hardware. AMD because inference can open room for alternatives if cost per token, availability, and customization matter. Palantir because corporate inference needs to enter operations, not merely answer isolated questions. But Broadcom, Marvell, Arista, Credo, and Coherent may explain the next bottleneck.

Broadcom and Marvell can capture custom silicon, connectivity, and infrastructure. Arista captures networking. Credo captures high-speed interconnects. Coherent captures the optical and photonic components used in data transmission. As inference grows, moving data cheaply, quickly, and efficiently can be as important as processing it. The AI factory is not a single brain. It is a network of brains, memory, communication, and energy.

Perhaps in 2026 the market begins to change its obsession. Less "how many parameters?" More "what cost per token?" Less "what benchmark?" More "what margin per call?" Less "which model is most powerful?" More "which model is cheap enough to be used millions of times?" This change separates science from business. The most impressive model may not be the most profitable model.
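To make "margin per call" concrete, here is a minimal arithmetic sketch. Every number in it is hypothetical and chosen only for illustration, and the names (CallEconomics, cost_per_1k_input, price_per_call) are assumptions of this example, not any vendor's pricing or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallEconomics:
    """Unit economics of a single inference call (all figures hypothetical)."""

    input_tokens: int
    output_tokens: int
    cost_per_1k_input: float   # USD paid per 1,000 input tokens (assumed)
    cost_per_1k_output: float  # USD paid per 1,000 output tokens (assumed)
    price_per_call: float      # USD charged to the customer per call (assumed)

    def cost_per_call(self) -> float:
        # Cost scales linearly with the tokens consumed on each side.
        input_cost = self.input_tokens / 1000 * self.cost_per_1k_input
        output_cost = self.output_tokens / 1000 * self.cost_per_1k_output
        return input_cost + output_cost

    def margin_per_call(self) -> float:
        # What is left after paying for the tokens.
        return self.price_per_call - self.cost_per_call()

    def gross_margin(self) -> float:
        # Margin as a fraction of the price charged.
        return self.margin_per_call() / self.price_per_call


# Illustrative call: 1,000 tokens in, 500 tokens out, sold for two cents.
example = CallEconomics(
    input_tokens=1000,
    output_tokens=500,
    cost_per_1k_input=0.002,
    cost_per_1k_output=0.006,
    price_per_call=0.02,
)

if __name__ == "__main__":
    print(f"cost per call:   ${example.cost_per_call():.4f}")
    print(f"margin per call: ${example.margin_per_call():.4f}")
    print(f"gross margin:    {example.gross_margin():.0%}")
```

With these assumed inputs the call costs about half a cent and keeps roughly three quarters of its price as margin. Double the output length or halve the price and the picture changes quickly, which is the point: the question stops being what the model can do and becomes what each call leaves behind.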

The way to profit is to watch the inference chain. Companies will need to reduce latency, cost, energy consumption, and supplier dependence. There will be room for GPUs, alternative accelerators, ASICs, networks, switches, optics, active cables, optimization software, caching, model routing, compression, quantization, memory, and platforms that choose the right model for each task. Inference is an economic problem before it is a technical one.
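A platform that "chooses the right model for each task" can be pictured as a small routing rule: among the models that are good enough and fast enough, pick the cheapest. The sketch below assumes a made-up catalog with invented model names, costs, latencies, and quality scores. It illustrates the idea only and does not describe how any real router works.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelOption:
    """One entry in a hypothetical model catalog."""

    name: str
    cost_per_1k_tokens: float  # USD, assumed
    p95_latency_ms: float      # assumed 95th-percentile latency
    quality_score: float       # assumed score between 0 and 1


def route(
    models: Sequence[ModelOption],
    min_quality: float,
    max_latency_ms: float,
) -> Optional[ModelOption]:
    """Return the cheapest model meeting both constraints, or None."""
    eligible = [
        m
        for m in models
        if m.quality_score >= min_quality and m.p95_latency_ms <= max_latency_ms
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)


# Invented catalog: names and numbers are placeholders, not real products.
CATALOG = [
    ModelOption("small", cost_per_1k_tokens=0.0005, p95_latency_ms=200, quality_score=0.65),
    ModelOption("medium", cost_per_1k_tokens=0.003, p95_latency_ms=600, quality_score=0.80),
    ModelOption("large", cost_per_1k_tokens=0.02, p95_latency_ms=2000, quality_score=0.93),
]

if __name__ == "__main__":
    # A routine summary tolerates lower quality but needs a fast answer.
    print(route(CATALOG, min_quality=0.6, max_latency_ms=500))
    # A high-stakes analysis needs quality and can wait.
    print(route(CATALOG, min_quality=0.9, max_latency_ms=3000))
```

The design choice is deliberately plain: quality and latency are constraints, cost is the objective. Under that rule the largest model is used only when the task demands it.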

Palantir enters because companies do not merely want to infer. They want to infer over internal data, with permission, context, and action. The value of the corporate token is not in the pretty answer. It is in the decision that changes operations. If AI shortens collection cycles, prevents fraud, prioritizes maintenance, improves logistics, or accelerates critical support, the cost per token can be justified. If it only generates generic text, it will be crushed by competition.

The counter-thesis is that inference can become a commodity. Smaller models can reduce demand for expensive hardware. Efficiency can compress supplier revenue. Hyperscalers can internalize silicon. Broadcom and Marvell can win or lose depending on who captures design. Arista can face spending cycles. Credo and Coherent can be volatile. AMD can remain behind. Palantir can be expensive. Nvidia can remain dominant, but its future margins can be questioned if customers seek alternatives.

But the central point is that AI stops being an event and becomes consumption. When something becomes consumption, unit economics rule. Cost per token, latency, availability, energy per response, utilization rate, margin per user. The language changes. And when the language changes, the market changes its winners, or at least widens the board.
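Two of those terms, utilization rate and cost per token, are tied together by simple arithmetic: hardware is paid for by the hour whether or not it is busy. The sketch below assumes a single accelerator with a fixed hourly cost and a fixed peak throughput. The function name and all figures are hypothetical.

```python
def cost_per_million_tokens(
    hourly_cost_usd: float,
    tokens_per_second: float,
    utilization: float,
) -> float:
    """Cost of serving one million tokens on hardware billed by the hour.

    utilization is the fraction of each hour spent doing useful work (0 to 1].
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the interval (0, 1]")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    # Tokens actually served in an hour, after accounting for idle time.
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return hourly_cost_usd / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Assumed figures: $3.60 per hour, 1,000 tokens per second at peak.
    for utilization in (1.0, 0.5, 0.25):
        cost = cost_per_million_tokens(3.60, 1000, utilization)
        print(f"utilization {utilization:.0%}: ${cost:.2f} per million tokens")
```

Under these assumptions, halving utilization doubles the cost of every token served. Idle capacity is not free, and it shows up directly in the unit price.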

The investor who remains trapped in training may miss the next shift. Training creates headlines. Inference creates recurring bills.

The recurring bill is where truth lives.

Leo Bentier