NVIDIA's Rubin Is Late. Here's Who Wins and Who Gets Squeezed.
The world’s most anticipated chip just hit a wall. Not a compute wall. Not a power wall.
A memory wall.
NVIDIA’s Vera Rubin GPU — the chip that was supposed to deliver 5x inference performance over Blackwell and 10x lower cost per token — is reportedly delayed by one quarter. The reason isn’t engineering. The reason isn’t demand. The reason is that the world simply cannot manufacture enough HBM4 memory to feed it.
This is the moment the bottleneck rotation reaches NVIDIA’s own roadmap.
What Happened
Supply chain sources report that NVIDIA has reduced initial Vera Rubin wafer starts at TSMC. The cause: HBM4 supply is lower than expected. With fewer GPUs being packaged, NVIDIA’s CoWoS advanced packaging needs have also dropped.
The whiplash is striking. In January, Jensen Huang stood on stage at CES and called Vera Rubin “in full production.” In early March, reports indicated NVIDIA was actually accelerating Rubin by shifting H200 TSMC capacity over to Rubin production.
Now, weeks later, the opposite: wafer starts cut, shipments delayed one quarter.
What happened between “full production” and “delayed”? Most likely, “full production” referred to the design being taped out and validated — the chip itself works. The delay is in volume manufacturing, because you can’t package a GPU without the memory that goes on it. And HBM4 isn’t ready at scale.
Why HBM4 Is the Chokepoint
Vera Rubin requires 288GB of HBM4 per GPU, with roughly 20-22 TB/s of memory bandwidth (reported figures vary). That’s a generational leap over GB300’s HBM3E. But HBM4 is a new memory generation requiring new manufacturing processes, and the ramp is behind schedule.
NVIDIA has named Samsung and SK Hynix as the exclusive HBM4 suppliers for Vera Rubin; Micron is out. That means only two memory makers on earth are producing the memory NVIDIA needs, and both are still ramping.
SK Hynix is delivering “final samples” of HBM4 to NVIDIA “imminently.” That language tells you they’re still in qualification, not volume production. Samsung has raised its HBM4 yield target to 85% — by year-end. That means yields are significantly below 85% today.
The memory supply chain is running at maximum capacity and it’s still not enough:
- SK Hynix is paying ASML 15-20% premiums on EUV lithography tools to accelerate delivery
- SK Hynix plans to spend $13.3 billion on equipment this year alone
- Samsung and SK Hynix shares swung $300 billion in 48 hours on helium supply fears from the Iran crisis
- NVIDIA has locked up $95.2 billion in supply commitments — but HBM4 is a new generation that those commitments can’t conjure into existence faster
This is exactly what SemiAnalysis CEO Dylan Patel predicted on the Dwarkesh podcast: the ultimate bottleneck for AI compute isn’t power or data centers — it’s semiconductor manufacturing. Specifically, it’s memory. And now it’s directly impacting NVIDIA’s product timeline.
Who Wins, Who Loses
Google: The Clear Winner
Google’s TPUs don’t depend on HBM4. TPU v7 (Ironwood) ships with HBM3E, a mature memory generation in far better supply. While every HBM4-dependent chip faces supply constraints, Google’s accelerators sidestep the shortage entirely. And as we’ll see below, the constraints don’t hit everyone equally.
Google has reportedly agreed to supply Anthropic with up to 1 million TPU v7 (Ironwood) chips. Meta struck a multi-billion dollar deal to rent, and eventually buy, TPU hardware for its own data centers. With Rubin delayed, the pitch to AI labs becomes simple: why wait for a chip that can’t get memory when we have one shipping today?
AMD: The Binning Advantage Nobody Is Talking About
The Rubin delay gives AMD a window to close the gap. AMD has always been a full generation behind NVIDIA on datacenter GPUs, but its MI455X targets H2 2026 for initial shipments, and AMD VP Anush Elangovan has been emphatic: MI455X is “right on target for shipments in 2H 2026,” calling delay reports “BS.” If Rubin slips to Q4 2026 or Q1 2027 while MI455X ships on schedule, the two chips could overlap in volume for the first time.
The obvious assumption: MI455X also requires 432GB of HBM4 per chip, so AMD faces the same shortage. But look closer at the specs.
AMD MI455X needs 19.6 TB/s bandwidth. On HBM4’s 2048-bit interface, that maps to roughly 9.8 Gbps per pin — within or barely above the JEDEC standard range of 6.4-9.6 Gbps. This is achievable with standard process yields.
NVIDIA Rubin needs 22 TB/s. That requires ~11 Gbps per pin — well above JEDEC spec. Only the top speed bins qualify.
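The per-pin arithmetic above can be sanity-checked. A minimal sketch, assuming a package with 8 HBM4 stacks of 2048 pins each; the stack count is an illustrative assumption, and different package configurations shift the figures slightly:

```python
# Back-of-envelope check of the per-pin data rates implied by the quoted
# aggregate bandwidths. The package configuration (8 stacks x 2048 pins
# each) is an assumption for illustration, not a confirmed spec.

def per_pin_gbps(total_tb_per_s: float, stacks: int = 8, pins_per_stack: int = 2048) -> float:
    """Convert aggregate bandwidth (TB/s) to a per-pin signaling rate (Gbps)."""
    total_bits_per_s = total_tb_per_s * 1e12 * 8   # TB/s -> bits/s
    return total_bits_per_s / (stacks * pins_per_stack) / 1e9

amd = per_pin_gbps(19.6)     # ~9.6 Gbps: near the top of the standard range
nvidia = per_pin_gbps(22.0)  # ~10.7 Gbps: above-spec, top-bin territory
print(f"AMD MI455X: {amd:.1f} Gbps/pin, NVIDIA Rubin: {nvidia:.1f} Gbps/pin")
```

Under this assumption the gap between the two chips is roughly one full Gbps per pin, which is exactly the kind of margin that separates standard bins from premium ones.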
This means the same fab line produces HBM4 that sorts into two buckets:
- Top bin (11+ Gbps) → qualifies for NVIDIA Rubin
- Lower bin (9-10 Gbps) → rejected by NVIDIA’s qualification, but perfectly good parts for AMD MI455X
If top-bin yields are poor (Samsung’s 85% yield target by year-end suggests they are), NVIDIA gets starved while AMD gets adequate supply from the parts NVIDIA can’t use. Every HBM4 stack that fails NVIDIA’s qualification is a stack that passes AMD’s.
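The binning dynamic can be illustrated with a toy simulation. The speed distribution below (normal, mean 10.2 Gbps, sigma 0.6) is entirely invented, since real HBM4 bin splits are not public; it only shows how the same production line can starve one customer while supplying another:

```python
# Toy illustration of the binning dynamic described above. The speed
# distribution is invented purely to show the mechanism; actual HBM4
# yield and bin-split data are not public.
import random

random.seed(0)
stacks = [random.gauss(10.2, 0.6) for _ in range(100_000)]

top_bin = sum(s >= 11.0 for s in stacks)        # qualifies for NVIDIA Rubin
mid_bin = sum(9.8 <= s < 11.0 for s in stacks)  # fails Rubin, fine for MI455X

print(f"Rubin-qualified stacks: {top_bin / len(stacks):.0%}")
print(f"Rubin rejects usable by AMD: {mid_bin / len(stacks):.0%}")
```

With these invented parameters, only a small single-digit-percentage slice clears the 11 Gbps cutoff while a large majority lands in the 9.8-11 Gbps band, which is the shape of distribution that would starve NVIDIA and feed AMD simultaneously.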
The irony is profound. NVIDIA revised its bandwidth spec from 13 to 22 TB/s specifically to beat AMD’s 19.6 TB/s. That revision forced all three memory vendors to redesign, delayed mass production, and may have created a binning dynamic where AMD — the company NVIDIA was trying to beat — gets its memory on time while NVIDIA waits for top-bin parts that don’t exist in volume yet.
NVIDIA chose to win on specs. AMD might ship first by asking for less.
On top of the potential hardware timing advantage, AMD gets software time. Every quarter Rubin is late is a quarter where AMD’s inference stack improves. The MoRI library delivered 1.5x inference improvement in just 30 days. SGLang v0.5.6 shipped with AMD optimizations built in. AMD MI355 already matches NVIDIA B200 at FP8 disaggregated serving.
NVIDIA’s software moat — CUDA, TRT-LLM, Dynamo — is real. They achieved 2x inference improvement in 60 days through software alone. But that advantage depreciates when your next-gen hardware is late and your competitor’s software is catching up.
AI Labs: Squeezed
OpenAI committed to 5 gigawatts on Vera Rubin. Anthropic needs 5+ GW by year-end and is already paying up to $2.40/hour for H100 spot contracts — versus $1.40/hour build cost. Every quarter Rubin is delayed is a quarter where these labs either:
- Overpay for current-gen Blackwell/Hopper compute
- Shift to Google TPUs or AWS Trainium
- Wait — and fall behind competitors who secured compute earlier
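The cost of the first option can be roughed out from the spot and build rates quoted above. The cluster size is a hypothetical example for scale, not a figure from any lab:

```python
# Rough cost of overpaying for a quarter on spot H100 capacity, using the
# hourly rates quoted above. The 100k-GPU cluster is a hypothetical size
# chosen for illustration.
SPOT_RATE = 2.40     # $/GPU-hour, spot market
BUILD_RATE = 1.40    # $/GPU-hour, owned-capacity equivalent
HOURS_PER_QUARTER = 24 * 91

premium_per_gpu = (SPOT_RATE - BUILD_RATE) * HOURS_PER_QUARTER
cluster = 100_000    # hypothetical GPU count

print(f"Premium per GPU per quarter: ${premium_per_gpu:,.0f}")
print(f"For a {cluster:,}-GPU fleet: ${premium_per_gpu * cluster / 1e6:,.0f}M")
```

At a dollar per GPU-hour of premium, a quarter of waiting costs on the order of a couple of thousand dollars per GPU, which compounds into hundreds of millions at frontier-cluster scale.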
Dylan Patel’s analysis suggests Anthropic was already “conservative on compute” compared to OpenAI, which signed aggressive long-term deals. The Rubin delay compounds Anthropic’s disadvantage: they need more compute than ever, with a revenue run rate reportedly adding $4-6B per month, but the next-gen hardware they’re counting on isn’t arriving on time.
The Burry Connection
This is what Michael Burry has been warning about, straight from NVIDIA’s 10-K.
NVIDIA’s $95.2 billion in supply commitments assumed a smooth HBM4 ramp. The cash conversion cycle is extending because NVIDIA committed capital to a supply chain that can’t deliver on the original timeline.
But here’s the nuance Burry’s thesis misses: the demand isn’t going away. Every customer waiting for Rubin will still want Rubin when it ships. This is deferred revenue, not lost revenue. NVIDIA’s $68.1B quarterly revenue and $78B guidance remain intact for the current Blackwell generation.
The question is whether the delay creates enough of a window for alternatives — Google TPUs, AMD MI455X, AWS Trainium — to establish themselves. A one-quarter delay is an inconvenience. If it stretches to two quarters, it’s a structural opening.
The most important question nobody is asking: will NVIDIA extend GB300 Blackwell Ultra production to fill the gap? GB300 uses HBM3E, which is in much better supply. If NVIDIA keeps Blackwell running hot while Rubin ramps, the revenue impact is minimized — customers get compute, just not the 5x upgrade they were expecting.
The Bigger Picture
Zoom out and the Rubin delay tells a single story: the AI industry is scaling faster than the physical world can build the chips to power it.
ASML can make about 70 EUV tools per year, growing to maybe 100 by end of decade. Each tool costs $300-400 million. SK Hynix is spending $13.3 billion on equipment in a single year. NVIDIA has $117 billion in total supply commitments. The hyperscalers are deploying $600 billion in capex.
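A quick scale check, using only the figures quoted above, shows why the equipment pipeline itself is the constraint:

```python
# Rough scale check: ASML's annual EUV output in dollar terms vs a single
# memory maker's one-year equipment budget, using the figures quoted above.
tools_per_year = 70
cost_per_tool = (300e6, 400e6)   # quoted range, $ per EUV tool

annual_euv_output = tuple(tools_per_year * c for c in cost_per_tool)
sk_hynix_equip_budget = 13.3e9

low, high = (v / 1e9 for v in annual_euv_output)
print(f"ASML annual EUV output: ${low:.0f}B - ${high:.0f}B")
print(f"SK Hynix equipment budget alone: ${sk_hynix_equip_budget / 1e9:.1f}B")
```

On these numbers, one memory maker’s single-year equipment budget could absorb roughly half the dollar value of everything ASML can ship in a year, and every other chipmaker is bidding for the same tools.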
And it’s still not enough. The memory isn’t ready. The chip that depends on it is delayed. The AI labs that depend on the chip are scrambling.
Welcome to the era where the bottleneck isn’t software, algorithms, or even money. It’s atoms. Specifically, it’s the atoms arranged into HBM4 memory stacks — and the fact that arranging atoms at nanometer precision takes longer than writing code.
The next time someone tells you AI is moving too fast, remember: NVIDIA’s most anticipated chip in a decade is late because we can’t stack memory fast enough.
The world is building as fast as it physically can. It’s just not fast enough.
Claim references: [ece81337], [4b2851b6], [2125a02c], [2ad6f7c4], [10436cf6], [d189a662], [ab46b3ae], [6aee3692], [71c13b0a], [d2fce189], [c1a78ed1], [c4eb94ff], [0c2adc91], [def96f05], [0472dbc8], [711299fc], [33b87cfe], [8dab6865], [2594a15b]