The HBM4 Yield Game: More Memory, Less Power, Cheaper Silicon — Pick All Three
Here’s a question nobody is asking about the AI chip war: what happens to the HBM4 dies that aren’t fast enough for NVIDIA?
The answer reshapes everything we think we know about AMD’s competitive position.
NVIDIA’s Vera Rubin GPU demands HBM4 memory running at 10 Gbps per pin, roughly 56% faster than the 6.4 Gbps JEDEC floor. Only the top 20-30% of Samsung’s HBM4 production can hit that speed. The rest? They’re perfectly good chips. They just don’t run fast enough for Jensen.
AMD’s MI455X needs just 6.5 Gbps, a hair above the 6.4 Gbps JEDEC floor. Every functional die qualifies.
This isn’t a technical footnote. It’s the most important supply chain asymmetry in the semiconductor industry right now.
The Speed Bin Lottery
When Samsung manufactures HBM4, every wafer produces a bell curve of dies. Some run blazing fast. Most land in the middle. Some barely pass. The speed at which a die reliably operates determines its “bin” — and its price.
Think of it like a coffee grading system. Samsung grows the beans (manufactures the wafers). The top 20-30% get labeled “Specialty Grade” and shipped to NVIDIA at premium prices. The next 50-60% are “Commercial Grade” — they taste great, they just weren’t the absolute best on the lot. That’s AMD’s supply.
The bottom 12% are defective and get discarded. The top ~5% that hit 13 Gbps exist mostly on Samsung’s marketing slides.
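The binning logic above can be sketched as a simple classifier. The thresholds (6.4, 10, and 13 Gbps) come from this article’s figures; the function name and four-bin structure are illustrative, not Samsung’s actual test flow, which is far more granular.

```python
def speed_bin(gbps: float) -> str:
    """Assign an HBM4 die to a bin from its tested per-pin speed.

    Thresholds follow the figures quoted in this article; real
    vendor binning uses many more grades.
    """
    if gbps < 6.4:       # below the JEDEC floor: not sellable as HBM4
        return "defective"
    if gbps < 10.0:      # meets JEDEC spec, misses NVIDIA's bar: AMD's pool
        return "floor-bin"
    if gbps < 13.0:      # NVIDIA's Vera Rubin spec
        return "top-bin"
    return "halo"        # the 13 Gbps marketing-slide tier

print(speed_bin(6.5))    # floor-bin: qualifies for AMD's MI455X
print(speed_bin(10.2))   # top-bin: qualifies for Vera Rubin
```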
Two Philosophies, One Wafer
NVIDIA and AMD are building fundamentally different memory architectures for the same generation of AI chips — and that choice has cascading consequences.
| Metric | NVIDIA Vera Rubin | AMD MI455X |
|---|---|---|
| HBM4 Speed | 10 Gbps | ~6.5 Gbps |
| Stacks per GPU | 8 | 12 |
| Total Bandwidth | 20.5 TB/s | 19.6 TB/s |
| Total I/O Power | 100% (baseline) | ~63% of NVIDIA |
| Yield Requirement | Top 20-30% | ~80-85% (floor bin) |
| HBM4 Capacity | 288 GB | 432 GB |
| JEDEC Compliance | Above spec | At JEDEC floor |
NVIDIA chose speed: 8 stacks running at 10 Gbps to hit 20.5 TB/s bandwidth. Fewer stacks means a simpler interposer. But the speed requirement filters out 70-80% of Samsung’s production.
AMD chose width: 12 stacks running at just 6.5 Gbps to hit 19.6 TB/s. More stacks means a larger CoWoS-L interposer, more complex packaging, and additional yield risk at the system level. But every functional HBM4 die qualifies.
The bandwidth is nearly identical. The trade-offs are not.
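The near-identical bandwidth falls straight out of arithmetic. A minimal sketch, assuming the 2048-bit per-stack interface of HBM4; plugging in the 6.4 Gbps JEDEC floor for AMD lands close to the ~19.6 TB/s figure in the table.

```python
def hbm4_bandwidth_tbps(stacks: int, gbps_per_pin: float,
                        pins: int = 2048) -> float:
    """Aggregate bandwidth: stacks x pins x per-pin rate, bits to bytes."""
    return stacks * pins * gbps_per_pin / 8 / 1000  # GB/s -> TB/s

nvidia = hbm4_bandwidth_tbps(8, 10.0)   # ~20.5 TB/s
amd = hbm4_bandwidth_tbps(12, 6.4)      # ~19.7 TB/s at the 6.4 floor
print(f"NVIDIA: {nvidia:.1f} TB/s, AMD: {amd:.1f} TB/s")
```

Eight fast stacks and twelve slow ones converge on essentially the same number; everything else in the trade-off is downstream of that choice.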
The Physics Tax
Here’s where it gets interesting. The power consumed by a high-speed memory interface scales roughly with the square of the data rate — driven by signal integrity, equalization, and termination overhead that compounds at higher frequencies.
Running HBM4 at 10 Gbps doesn’t cost 54% more power than 6.5 Gbps. Per pin, it costs roughly (10/6.5)² = 2.37x more power. That’s physics — not a design choice.
But AMD uses 12 stacks versus NVIDIA’s 8. More stacks means more I/O pins, which partially offsets the per-pin savings. The full accounting:
- NVIDIA: 8 stacks × (10 Gbps)² = 800 relative power units
- AMD: 12 stacks × (6.5 Gbps)² = 507 relative power units
Net result: AMD’s total HBM4 I/O power is roughly 63% of NVIDIA’s — a 37% power saving despite running 50% more stacks. The frequency-squared advantage outweighs the stack count penalty.
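Under the frequency-squared assumption, the full accounting is a one-liner. The P ∝ stacks × f² model is the rough scaling described above, not a measured figure.

```python
def io_power_units(stacks: int, gbps: float) -> float:
    """Relative I/O power under the rough P ~ stacks x f^2 model."""
    return stacks * gbps ** 2

nvidia = io_power_units(8, 10.0)   # 8 x 100   = 800 relative units
amd = io_power_units(12, 6.5)      # 12 x 42.25 = 507 relative units
print(f"AMD runs at {amd / nvidia:.0%} of NVIDIA's HBM4 I/O power")
```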
That 37% power gap goes straight into thermal headroom. Heat that NVIDIA must dissipate but AMD doesn’t generate. On bandwidth-bound AI inference workloads, that margin can be the difference between being memory-limited and being compute-limited.
The Beautiful Asymmetry
Here’s the part that makes a supply chain analyst’s head spin.
NVIDIA’s demand for top-bin HBM4 improves AMD’s supply. This isn’t metaphorical — it’s arithmetic.
Every time Samsung starts a new wafer lot for NVIDIA, roughly:
- 25% of dies qualify for NVIDIA’s 10+ Gbps spec
- 58% of dies fall into the 6.4-9.6 Gbps JEDEC range — AMD’s pool
- 12% are defective
- 5% hit 13 Gbps (Samsung’s flex; nobody is buying these in volume)
The more aggressively NVIDIA ramps Vera Rubin, the more floor-bin HBM4 floods the market for AMD. And because floor-bin dies are abundant, they cost less per gigabyte. AMD gets a volume discount on perfectly good silicon that NVIDIA’s specification filter rejected.
Samsung wins too. Without AMD (and potentially OpenAI) buying floor-bin dies, Samsung’s effective HBM4 yield for “sellable product” would be a miserable 25-30%. With AMD taking the floor bin, Samsung’s effective yield jumps to 85-90%. Every customer in this triangle makes every other customer’s economics work better.
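The effective-yield claim is straightforward arithmetic on the bin distribution above. The only modeling choice here is treating the 13 Gbps halo bin as unsold in the "NVIDIA only" case, matching the article’s claim that nobody buys it in volume.

```python
# Bin distribution per wafer lot, from the figures above (fractions of dies)
bins = {
    "top (10+ Gbps)": 0.25,       # NVIDIA's pool
    "floor (6.4-9.6 Gbps)": 0.58, # AMD's pool
    "halo (13 Gbps)": 0.05,       # marketing-slide tier, rarely sold
    "defective": 0.12,
}

sellable_without_amd = bins["top (10+ Gbps)"]  # NVIDIA is the only buyer
sellable_with_amd = 1.0 - bins["defective"]    # every functional die sells
print(f"Without AMD: {sellable_without_amd:.0%} of output sellable")
print(f"With AMD:    {sellable_with_amd:.0%} of output sellable")
```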
What NVIDIA’s Delay Really Means
When TrendForce reported Vera Rubin was delayed approximately one quarter (to Q4 2026 or Q1 2027), they attributed it to “HBM4 supply constraints.” Now you understand what that really means.
NVIDIA demanded per-pin speeds exceeding 11 Gbps — some reports say they initially targeted 13 Gbps. Samsung, SK Hynix, and Micron all had to redesign their HBM4 products. The delay isn’t about Samsung’s ability to make HBM4. It’s about Samsung’s ability to make HBM4 fast enough for Jensen.
This is the “high-bin sourcing trap.” NVIDIA’s specs require the top 20-30% of yields. When those yields don’t materialize at production volume, NVIDIA either:
- Waits — accepting the delay while Samsung improves process maturity
- Relaxes the spec — NVIDIA already revised from 13 Gbps down to ~10 Gbps
- Adds suppliers — hence the 48-hour pivot from 2 to 3 HBM4 vendors at GTC
Jensen chose all three. And each choice made AMD’s position stronger.
NVIDIA’s Need for Speed Is Accidentally Subsidizing AMD
Let’s make the subsidy mechanism explicit, because it’s beautifully counterintuitive.
Step 1: NVIDIA demands 10 Gbps HBM4. Samsung has to run more wafers to get enough top-bin dies. If NVIDIA needs 1 million top-bin dies, Samsung must produce roughly 4 million total dies to yield 1 million at 10+ Gbps (at ~25% yield).
Step 2: The other 3 million dies don’t disappear. Net of the ~12% defect rate, most of them are perfectly functional HBM4; they just run at 6-9 Gbps instead of 10+. Before AMD existed as a buyer, these were essentially scrap. Samsung would either downgrade them to cheaper products or eat the loss.
Step 3: AMD shows up and says “we’ll take those.” AMD’s MI455X only needs 6.5 Gbps. The 3 million “rejects” are exactly what AMD wants. And because supply is abundant (3x more floor-bin dies than top-bin), AMD has pricing leverage. Samsung is happy to sell them — every die sold is pure margin recovery.
Step 4: The more NVIDIA buys, the more AMD gets. If NVIDIA doubles its Vera Rubin order, Samsung doubles wafer starts. That means double the top-bin dies for NVIDIA — but also double the floor-bin dies for AMD. NVIDIA is effectively funding Samsung’s wafer production, and AMD is skimming the overflow.
Step 5: AMD’s cost per die drops as NVIDIA scales. With NVIDIA absorbing the R&D costs and process improvement pressure (Samsung has to keep pushing yields higher for Jensen), the floor-bin becomes more reliable and cheaper over time. AMD’s HBM4 gets better because NVIDIA keeps demanding more.
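The overflow mechanism in steps 1-4 reduces to a toy model. The 1 million top-bin demand figure is the article’s own hypothetical; the yield fractions are the bin distribution quoted earlier.

```python
def floor_bin_overflow(top_bin_demand: float, top_yield: float = 0.25,
                       floor_yield: float = 0.58) -> float:
    """Floor-bin dies produced as a byproduct of meeting top-bin demand."""
    total_dies = top_bin_demand / top_yield  # dies Samsung must start
    return total_dies * floor_yield          # AMD's pool

supply = floor_bin_overflow(1_000_000)  # NVIDIA orders 1M top-bin dies
print(f"Floor-bin byproduct: {supply:,.0f} dies")  # 2,320,000

# Double NVIDIA's order and AMD's pool doubles with it:
print(f"{floor_bin_overflow(2_000_000):,.0f}")     # 4,640,000
```

The function has no term for AMD’s demand at all, which is the point: AMD’s supply is a pure function of NVIDIA’s.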
This is the accidental subsidy. NVIDIA pays for the bleeding edge. AMD harvests the middle of the curve. Samsung profits from both. And the consumer gets two competitive AI chip platforms instead of one — all because of a yield curve and two very different memory architectures.
The Unanswered Question
There’s one variable we don’t know yet: what speed bin is OpenAI using?
Samsung reportedly signed an exclusive HBM4 deal with OpenAI. If OpenAI is building custom AI chips (not buying NVIDIA GPUs), they may also target floor-bin HBM4 like AMD. That would put three customers into two bins:
- NVIDIA: Top 20-30% (10+ Gbps)
- AMD + OpenAI: Remaining 60-70% (6.4-9.6 Gbps)
If true, NVIDIA’s aggressive speed requirement inadvertently created a buyer’s market for floor-bin HBM4 — exactly the market that AMD and OpenAI need.
What to Watch
This story has three tells to monitor:
1. Samsung HBM4 yield reports. Any disclosure of speed bin distribution data changes the math. If Samsung’s process matures and 10 Gbps yields improve to 40-50%, NVIDIA’s supply constraint eases — but so does AMD’s cost advantage (fewer “reject” dies available cheaply).
2. AMD MI455X packaging yield. AMD’s 12-stack approach trades die-level yield advantage for packaging complexity. Fitting 12 HBM4 stacks on one CoWoS interposer is harder than 8. If packaging yields are low, AMD’s die-level advantage gets eaten.
3. OpenAI’s HBM4 spec disclosure. If OpenAI targets floor-bin like AMD, Samsung’s allocation math gets complicated — three customers competing for the same 60-70% of output, while only NVIDIA can use the top 25%.
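Tell #1 can be quantified. A sketch of the sensitivity, assuming the defect and halo rates stay fixed and that any top-bin yield improvement comes directly out of the floor bin (dies that used to test at 9 Gbps now test at 10+):

```python
def floor_share(top_yield: float, defect: float = 0.12,
                halo: float = 0.05) -> float:
    """Floor-bin fraction if top-bin gains come out of the floor bin."""
    return 1.0 - defect - halo - top_yield

for top in (0.25, 0.40, 0.50):
    floor = floor_share(top)
    print(f"top-bin {top:.0%} -> floor-bin {floor:.0%}, "
          f"overflow per top-bin die: {floor / top:.2f}")
```

At today’s ~25% top-bin yield, every NVIDIA die throws off 2.3 floor-bin dies for AMD; at 50%, that ratio drops to about 0.7. Process maturity is good for Samsung and NVIDIA, and quietly bad for AMD’s discount.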
The AI chip war isn’t just about who designs the best GPU. It’s about who designed their memory architecture to survive the yield curve.
The market prices AMD’s memory strategy as a weakness — fewer stacks per GPU, lower speed per pin, dependent on Samsung’s floor bin. The yield curve says it’s an edge.
AMD bet on the floor. That bet is looking increasingly smart.