SGNL Intelligence.

Four Power Plays Reshaping AI Hardware Right Now

Cerebras · AWS · Optical Interconnect · OCI MSA · UALink · Chip Design · Oracle · Agentic AI · NVIDIA

What if the company building the world’s largest chip… isn’t actually selling the chip?

This week brought four separate announcements that, taken together, tell a single story: the AI hardware stack is fracturing into specialized layers, and the companies that win won’t be the ones with the biggest chip — they’ll be the ones who control the right layer.

Let’s break it down.


1. AWS + Cerebras: Buying Speed, Not a Moat

Cerebras just landed its first cloud deal. AWS will offer Cerebras inference on Amazon Bedrock. Sounds huge. But look at the architecture:

[Figure: How AWS Splits the Inference Pipeline — prefill (process the entire prompt in parallel; compute-bound matrix math) runs on AWS Trainium3, while decode (generate tokens one by one; memory-bandwidth-bound) runs on Cerebras CS-3. Caption: disaggregated inference — different silicon for each phase.]

AWS doesn’t hand the entire inference job to Cerebras. Instead, it splits the pipeline in two:

  • Prefill — processing your entire prompt in one parallel burst — runs on AWS’s own Trainium3 chips. This is the compute-heavy phase. AWS keeps it in-house.
  • Decode — generating tokens one by one — runs on Cerebras CS-3. This is the memory-bandwidth-bound phase where Cerebras’s massive on-chip SRAM shines.

This is called disaggregated inference — using different hardware for different phases. It’s clever engineering. But it reveals something important about Cerebras’s position.
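In code, the split looks roughly like this. It's a minimal sketch: the backend names and the KV-cache handoff are illustrative stand-ins, not AWS's actual Bedrock internals.

```python
# Minimal sketch of disaggregated inference routing (illustrative only).

from dataclasses import dataclass, field

@dataclass
class Backend:
    name: str

@dataclass
class DisaggregatedPipeline:
    prefill_backend: Backend   # compute-bound phase
    decode_backend: Backend    # memory-bandwidth-bound phase
    log: list = field(default_factory=list)

    def generate(self, prompt: str, max_tokens: int) -> str:
        # Phase 1: prefill — the whole prompt is processed in one parallel
        # pass, producing the KV cache that decode will consume.
        self.log.append((self.prefill_backend.name, "prefill"))
        kv_cache = f"kv({prompt})"   # stand-in for the real KV cache

        # Phase 2: decode — the KV cache is handed off to different silicon,
        # which generates tokens one at a time.
        tokens = []
        for i in range(max_tokens):
            self.log.append((self.decode_backend.name, "decode"))
            tokens.append(f"tok{i}")
        return " ".join(tokens)

pipe = DisaggregatedPipeline(Backend("Trainium3"), Backend("CS-3"))
out = pipe.generate("Explain disaggregated inference", max_tokens=3)
```

The handoff cost of moving the KV cache between backends is the engineering tax of this design; the bet is that each phase's silicon advantage outweighs it.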

AWS is buying Cerebras for its speed. Not its wafer-scale chip.
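Why does decode favor huge on-chip SRAM? A back-of-envelope calculation makes it concrete. The figures below are illustrative, not benchmarks: an fp16 70B-class model, roughly H100-class HBM bandwidth, and Cerebras's vendor-claimed on-wafer SRAM bandwidth.

```python
# Back-of-envelope: why decode is memory-bandwidth-bound.
# Each generated token must stream (roughly) all model weights
# through the memory system once, so bandwidth caps tokens/sec.

def decode_tokens_per_sec(params_billions: float,
                          bytes_per_param: float,
                          mem_bw_tb_per_sec: float) -> float:
    """Upper bound on single-stream decode speed: bandwidth / model size."""
    model_bytes = params_billions * 1e9 * bytes_per_param
    return mem_bw_tb_per_sec * 1e12 / model_bytes

# 70B model in fp16 (2 bytes/param) on HBM-class bandwidth (~3.35 TB/s):
hbm = decode_tokens_per_sec(70, 2, 3.35)
# Same model on wafer-scale SRAM bandwidth (vendor-claimed, ~21,000 TB/s):
sram = decode_tokens_per_sec(70, 2, 21000)

print(f"HBM-bound ceiling:  ~{hbm:.0f} tokens/s per stream")
print(f"SRAM-bound ceiling: ~{sram:.0f} tokens/s per stream")
```

Real systems batch requests and quantize weights, so absolute numbers shift, but the ratio is the point: the decode ceiling scales directly with memory bandwidth.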

Here’s the uncomfortable truth. Cerebras has built the world’s largest semiconductor — an entire wafer turned into a single processor. It’s an engineering marvel. But its headline feature, unstructured sparsity (the ability to skip zero-valued computations in neural networks), appears to be unused by any major AI workload today. The models that matter — Llama, GPT, Gemini — don’t exploit sparsity in a way that maps to Cerebras’s hardware advantage.

Morgan Stanley put it bluntly: Cerebras won't hold a 20–25% performance advantage once rack-scale NVIDIA GB300 and Vera Rubin systems arrive. And the inference margins that Cerebras enjoys today? They'll be high industry-wide, not unique to Cerebras.

So why did AWS do the deal? Time-to-market. Today, Cerebras CS-3 delivers the fastest token generation available, and AWS is renting that speed gap while it lasts. When Vera Rubin ships with 5x the inference throughput of Blackwell, the gap may close.

The bull case for Cerebras isn't hardware exclusivity — it's that disaggregated inference becomes the standard architecture, and Cerebras keeps iterating on its decode-phase advantage faster than NVIDIA can catch up. The bear case is that NVIDIA's software ecosystem (TRT-LLM, Dynamo) collapses the gap through optimization alone, as it has already demonstrated with a 2x improvement in 60 days.


2. The Optical Consortium: Who’s In, Who’s Out, and Why It Matters

Six companies — Broadcom, AMD, Meta, Microsoft, NVIDIA, and OpenAI — just launched the Optical Compute Interconnect Multi-Source Agreement (OCI MSA). It’s an open specification for how light should carry data between chips and racks in AI clusters.

The immediate reaction: isn’t this what UALink does?

No. They’re completely different layers.

Three Standards, Three Layers

  • OCI MSA — Physical / Optical: how light carries data between chips and racks.
    In: NVIDIA, Broadcom, AMD, Meta, Microsoft, OpenAI. Out: Google, AWS, Intel, Marvell, Ayar Labs.
  • UALink — Protocol / Interconnect: GPU-to-GPU communication within a node (NVLink competitor).
    In: AMD, Broadcom, Cisco, Google, HPE, Intel, Meta, Microsoft. Out: NVIDIA.
  • UEC — Network / Ethernet Stack: AI-optimized Ethernet with RDMA for scale-out.
    In: AMD, Arista, Broadcom, Cisco, Intel, Meta, Microsoft.

Three standards, three layers, three sets of power dynamics.

Think of it like the internet:

  • OCI MSA is the fiber optic cable — the physical medium. Copper vs. fiber vs. co-packaged optics. How photons move.
  • UALink is the protocol — like TCP/IP for GPU-to-GPU links. How chips talk to each other within a node. It directly competes with NVIDIA’s proprietary NVLink.
  • UEC (Ultra Ethernet Consortium) is the network stack — AI-optimized Ethernet for connecting racks together at scale.

OCI MSA lives below UALink. You could run UALink over OCI MSA optics. They’re complementary, not competing.
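The layering can be sketched as a small stack. This is conceptual only; the names and descriptions come from the article, and the ordering mirrors the internet analogy above.

```python
# The three standards as a layered stack, top of the stack first.
# Each entry is (standard, layer, role).
STACK = [
    ("UEC",     "network",  "AI-optimized Ethernet for rack-to-rack scale-out"),
    ("UALink",  "protocol", "GPU-to-GPU links within a node"),
    ("OCI MSA", "physical", "how photons carry bits between chips and racks"),
]

def layer_index(name: str) -> int:
    """Position in the stack; a higher index means a lower layer."""
    return next(i for i, (n, _, _) in enumerate(STACK) if n == name)

def runs_over(upper: str, lower: str) -> bool:
    """A standard can run over any standard beneath it in the stack."""
    return layer_index(upper) < layer_index(lower)

# "You could run UALink over OCI MSA optics" — complementary, not competing:
print(runs_over("UALink", "OCI MSA"))          # True
# No two standards occupy the same layer:
print(len({layer for _, layer, _ in STACK}))   # 3
```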

But the absences tell a story:

Google is missing from OCI MSA. Why? Google designs its own TPU interconnect fabric end-to-end. It doesn't need an industry consortium to standardize optical links — it controls the entire stack from chip to rack, and its internal optical interconnect is already deployed at scale in TPU v5e and v6 pods.

AWS is missing too. Same logic. Trainium uses a custom interconnect (NeuronLink). AWS is building vertically, not horizontally.

Ayar Labs — the leading co-packaged optics startup — is excluded despite being a supplier to multiple consortium members. This is the most surprising absence. It may signal that the consortium prefers pluggable optics (traditional transceivers you can swap out) over CPO (optics embedded in the silicon). Pluggable is more flexible and field-serviceable. CPO is more efficient but harder to maintain. The consortium may be placing a bet that pluggable wins for scale-up, at least for this generation.

The companies in the consortium are the ones who buy chips from multiple vendors and need optical interoperability. The companies out are the ones who build everything themselves (Google, AWS) or who are proposing a different approach (Ayar Labs with CPO). Standard-setting is always political. The spec defines who’s interoperable — and who isn’t.


3. When AI Designs Its Own Chips

Here’s the most quietly consequential announcement of the week.

At Synopsys Converge — the semiconductor design industry’s major conference — OpenAI’s Richard Ho confirmed that AI agents are performing chip design tasks autonomously while the engineering team sleeps.

Read that again. The engineers go home. The agents keep working. In the morning, the design has progressed.

The AI Chip Design Acceleration

  • 2024 — AI assists with RTL code generation. Engineers use Copilot-style tools for Verilog boilerplate.
  • Early 2025 — AI writes competitive GPU kernels. After 6+ months of AI-assisted development, quality goes from "helpful" to "useful."
  • Late 2025 — One-shot kernel optimization. GPT-5.3-Codex outperforms human-written kernels at scale (DoubleAI WarpSpeed).
  • Mar 2026 — Overnight autonomous chip design. OpenAI agents progress chip designs while the engineering team sleeps (Richard Ho, Synopsys Converge).
  • Mar 2026 — Cross-vendor kernel porting. Claude and Codex write GPU kernels across multi-vendor hardware, reducing porting friction.

From autocomplete to autonomous design in 18 months.

This isn’t hypothetical future tech. It connects to a broader pattern that’s been building for 18 months:

  • DoubleAI WarpSpeed research showed that AI-written GPU kernels now outperform human-written kernels at scale. Practitioners report the process went from “helpful” to “one-shot” — you describe what the kernel should do, and GPT-5.3-Codex writes production-quality code on the first try.
  • Claude and Codex are being used to write GPU kernels across multi-vendor hardware setups — NVIDIA, AMD, Intel — reducing the friction of porting code between architectures. What used to take weeks of specialized engineering now takes minutes.

The implications are staggering. Semiconductor design cycles have historically been measured in years. A new GPU architecture takes 3–4 years from concept to silicon. If AI agents can handle the repetitive parts — RTL coding, verification, layout optimization, timing closure — those cycles compress dramatically.

This doesn’t mean AI replaces chip designers. It means one team can do the work of ten. The constraint shifts from engineering talent to design ambition. You can explore more architectural variants, run more simulations, and iterate faster than any human team could alone.
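One plausible shape for such an overnight loop, sketched in Python. This is a hypothetical structure, not OpenAI's actual system: the article only confirms that agents progress designs unattended, so the propose and verify functions below are stand-ins.

```python
# Hedged sketch of an unattended design-iteration loop. The key property:
# every agent-proposed revision is gated by the same tool checks a human
# would run, so unattended iteration can't silently regress the design.

import random

def propose_revision(design: dict) -> dict:
    """Stand-in for an agent proposing an RTL/layout tweak."""
    d = dict(design)
    d["timing_slack_ps"] = design["timing_slack_ps"] + random.randint(-5, 20)
    return d

def verification_passes(design: dict) -> bool:
    """Stand-in for the EDA toolchain's check (e.g. timing closure)."""
    return design["timing_slack_ps"] >= 0

def overnight_loop(design: dict, iterations: int = 100) -> dict:
    best = design
    for _ in range(iterations):
        candidate = propose_revision(best)
        # Keep only revisions that verify AND improve the metric.
        if verification_passes(candidate) and \
           candidate["timing_slack_ps"] > best["timing_slack_ps"]:
            best = candidate
    return best
```

The economics come from the iteration count: a human team might evaluate a handful of variants per day, while a gated loop like this can churn through thousands.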

The traditional EDA moat — the $15B+ industry dominated by Synopsys and Cadence — faces an interesting paradox. Their tools are the platform AI agents run on. But if the agents get good enough, they may commoditize the tool itself. The value migrates from the software to the intelligence driving it.


4. Oracle’s $553 Billion Question

Let’s talk about the most extreme disconnect in tech right now.

Oracle: Backlog Up, Stock Down

  Quarter     RPO ($B)   Stock ($)
  Q1 2025     130        168
  Q2 2025     138        175
  Q3 2025     310        190
  Sep 2025    380        178
  Q4 2025     455        132
  Mar 2026    553        89

RPO 4x up, stock 50% down. The market doesn't believe the backlog converts.

RPO — Remaining Performance Obligations — is an accounting term. It means: signed contracts for services Oracle hasn’t yet delivered. Think of it like a restaurant’s reservation book. RPO says how many tables are booked for the next year. Revenue says how many meals you’ve actually served.

Oracle’s RPO went from $138 billion to $553 billion in roughly a year. That’s a 4x explosion. Their reservation book is absolutely overflowing.

Meanwhile, the stock is down 50% since September 2025, and shares now trade well below early-2025 levels, when the backlog was only $130B.
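The disconnect is easy to verify from the chart's own numbers:

```python
# Figures from the article's chart: RPO in $B, stock in $ per share.
rpo = {"Q2 2025": 138, "Mar 2026": 553}
stock = {"Sep 2025": 178, "Mar 2026": 89}

backlog_growth = rpo["Mar 2026"] / rpo["Q2 2025"]      # ~4.0x in a year
drawdown = 1 - stock["Mar 2026"] / stock["Sep 2025"]   # 50% off the peak

print(f"Backlog: {backlog_growth:.1f}x in about a year")
print(f"Stock:   {drawdown:.0%} off the September peak")
```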

Why the disconnect? Three reasons:

  1. Conversion skepticism. RPO is a promise, not cash. The market wants to see actual revenue growth. Building the data centers to fulfill this backlog requires massive capital expenditure — and execution risk is high.

  2. Customer concentration. OpenAI is scaling up Vera Rubin GPU clusters on Oracle’s infrastructure. That’s great until you realize a significant chunk of that $553B backlog may be one customer. If OpenAI’s compute needs shift (say, to more AWS Trainium), Oracle’s pipeline gets thinner.

  3. Contract structure. Not all RPO is created equal. Multi-year cloud deals often have usage commitments that can be renegotiated. The market is pricing in the possibility that some of these contracts have outs — or that ramps will be slower than the headline number implies.

The bull case: if Oracle converts even 30% of this backlog at cloud-level margins, the stock is absurdly cheap at current levels. Larry Ellison has been building data centers at breakneck speed — and Oracle’s database moat gives it enterprise relationships no other cloud provider can match.
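That bull case is simple arithmetic. The 30% conversion rate comes from the article; the operating margin below is an assumption for illustration, not Oracle guidance.

```python
# Bull-case arithmetic: what 30% backlog conversion would look like.
backlog_b = 553        # $B RPO, from the article
conversion = 0.30      # article's "even 30%" scenario
op_margin = 0.30       # ASSUMED "cloud-level" operating margin, illustrative

revenue_b = backlog_b * conversion
operating_income_b = revenue_b * op_margin
print(f"~${revenue_b:.0f}B revenue -> ~${operating_income_b:.0f}B operating income")
```

Whether that justifies the current price depends on how many years the conversion is spread over — which is exactly the ramp question the bears are raising.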

The bear case: RPO is a mirage. The contracts ramp slowly, margins are thin on infrastructure deals, and customer concentration (OpenAI) creates fragility. The market has seen “record backlogs” before — it wants revenue, not reservations.


The Thread That Connects All Four

These stories seem different on the surface. A chip partnership. An optical standard. AI designing hardware. A database company’s backlog.

But they share a single thesis: the AI stack is fragmenting into specialized layers, and the winners control specific chokepoints, not the entire stack.

  • Cerebras controls token generation speed — for now. But the chokepoint shifts when Vera Rubin ships.
  • OCI MSA is a battle over who defines the optical layer. The companies that set the standard control interoperability — and lock out competitors.
  • AI chip design agents compress the timeline from years to months, potentially breaking the EDA duopoly and democratizing chip innovation.
  • Oracle’s backlog is a bet that controlling data center capacity — not just cloud software — is the real chokepoint in an AI-constrained world.

The AI stack used to be simple: buy GPUs, run models. Now it’s a layer cake of specialized silicon, optical standards, design automation, and infrastructure contracts. Understanding which layer matters most is the new alpha.


