Based on: Rebuilding The Foundation: Why AI Infrastructure Needs To Change | Will Eatherton, Cisco | March 17, 2026
Every AI infrastructure conversation eventually lands on power. Megawatts per rack, cooling costs, grid capacity, carbon footprint. The power story is real, but we can’t forget about bandwidth.
I’m an old telecom guy: over 30 years in telecom and networking, including running National Science Foundation Centers of Excellence focused on communications infrastructure. Bandwidth constraints are not new. What’s new is the scale and speed at which AI workloads are exposing them.
A 200,000-GPU (graphics processing unit) cluster can consume 435 MW (megawatts) of critical IT power. Of that, 17 MW goes to optical transceivers alone, just to move data between chips. Scale to a million GPUs and the transceivers by themselves consume roughly 180 MW. That's not a power problem. That's a data movement problem that shows up on the power bill.
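A back-of-envelope check makes the scaling concrete. The cluster sizes and transceiver power figures below come from the article; the per-GPU breakdown is my own arithmetic, not a number from the source.

```python
# Back-of-envelope check of the transceiver power figures quoted above.
# Cluster sizes and MW totals are from the article; per-GPU values are derived.

gpus_small = 200_000
optics_small_mw = 17        # optical transceivers alone at 200k GPUs, MW
gpus_large = 1_000_000
optics_large_mw = 180       # figure quoted for a million-GPU cluster, MW

per_gpu_small_w = optics_small_mw * 1e6 / gpus_small
per_gpu_large_w = optics_large_mw * 1e6 / gpus_large
print(f"optics per GPU at 200k scale: {per_gpu_small_w:.0f} W")  # 85 W
print(f"optics per GPU at 1M scale:   {per_gpu_large_w:.0f} W")  # 180 W
```

Note the per-GPU optics budget roughly doubles rather than staying flat: larger clusters need more network tiers and more optical hops per GPU, so transceiver power scales superlinearly with cluster size.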
Cisco's Will Eatherton made the case this week that the real bottleneck in AI infrastructure has shifted from compute to data movement. GPU procurement still dominates the conversation. Networking, storage, and security are where the constraints are actually forming.
Bandwidth
Training large models requires clusters of tens of thousands of GPUs exchanging data continuously. The industry has settled on 102.4 Tbps (Terabits per second) switching silicon as the baseline for serious deployments. Traditional pluggable transceivers hit a wall at 800G and 1.6T speeds. The DSP (Digital Signal Processor) in each transceiver consumes up to 30W per port; at 200G channels, electrical loss reaches roughly 22 dB (decibels) before the signal reaches fiber. Two approaches address this.
Linear-drive Pluggable Optics (LPO) removes the DSP and lets the host ASIC (Application-Specific Integrated Circuit) drive the optical module directly, cutting per-link power by up to 50%.
Co-Packaged Optics (CPO) goes further by integrating optical engines onto the switch package itself, dropping electrical loss to 4 dB and per-port power to 9W. CPO eliminates the transceiver and DSP entirely, embedding electronic-to-optical conversion onto the switch ASIC.
Nvidia's Quantum-X InfiniBand CPO switches, entering production in 2026, deliver 115 Tbps across 144 ports at 800G. Broadcom's Tomahawk 6 (TH6-Davisson) ships 102.4 Tbps with full CPO. IDTechEx projects the CPO market will grow at a 37% CAGR (Compound Annual Growth Rate), exceeding $20 billion by 2036.
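The per-port numbers above translate into significant per-switch savings. A rough comparison, using the 144-port count from the Quantum-X figure and the per-port power quoted in the article (the LPO line assumes the full "up to 50%" saving, so treat it as a best case):

```python
# Rough per-switch optics power under the three approaches discussed above.
# Port count (144) and per-port watts come from the article; the LPO figure
# assumes the quoted "up to 50%" saving is fully realized.

ports = 144
pluggable_w = 30            # DSP-based pluggable transceiver, W per port
lpo_w = pluggable_w * 0.5   # linear-drive pluggable optics, best case
cpo_w = 9                   # co-packaged optics, W per port

for name, w in [("pluggable", pluggable_w), ("LPO", lpo_w), ("CPO", cpo_w)]:
    print(f"{name:9s}: {ports * w / 1000:5.2f} kW of optics per switch")
```

At fleet scale (thousands of switches), the gap between 4.32 kW and 1.30 kW per switch is where the ~180 MW problem gets attacked.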
Topology
Scale-up (NVLink within a rack) and scale-out (InfiniBand or Ethernet across a data center) are both approaching practical limits. The next phase, scale-across, federates compute across geographically distributed locations into a single pool. Telecom engineers solved a version of this problem decades ago with distributed switching and ATM (Asynchronous Transfer Mode) traffic engineering. AI adds a harder constraint: gradient synchronization requires low, symmetric latency that wide-area networks were never built to guarantee. That breaks the latency-symmetry assumption baked into standard collective communication libraries such as NCCL (Nvidia Collective Communications Library), and it demands deep-buffer routing, topology-aware all-reduce algorithms, and control planes that make traffic decisions based on path characteristics, not just throughput.
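The core idea behind a topology-aware all-reduce is to cross the slow WAN link as few times as possible: reduce within each site first, exchange only the per-site partial sums across the WAN, then broadcast locally. The sketch below is illustrative Python, not NCCL's actual algorithm, and the site names and data shapes are invented for the example.

```python
# Hierarchical all-reduce sketch: local reduce -> one WAN exchange -> local
# broadcast. A naive flat all-reduce would push every rank's gradients over
# the slow, asymmetric WAN links instead. Illustrative only.

def allreduce_hierarchical(site_gradients):
    """site_gradients: {site_name: [per-rank gradient vectors]}"""
    # Stage 1: intra-site reduction over the fast, symmetric local fabric.
    site_sums = {
        site: [sum(vals) for vals in zip(*ranks)]
        for site, ranks in site_gradients.items()
    }
    # Stage 2: a single inter-site exchange over the WAN (the only slow step).
    global_sum = [sum(vals) for vals in zip(*site_sums.values())]
    # Stage 3: intra-site broadcast of the global result to every rank.
    return {
        site: [list(global_sum) for _ in ranks]
        for site, ranks in site_gradients.items()
    }

clusters = {
    "site_a": [[1.0, 2.0], [3.0, 4.0]],   # two ranks, 2-element gradients
    "site_b": [[5.0, 6.0]],               # one rank
}
out = allreduce_hierarchical(clusters)
print(out["site_a"][0])   # [9.0, 12.0] on every rank at every site
```

WAN traffic here is proportional to the number of sites, not the number of ranks, which is the property that makes scale-across federation plausible at all.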
Nvidia's Spectrum-X Ethernet CPO platform targets this scale-across problem, combining switching and routing in a single solution with deep buffer support and integrated in-network computing via SHARP (Scalable Hierarchical Aggregation and Reduction Protocol).
Figure 1. Scale-across WAN topology: two GPU clusters connected via deep-buffer routers across a WAN, with shared DPU/SmartNIC security enforcement.
Storage
AI training creates a mixed-access pattern: large sequential reads across petabytes of training data, burst checkpoint writes during fault recovery, and sustained KV-cache (Key-Value cache) write pressure as context windows grow. RDMA-based (Remote Direct Memory Access) protocols, including RoCE (RDMA over Converged Ethernet) and NVMe-oF (NVM Express over Fabrics), cut storage latency from milliseconds to microseconds. Idle GPUs cost the same as active ones. When ingestion starves GPUs of data or checkpoint bursts block training progress, accelerator cycles go idle. Storage has to be designed into the architecture from the start, not bolted on.
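The cost of a bolted-on storage tier shows up as stalled accelerators. A quick model of checkpoint-induced idle time follows; every number in it (checkpoint size, write bandwidth, GPU-hour price, checkpoint frequency) is an assumption chosen for the sketch, not a figure from the article.

```python
# Illustrative cost of letting checkpoint bursts block training.
# All inputs below are assumptions for the sketch, not figures from the source.

gpus = 200_000
gpu_hour_cost = 2.0          # $/GPU-hour, assumed
checkpoint_tb = 50           # full-cluster checkpoint size in TB, assumed
storage_gbps = 500           # aggregate write bandwidth in GB/s, assumed
checkpoints_per_day = 24

stall_s = checkpoint_tb * 1000 / storage_gbps        # seconds per checkpoint
idle_gpu_hours = gpus * stall_s / 3600 * checkpoints_per_day
print(f"stall per checkpoint: {stall_s:.0f} s")
print(f"wasted per day: {idle_gpu_hours:,.0f} GPU-hours "
      f"(~${idle_gpu_hours * gpu_hour_cost:,.0f})")
```

Under these assumptions, synchronous checkpointing burns six figures a day in idle accelerator time, which is exactly the term that asynchronous, sharded checkpointing over an RDMA fabric is designed to remove.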
Security
Model weights represent hundreds of millions of dollars in training cost. Protecting them requires hardware-based trust, confidential computing, and network segmentation. SmartNICs (Smart Network Interface Cards) and DPUs (Data Processing Units) now enforce zero-trust policy at line rate, isolated from the host OS (Operating System), handling IP filtering, session tracking, and rate limiting without CPU (Central Processing Unit) involvement. Multi-tenant inference clusters must maintain customer separation while meeting latency SLAs (Service Level Agreements), adding another layer of security complexity that traditional perimeter models were not designed to handle.
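Rate limiting is one of the simplest policies a DPU enforces, and the standard mechanism is a token bucket. The sketch below models that policy in Python purely for illustration (a real DPU implements it in hardware at line rate); timestamps are passed explicitly instead of reading a wall clock so the behavior is deterministic.

```python
# Minimal token-bucket rate limiter of the kind a DPU enforces at line rate.
# Python model for illustration only; explicit timestamps, no wall clock.

class TokenBucket:
    def __init__(self, rate_pps, burst):
        self.rate = rate_pps      # tokens replenished per second
        self.burst = burst        # bucket capacity (max burst size)
        self.tokens = burst
        self.last = 0.0

    def allow(self, now):
        """Return True if a packet arriving at time `now` may pass."""
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False              # over budget: drop or mark the packet

tb = TokenBucket(rate_pps=2, burst=2)
print([tb.allow(t) for t in (0.0, 0.1, 0.2, 1.2)])
# -> [True, True, False, True]: the burst drains, the third packet is dropped,
#    and tokens recover by t=1.2.
```

The point of pushing this onto the DPU is that the decision never touches the host CPU or OS, so a compromised host cannot bypass it and the policy holds at full line rate.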
Organizations that get interconnect, storage, and security right will be able to federate capacity that GPU-focused competitors, each bound to a single cluster, cannot replicate. Those that don't will rent infrastructure from the ones that do.
Source: Cisco Blogs: Rebuilding The Foundation — Why AI Infrastructure Needs To Change
CPO Market Data: IDTechEx — Co-Packaged Optics (CPO) 2026-2036
Nvidia CPO Technical Detail: Nvidia Developer Blog — Scaling AI Factories with Co-Packaged Optics
CPO Technology Overview: EDN — Where Co-Packaged Optics Technology Stands in 2026