AI Chips and Hardware Accelerators
== Understanding ==

Traditional CPUs are designed for sequential, low-latency computation with sophisticated branch prediction, out-of-order execution, and large caches, which makes them ideal for general-purpose code with complex control flow. Deep learning has the opposite profile: it is embarrassingly parallel, consisting of millions of independent multiply-accumulate operations, the same simple operation repeated billions of times per second with almost no control flow.

'''The matrix multiply insight''': The core operation in a neural network layer is Y = XW, where X is the activation matrix and W is the weight matrix. For a layer with 4096 inputs and 4096 outputs processing a batch of 2048, this is a 2048×4096 matrix multiplied by a 4096×4096 matrix, roughly 34 billion multiply-add operations (see the worked sketch below). A CPU works through this largely sequentially; a GPU with 10,000+ CUDA cores does it in thousands of parallel streams.

'''Why GPUs became the AI chip''': In 2012, Krizhevsky, Sutskever, and Hinton trained AlexNet on NVIDIA GTX 580 GPUs, achieving a breakthrough on ImageNet. This demonstrated that GPU training was not just feasible but transformative, a trend that has only accelerated.

'''The memory wall''': Modern AI chips can compute faster than they can be fed data from memory. The H100 delivers roughly 2 PFLOPS of FP16 compute, but its memory bandwidth is "only" 3.35 TB/s. For large transformer inference, most of the time is spent waiting for weights to stream in from memory, not computing. This is the '''memory bandwidth bottleneck''', and it drives design decisions in inference chips (large on-chip SRAM, HBM stacking). A rough estimate of how lopsided this is appears in the second sketch below.

'''The interconnect problem''': A single H100 has 80 GB of HBM. GPT-3 (175B parameters) requires ~350 GB in FP16, so training requires 8–16 GPUs at minimum. Connecting them with high-bandwidth NVLink (900 GB/s) versus standard PCIe (64 GB/s) changes training throughput dramatically, and multi-node training requires fast InfiniBand networking between GPU servers.
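A minimal NumPy sketch of the layer computation described above. The batch size and layer dimensions are the illustrative figures from the text, not a benchmark of any particular chip:

<syntaxhighlight lang="python">
import numpy as np

# Illustrative shapes from the example above: batch 2048, 4096 -> 4096 layer
batch, d_in, d_out = 2048, 4096, 4096

X = np.random.randn(batch, d_in).astype(np.float32)   # activations
W = np.random.randn(d_in, d_out).astype(np.float32)   # weights

Y = X @ W  # the core operation: Y = XW

# One multiply-add per (row, column, inner-dimension) triple
macs = batch * d_in * d_out
print(f"multiply-adds: {macs / 1e9:.1f} billion")        # ~34.4 billion
print(f"FLOPs (2 per multiply-add): {2 * macs / 1e12:.1f} TFLOPs")
</syntaxhighlight>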
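To see why large-model inference tends to be memory-bound, a back-of-the-envelope estimate is sketched below. The H100 figures come from the text above; the model size and single-token decoding scenario are assumptions for illustration:

<syntaxhighlight lang="python">
# Rough memory-bound vs compute-bound estimate for single-token decoding.
# Chip figures are from the text; the model size is a hypothetical example.
peak_flops = 2e15        # ~2 PFLOPS FP16 (H100, per the text)
mem_bw = 3.35e12         # 3.35 TB/s HBM bandwidth (H100, per the text)

params = 70e9            # hypothetical 70B-parameter model
weight_bytes = params * 2  # FP16: 2 bytes per parameter

# Generating one token at batch size 1 streams every weight once and
# performs roughly 2 FLOPs per parameter.
time_memory = weight_bytes / mem_bw          # time to read the weights
time_compute = (2 * params) / peak_flops     # time to do the math

print(f"memory-limited time per token:  {time_memory * 1e3:.1f} ms")
print(f"compute-limited time per token: {time_compute * 1e3:.3f} ms")
# Memory time dominates by orders of magnitude: the chip mostly waits on HBM.
</syntaxhighlight>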
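The parameter-memory and interconnect arithmetic can be reproduced the same way. Bandwidth and capacity figures are from the text; treating a per-step data exchange as one full-weight-sized transfer is a deliberate simplification, since real collectives use reduction trees and overlap with compute:

<syntaxhighlight lang="python">
# Parameter-memory and interconnect arithmetic from the text above.
params = 175e9                     # GPT-3 parameter count
weight_bytes = 2 * params          # FP16 weights: ~350 GB
hbm_per_gpu = 80e9                 # H100 HBM capacity

print(f"FP16 weights: {weight_bytes / 1e9:.0f} GB")
print(f"GPUs just to hold the weights: {weight_bytes / hbm_per_gpu:.1f}")

nvlink_bw = 900e9                  # 900 GB/s NVLink (per the text)
pcie_bw = 64e9                     # 64 GB/s PCIe (per the text)

# Simplified: time to move one full-weight-sized payload between devices.
print(f"transfer over NVLink: {weight_bytes / nvlink_bw:.2f} s")
print(f"transfer over PCIe:   {weight_bytes / pcie_bw:.2f} s")
</syntaxhighlight>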