Scaling AI with NVIDIA H200 NVL: A New Enterprise Reference Architecture
Summary: NVIDIA’s H200 NVL is the latest addition to the Hopper platform, designed to accelerate AI and HPC workloads. This article explores how to deploy H200 NVL at scale using a new enterprise reference architecture, focusing on optimal performance, networking, and system configurations.
Understanding the H200 NVL
The NVIDIA H200 NVL is a versatile platform optimized for enterprise workloads, delivering accelerated performance for a wide range of AI and HPC applications. With its dual-slot PCIe form factor and 600 W TGP, the H200 NVL is designed to meet the growing demands of AI processing.
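As a quick sanity check after installation, the minimal sketch below (assuming the pynvml bindings from the nvidia-ml-py package) queries each GPU's name, power limit, and memory so you can confirm the expected H200 NVL characteristics are visible to the driver:

```python
# A minimal sketch, assuming the pynvml bindings (nvidia-ml-py package):
# query each GPU's name, power limit, and memory to confirm the expected
# H200 NVL characteristics are visible to the driver.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)  # may return bytes on older bindings
        power_w = pynvml.nvmlDeviceGetPowerManagementLimit(handle) / 1000  # NVML reports mW
        mem_gb = pynvml.nvmlDeviceGetMemoryInfo(handle).total / 1024**3
        print(f"GPU {i}: {name}, power limit {power_w:.0f} W, {mem_gb:.0f} GB")
finally:
    pynvml.nvmlShutdown()
```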
Enterprise Reference Architecture
The new enterprise reference architecture for H200 NVL is designed to streamline deployments and ensure optimal performance. This architecture includes detailed server configurations and networking recommendations tailored for AI clusters.
Server Configurations
For optimal performance, the H200 NVL is deployed in a PCIe Optimized 2-8-5 configuration: two CPU sockets, eight GPUs, and five network adapters per server. This configuration keeps the PCIe topology balanced, with GPUs spread evenly across CPU sockets and PCIe root ports. Table 1 summarizes the full system configuration, and a pre-deployment check sketch follows the table.
Table 1: H200 NVL System Configuration

| Parameter | System Configuration |
|---|---|
| GPU configuration | GPUs are balanced across CPU sockets and root ports. |
| NVLink interconnect | H200 NVL supports NVL4 and NVL2 NVLink bridges. Pair bridged GPU cards under the same CPU socket where possible; pairing across CPU sockets is acceptable but not recommended. |
| CPU | Intel Sapphire Rapids, Emerald Rapids, Granite Rapids, or Sierra Forest; AMD Genoa or Turin |
| CPU sockets | Two CPU sockets minimum |
| CPU speed | 2.0 GHz minimum CPU clock |
| CPU cores | Minimum of 7 physical CPU cores per GPU |
| System memory | Minimum of 128 GB of system memory per GPU |
| DPU | One NVIDIA BlueField-3 DPU per server |
| PCI Express | At least one Gen5 x16 link per two GPUs; one Gen5 x16 link per GPU is recommended |
| PCIe topology | Balanced PCIe topology with GPUs spread evenly across CPU sockets and PCIe root ports. NICs and NVMe drives should sit under the same PCIe switch or PCIe root complex as the GPUs. |
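The per-GPU minimums in Table 1 are easy to verify before deployment. The following minimal sketch, assuming a Linux host and the pynvml bindings, checks the CPU-core and system-memory ratios against the Table 1 thresholds:

```python
# A minimal sketch, assuming a Linux host and the pynvml bindings
# (nvidia-ml-py), that checks the per-GPU CPU-core and system-memory
# minimums from Table 1 before deployment.
import os
import pynvml

CORES_PER_GPU_MIN = 7      # physical cores per GPU (Table 1)
MEM_PER_GPU_MIN_GB = 128   # GB of system memory per GPU (Table 1)

pynvml.nvmlInit()
try:
    gpus = pynvml.nvmlDeviceGetCount()
    cores = os.cpu_count() or 0  # logical cores; halve if SMT/Hyper-Threading is on
    # SC_PHYS_PAGES * SC_PAGE_SIZE = total physical memory in bytes (Linux).
    mem_gb = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 1024**3
    assert cores / gpus >= CORES_PER_GPU_MIN, "too few CPU cores per GPU"
    assert mem_gb / gpus >= MEM_PER_GPU_MIN_GB, "too little system memory per GPU"
    print(f"{gpus} GPUs, {cores / gpus:.0f} cores/GPU, {mem_gb / gpus:.0f} GB/GPU: OK")
finally:
    pynvml.nvmlShutdown()
```

For the NVLink and PCIe placement rows, `nvidia-smi topo -m` prints the GPU/NIC/CPU affinity matrix and is the quickest way to confirm that bridged GPU pairs sit under the same CPU socket.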
Networking Considerations
For optimal performance, NVIDIA recommends using NVIDIA networking in conjunction with the H200 NVL platform. This includes using NVIDIA BlueField Data Processing Units (DPUs) to offload all North-South traffic and ensure secure, efficient handling of requests from outside the network.
East-West Traffic
East-West traffic refers to traffic between H200 NVL systems within the cluster, typically for multi-node AI training, HPC collective operations, and similar workloads. This traffic requires high-bandwidth, low-latency networking to ensure seamless data flow within the data center.
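To make the East-West pattern concrete, here is a minimal sketch of a multi-node NCCL all-reduce, the collective at the heart of data-parallel training. It assumes PyTorch with CUDA and a launcher such as torchrun that sets the standard distributed environment variables:

```python
# A minimal sketch of the East-West traffic pattern: a multi-node NCCL
# all-reduce. Assumes PyTorch with CUDA, launched via e.g.
#   torchrun --nnodes=<N> --nproc-per-node=8 allreduce_sketch.py
import os
import torch
import torch.distributed as dist

def main() -> None:
    dist.init_process_group(backend="nccl")  # NCCL carries the GPU-to-GPU traffic
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    # Each rank contributes a tensor; the all-reduce sums them across every
    # GPU in the job, producing East-West traffic between servers.
    x = torch.full((1024, 1024), float(dist.get_rank()), device="cuda")
    dist.all_reduce(x, op=dist.ReduceOp.SUM)

    if dist.get_rank() == 0:
        print(f"all-reduce across {dist.get_world_size()} GPUs complete")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```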
North-South Traffic
North-South traffic involves traffic between H200 NVL systems and external resources, including cloud management and orchestration systems, remote data storage nodes, and other parts of the data center or the Internet. This traffic is critical for storage connectivity, data ingestion, and result delivery.
Switching
For all Enterprise RAs, NVIDIA provides configuration recommendations for Ethernet, the preferred switching technology for enterprise workloads. Combined with NVIDIA Spectrum-X Ethernet, the H200 NVL platform delivers the highest performance for DL training and inference, data science, scientific simulation, and other modern workloads.
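As one illustration (not an official Spectrum-X configuration), NCCL's standard environment variables can steer East-West traffic onto specific RoCE-capable Ethernet adapters; the device and interface names below are placeholders for your actual hardware:

```python
# A minimal sketch, not an official Spectrum-X configuration: NCCL's
# standard environment variables can pin East-West traffic to specific
# RoCE-capable Ethernet adapters. "mlx5_0", "mlx5_1", and "eth2" are
# placeholder device/interface names for your actual hardware.
import os

os.environ.setdefault("NCCL_IB_HCA", "mlx5_0,mlx5_1")  # RDMA devices NCCL may use
os.environ.setdefault("NCCL_SOCKET_IFNAME", "eth2")    # interface for NCCL bootstrap traffic
os.environ.setdefault("NCCL_DEBUG", "INFO")            # log which transports NCCL selects

# ...then import and run the training code, e.g. the all-reduce sketch above.
```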
Network Topology Diagram
The network topology diagram illustrates a configuration built from scalable units (SUs), each containing four partner servers. This building-block approach enables rapid deployment of systems at multiple sizes, networked via the out-of-band (OOB) management network and the Consolidated Network.
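Because each SU contributes a fixed number of GPUs (four servers of eight GPUs each in the 2-8-5 configuration, or 32 GPUs per SU), cluster sizing reduces to simple arithmetic, as in this small helper:

```python
# A small sizing helper based on the SU building block described above:
# 4 servers per SU x 8 GPUs per server (2-8-5 configuration) = 32 GPUs per SU.
import math

SERVERS_PER_SU = 4
GPUS_PER_SERVER = 8

def sus_needed(target_gpus: int) -> int:
    """Number of scalable units required to reach a target GPU count."""
    return math.ceil(target_gpus / (SERVERS_PER_SU * GPUS_PER_SERVER))

print(sus_needed(128))  # 4 SUs provide 128 GPUs
```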
Table 2: Networking Recommendations

| Traffic Type | Networking Recommendation |
|---|---|
| East-West traffic | Use NVIDIA networking for high-bandwidth, low-latency communication between servers. |
| North-South traffic | Use NVIDIA BlueField DPUs for secure, efficient handling of requests from outside the network. |
| Switching | Use Ethernet for enterprise workloads, combined with NVIDIA Spectrum-X Ethernet for optimal performance. |
Conclusion
Deploying NVIDIA H200 NVL at scale with the new enterprise reference architecture is crucial for achieving optimal performance in AI and HPC workloads. By following the recommended server configurations and networking guidelines, enterprises can ensure seamless data flow and efficient processing of AI models. This architecture is designed to meet the growing demands of AI processing and provide a robust foundation for enterprise AI infrastructure.