Scaling AI with NVIDIA H200 NVL: A New Enterprise Reference Architecture
Summary: NVIDIA’s H200 NVL is the latest addition to the Hopper platform, designed to accelerate AI and HPC workloads. This article explores how to deploy H200 NVL at scale using a new enterprise reference architecture, focusing on optimal performance, networking, and system configurations.
Understanding the H200 NVL
The NVIDIA H200 NVL is a versatile platform optimized for enterprise workloads, delivering accelerated performance for a wide range of AI and HPC applications. With its dual-slot PCIe form factor and 600 W TGP, the H200 NVL is designed to meet the growing demands of AI processing.
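As a quick sanity check after installation, the minimal sketch below (assuming the pynvml bindings from the nvidia-ml-py package) queries each GPU's name, power limit, and memory so you can confirm the expected H200 NVL characteristics are visible to the driver:

```python
# A minimal sketch, assuming the pynvml bindings (nvidia-ml-py package):
# query each GPU's name, power limit, and memory to confirm the expected
# H200 NVL characteristics are visible to the driver.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)  # may return bytes on older bindings
        power_w = pynvml.nvmlDeviceGetPowerManagementLimit(handle) / 1000  # NVML reports mW
        mem_gb = pynvml.nvmlDeviceGetMemoryInfo(handle).total / 1024**3
        print(f"GPU {i}: {name}, power limit {power_w:.0f} W, {mem_gb:.0f} GB")
finally:
    pynvml.nvmlShutdown()
```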
Enterprise Reference Architecture
The new enterprise reference architecture for H200 NVL is designed to streamline deployments and ensure optimal performance. This architecture includes detailed server configurations and networking recommendations tailored for AI clusters.
Server Configurations
For optimal performance, the H200 NVL is deployed in a PCIe Optimized 2-8-5 configuration: two CPU sockets, eight GPUs, and five network adapters per server. This configuration keeps the PCIe topology balanced, with GPUs spread evenly across CPU sockets and PCIe root ports. Table 1 summarizes the full system configuration, and a pre-deployment check sketch follows the table.
Table 1: H200 NVL System Configuration

| Parameter | System Configuration |
|---|---|
| GPU configuration | GPUs are balanced across CPU sockets and root ports. |
| NVLink interconnect | H200 NVL supports NVL4 and NVL2 NVLink bridges. Pair bridged GPU cards under the same CPU socket where possible; pairing across CPU sockets is acceptable but not recommended. |
| CPU | Intel Sapphire Rapids, Emerald Rapids, Granite Rapids, or Sierra Forest; AMD Genoa or Turin |
| CPU sockets | Two CPU sockets minimum |
| CPU speed | 2.0 GHz minimum CPU clock |
| CPU cores | Minimum of 7 physical CPU cores per GPU |
| System memory | Minimum of 128 GB of system memory per GPU |
| DPU | One NVIDIA BlueField-3 DPU per server |
| PCI Express | At least one Gen5 x16 link per two GPUs; one Gen5 x16 link per GPU is recommended |
| PCIe topology | Balanced PCIe topology with GPUs spread evenly across CPU sockets and PCIe root ports. NICs and NVMe drives should sit under the same PCIe switch or PCIe root complex as the GPUs. |
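The per-GPU minimums in Table 1 are easy to verify before deployment. The following minimal sketch, assuming a Linux host and the pynvml bindings, checks the CPU-core and system-memory ratios against the Table 1 thresholds:

```python
# A minimal sketch, assuming a Linux host and the pynvml bindings
# (nvidia-ml-py), that checks the per-GPU CPU-core and system-memory
# minimums from Table 1 before deployment.
import os
import pynvml

CORES_PER_GPU_MIN = 7      # physical cores per GPU (Table 1)
MEM_PER_GPU_MIN_GB = 128   # GB of system memory per GPU (Table 1)

pynvml.nvmlInit()
try:
    gpus = pynvml.nvmlDeviceGetCount()
    cores = os.cpu_count() or 0  # logical cores; halve if SMT/Hyper-Threading is on
    # SC_PHYS_PAGES * SC_PAGE_SIZE = total physical memory in bytes (Linux).
    mem_gb = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 1024**3
    assert cores / gpus >= CORES_PER_GPU_MIN, "too few CPU cores per GPU"
    assert mem_gb / gpus >= MEM_PER_GPU_MIN_GB, "too little system memory per GPU"
    print(f"{gpus} GPUs, {cores / gpus:.0f} cores/GPU, {mem_gb / gpus:.0f} GB/GPU: OK")
finally:
    pynvml.nvmlShutdown()
```

For the NVLink and PCIe placement rows, `nvidia-smi topo -m` prints the GPU/NIC/CPU affinity matrix and is the quickest way to confirm that bridged GPU pairs sit under the same CPU socket.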
Networking Considerations
For optimal performance, NVIDIA recommends using NVIDIA networking in conjunction with the H200 NVL platform. This includes using NVIDIA BlueField Data Processing Units (DPUs) to offload all North-South traffic and ensure secure, efficient handling of requests from outside the network.
East-West Traffic
East-West traffic refers to traffic between H200 NVL systems within the cluster, typically for multi-node AI training, HPC collective operations, and similar workloads. This traffic requires high-bandwidth, low-latency networking to ensure seamless data flow within the data center.
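To make the East-West pattern concrete, here is a minimal sketch of a multi-node NCCL all-reduce, the collective at the heart of data-parallel training. It assumes PyTorch with CUDA and a launcher such as torchrun that sets the standard distributed environment variables:

```python
# A minimal sketch of the East-West traffic pattern: a multi-node NCCL
# all-reduce. Assumes PyTorch with CUDA, launched via e.g.
#   torchrun --nnodes=<N> --nproc-per-node=8 allreduce_sketch.py
import os
import torch
import torch.distributed as dist

def main() -> None:
    dist.init_process_group(backend="nccl")  # NCCL carries the GPU-to-GPU traffic
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    # Each rank contributes a tensor; the all-reduce sums them across every
    # GPU in the job, producing East-West traffic between servers.
    x = torch.full((1024, 1024), float(dist.get_rank()), device="cuda")
    dist.all_reduce(x, op=dist.ReduceOp.SUM)

    if dist.get_rank() == 0:
        print(f"all-reduce across {dist.get_world_size()} GPUs complete")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```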
North-South Traffic
North-South traffic involves traffic between H200 NVL systems and external resources, including cloud management and orchestration systems, remote data storage nodes, and other parts of the data center or the Internet. This traffic is critical for storage connectivity, data ingestion, and result delivery.
Switching
For all Enterprise RAs, NVIDIA provides configuration recommendations for Ethernet, the preferred switching technology for enterprise workloads. Combined with NVIDIA Spectrum-X Ethernet, the H200 NVL platform delivers the highest performance for DL training and inference, data science, scientific simulation, and other modern workloads.
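As one illustration (not an official Spectrum-X configuration), NCCL's standard environment variables can steer East-West traffic onto specific RoCE-capable Ethernet adapters; the device and interface names below are placeholders for your actual hardware:

```python
# A minimal sketch, not an official Spectrum-X configuration: NCCL's
# standard environment variables can pin East-West traffic to specific
# RoCE-capable Ethernet adapters. "mlx5_0", "mlx5_1", and "eth2" are
# placeholder device/interface names for your actual hardware.
import os

os.environ.setdefault("NCCL_IB_HCA", "mlx5_0,mlx5_1")  # RDMA devices NCCL may use
os.environ.setdefault("NCCL_SOCKET_IFNAME", "eth2")    # interface for NCCL bootstrap traffic
os.environ.setdefault("NCCL_DEBUG", "INFO")            # log which transports NCCL selects

# ...then import and run the training code, e.g. the all-reduce sketch above.
```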
Network Topology Diagram
The network topology diagram illustrates a configuration built from scalable units (SUs), each containing four partner servers. This building-block approach enables rapid deployment of systems at multiple sizes, networked via the out-of-band (OOB) management network and the Consolidated Network.
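Because each SU contributes a fixed number of GPUs (four servers of eight GPUs each in the 2-8-5 configuration, or 32 GPUs per SU), cluster sizing reduces to simple arithmetic, as in this small helper:

```python
# A small sizing helper based on the SU building block described above:
# 4 servers per SU x 8 GPUs per server (2-8-5 configuration) = 32 GPUs per SU.
import math

SERVERS_PER_SU = 4
GPUS_PER_SERVER = 8

def sus_needed(target_gpus: int) -> int:
    """Number of scalable units required to reach a target GPU count."""
    return math.ceil(target_gpus / (SERVERS_PER_SU * GPUS_PER_SERVER))

print(sus_needed(128))  # 4 SUs provide 128 GPUs
```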
Table 2: Networking Recommendations

| Traffic Type | Networking Recommendation |
|---|---|
| East-West traffic | Use NVIDIA networking for high-bandwidth, low-latency communication between servers. |
| North-South traffic | Use NVIDIA BlueField DPUs for secure, efficient handling of requests from outside the network. |
| Switching | Use Ethernet for enterprise workloads, combined with NVIDIA Spectrum-X Ethernet for optimal performance. |
Conclusion
Deploying NVIDIA H200 NVL at scale with the new enterprise reference architecture is crucial for achieving optimal performance in AI and HPC workloads. By following the recommended server configurations and networking guidelines, enterprises can ensure seamless data flow and efficient processing of AI models. This architecture is designed to meet the growing demands of AI processing and provide a robust foundation for enterprise AI infrastructure.