Summary

NVIDIA has made a significant contribution to the Open Compute Project (OCP) by sharing the design of its GB200 NVL72 system, a rack-scale server platform built for AI and high-performance computing (HPC) workloads. The contribution includes detailed electro-mechanical designs, such as the rack architecture, compute and switch tray mechanicals, and liquid-cooling and thermal environment specifications. The goal is to accelerate the development of open data center platforms that can support NVIDIA’s next-generation GPUs and networking solutions.

NVIDIA’s Commitment to Open Source

NVIDIA has a long history of open-source initiatives, having released over 900 software projects on GitHub and actively participating in various open-source foundations and standards bodies. This commitment extends to the Open Compute Project, where NVIDIA has consistently contributed design specifications across multiple generations of hardware products.

Key Contributions

  • GB200 NVL72 Design: The GB200 NVL72 system features a modular design based on NVIDIA’s MGX architecture, connecting 36 Grace CPUs and 72 Blackwell GPUs in a rack-scale configuration. This setup provides a 72-GPU NVLink domain, allowing the system to act as a massive single GPU.
  • Joint Reference Architecture: NVIDIA and Vertiv have developed a new joint reference architecture for the GB200 NVL72 system. This collaboration aims to reduce deployment time for cloud service providers (CSPs) and data centers adopting the NVIDIA Blackwell platform by up to 50%.

The Impact of NVIDIA’s Contribution

NVIDIA’s contribution of the GB200 NVL72 design to the Open Compute Project has several significant implications:

1. Accelerating AI and HPC Adoption

By sharing detailed design specifications, NVIDIA is enabling OCP members to build their own custom designs based on Blackwell GPUs. This move is expected to accelerate the adoption of high-performance computing platforms in AI and HPC applications.

2. Reducing Deployment Time

The joint reference architecture with Vertiv eliminates the need for data centers to create custom power, cooling, or spacing designs specific to the GB200 NVL72. Instead, they can rely on Vertiv’s advanced solutions for space-saving power management and energy-efficient cooling, reducing deployment time and increasing efficiency.

3. Enhancing Collaboration

NVIDIA’s contribution reinforces the importance of collaboration within the open ecosystem. By sharing critical design elements, NVIDIA is fostering a community-driven approach to developing high-performance computing platforms.

Technical Details

GB200 NVL72 System Overview

  • Rack architecture: modular design based on NVIDIA’s MGX architecture
  • Compute and switch trays: connect 36 Grace CPUs and 72 Blackwell GPUs in a rack-scale configuration
  • Liquid-cooling and thermal environment: detailed specifications for efficient cooling and thermal management
  • NVLink networking: provides a 72-GPU NVLink domain, enabling the system to act as a massive single GPU
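As a back-of-the-envelope check, the rack-level totals above can be reproduced from a plausible tray breakdown. The per-tray counts in this sketch (18 compute trays with 2 Grace CPUs and 4 Blackwell GPUs each, plus 9 NVLink switch trays) are illustrative assumptions, not part of the OCP contribution itself; only the 36-CPU/72-GPU totals and the 72-GPU NVLink domain come from the text above.

```python
# Hypothetical composition of a GB200 NVL72 rack (tray counts assumed
# for illustration; rack-level totals are from the description above).
COMPUTE_TRAYS = 18   # assumed MGX compute trays per rack
CPUS_PER_TRAY = 2    # assumed Grace CPUs per compute tray
GPUS_PER_TRAY = 4    # assumed Blackwell GPUs per compute tray
SWITCH_TRAYS = 9     # assumed NVLink switch trays per rack

total_cpus = COMPUTE_TRAYS * CPUS_PER_TRAY  # expected: 36
total_gpus = COMPUTE_TRAYS * GPUS_PER_TRAY  # expected: 72

print(f"Grace CPUs per rack:     {total_cpus}")
print(f"Blackwell GPUs per rack: {total_gpus}")
print(f"NVLink domain size:      {total_gpus} GPUs, addressable as one")
```

Under these assumed tray counts, the arithmetic lands exactly on the rack-scale figures the contribution describes: 36 Grace CPUs and 72 Blackwell GPUs in a single NVLink domain.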

Joint Reference Architecture

  • Vertiv’s power and cooling solutions: reduce deployment time by up to 50%
  • Space-saving power management: enhances energy efficiency and reduces the power footprint
  • Energy-efficient cooling: increases cooling efficiency and reduces thermal issues

Conclusion

NVIDIA’s contribution of the GB200 NVL72 design to the Open Compute Project marks a significant milestone in the evolution of high-performance computing platforms. By sharing detailed design specifications and collaborating with industry partners like Vertiv, NVIDIA is accelerating the adoption of energy-efficient, high-density compute platforms for AI and HPC applications. The move underscores the importance of open collaboration in driving innovation and efficiency across the data center ecosystem.