Unlocking High-Performance Computing: How to Use NVIDIA Nsight Compute for Roofline Analysis

Summary

High-performance computing (HPC) applications require careful optimization to maximize performance on various hardware platforms. NVIDIA Nsight Compute offers a powerful tool for analyzing and improving HPC applications using roofline analysis. This article explores how to use Nsight Compute to perform roofline analysis, understand its benefits, and apply it to real-world applications.

Understanding Roofline Analysis

Roofline analysis is a visual performance model that helps developers understand how well their application is using available hardware resources. It was invented at Lawrence Berkeley National Laboratory and has been widely adopted in the HPC community. The traditional roofline model relies on two key characteristics:

  • Arithmetic Intensity: The ratio between compute work (FLOPs) and data movement (bytes)
  • FLOP/s: Floating-point operations per second

By plotting these characteristics on a graph, developers can visualize how their application is affected by hardware limitations such as memory bandwidth and theoretical compute limits.
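The relationship the chart encodes fits in a few lines of code. The following is a minimal sketch of the standard roofline formula, not Nsight Compute's implementation; the function names and the device and kernel numbers are illustrative assumptions, not measurements of any particular GPU.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte of data moved (FLOP/byte)."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return flops / bytes_moved


def attainable_flops(intensity: float, peak_flops: float, peak_bandwidth: float) -> float:
    """Roofline ceiling: the lower of the compute roof and the memory roof.

    peak_flops     -- theoretical compute limit in FLOP/s
    peak_bandwidth -- memory bandwidth in bytes/s
    """
    return min(peak_flops, intensity * peak_bandwidth)


# Hypothetical device: 10 TFLOP/s peak compute, 1 TB/s memory bandwidth.
PEAK_FLOPS = 10e12
PEAK_BW = 1e12

# Hypothetical kernel: 2e9 FLOPs while moving 8e9 bytes.
ai = arithmetic_intensity(2e9, 8e9)
ceiling = attainable_flops(ai, PEAK_FLOPS, PEAK_BW)
print(f"AI = {ai} FLOP/byte, ceiling = {ceiling:.3e} FLOP/s")
```

At low arithmetic intensity the ceiling grows with intensity (the sloped memory roof); past the point where the two limits meet, it stays flat at the compute limit.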

Using Nsight Compute for Roofline Analysis

Nsight Compute is a CUDA kernel profiler that provides detailed performance measurements and optimization recommendations. It now includes support for roofline analysis, making it easier to understand and improve application performance.

To enable roofline charts in Nsight Compute, use any one of the following methods:

  1. Select the GPU Speed of Light Roofline Chart section when profiling from the GUI. This section is included in the detailed or full sets.
  2. Use the command-line flag --set detailed or --set full to include the roofline chart section.
  3. Manually select individual sections with the --section flag. The name of the roofline chart section is SpeedOfLight_RooflineChart.
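The two command-line methods can be sketched as a small helper that assembles the `ncu` invocation. This is an illustrative sketch only: `ncu_command` is a hypothetical helper, `./my_app` is a placeholder for your own CUDA executable, and the `-o` option (report file name) comes from the Nsight Compute CLI rather than from this article.

```python
import shlex


def ncu_command(app, section_set=None, sections=(), output=None):
    """Build an `ncu` argument list for profiling `app` (a list of argv strings)."""
    cmd = ["ncu"]
    if section_set is not None:
        cmd += ["--set", section_set]      # e.g. "detailed" or "full"
    for name in sections:
        cmd += ["--section", name]         # select individual sections
    if output is not None:
        cmd += ["-o", output]              # write the results to a report file
    return cmd + list(app)


# ./my_app is a placeholder executable name.
full_set = ncu_command(["./my_app"], section_set="full", output="roofline_full")
only_roofline = ncu_command(["./my_app"], sections=["SpeedOfLight_RooflineChart"])

print(shlex.join(full_set))
print(shlex.join(only_roofline))
```

The resulting lists can be passed to `subprocess.run` on a machine where Nsight Compute is installed, or the printed lines can be pasted into a shell.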

Step-by-Step Roofline Analysis

To demonstrate how to use Nsight Compute for roofline analysis, let’s walk through a simple example:

  1. Run Nsight Compute and select the kernel you want to analyze.
  2. Enable the roofline chart section using one of the methods described above.
  3. Run the analysis and view the roofline chart.

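The roofline chart plots your kernel as a point beneath the roofline. A point under the sloped part of the roof is limited by memory bandwidth (memory-bound), while a point under the flat part is limited by peak compute throughput (compute-bound). The vertical gap between the point and the roof shows how much headroom remains. This information is crucial for guiding optimization efforts.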
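The same reading of the chart can be done numerically by comparing a kernel's arithmetic intensity with the ridge point, the intensity at which the memory and compute roofs meet. This is a minimal sketch under assumed device limits; the function names, kernel labels, and numbers are hypothetical.

```python
def ridge_point(peak_flops: float, peak_bandwidth: float) -> float:
    """Arithmetic intensity (FLOP/byte) at which the memory and compute roofs meet."""
    return peak_flops / peak_bandwidth


def classify(intensity: float, peak_flops: float, peak_bandwidth: float) -> str:
    """Return 'memory-bound' left of the ridge point, 'compute-bound' otherwise."""
    if intensity < ridge_point(peak_flops, peak_bandwidth):
        return "memory-bound"
    return "compute-bound"


# Hypothetical device limits, for illustration only.
PEAK_FLOPS = 10e12   # FLOP/s
PEAK_BW = 1e12       # bytes/s

# Hypothetical kernels with low and high arithmetic intensity.
for name, ai in [("stream_like", 0.25), ("gemm_like", 40.0)]:
    print(name, classify(ai, PEAK_FLOPS, PEAK_BW))
```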

Hierarchical Roofline Analysis

The traditional roofline model only considers traffic to and from the GPU’s DRAM. GPU memory subsystems are more complex than that, and the Hierarchical Roofline model extends the traditional model to include the GPU’s L1 and L2 caches. When roofline support was first added, Nsight Compute did not include a Hierarchical Roofline chart, but its section files provide an extensible interface that allows you to create your own implementation. Later releases may ship hierarchical roofline sections, so check which sections your installed version provides.
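The core idea of the hierarchical model is that a kernel gets one arithmetic intensity per memory level, each computed from the bytes moved at that level. The sketch below illustrates this with made-up traffic numbers; in practice the byte counts would come from profiler metrics, and the function name is hypothetical.

```python
def hierarchical_intensity(flops: float, bytes_by_level: dict) -> dict:
    """Arithmetic intensity (FLOP/byte) computed separately for each memory level."""
    return {level: flops / nbytes for level, nbytes in bytes_by_level.items()}


# Hypothetical traffic for one kernel launch, in bytes per memory level.
flops = 4e9
traffic = {"L1": 16e9, "L2": 8e9, "DRAM": 2e9}

intensities = hierarchical_intensity(flops, traffic)
for level, ai in intensities.items():
    print(f"{level}: {ai:.2f} FLOP/byte")
```

Each level's point is then compared against that level's own bandwidth roof. When the caches absorb much of the traffic, as in this example, the DRAM intensity is higher than the L1 intensity.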

Real-World Applications

Roofline analysis has been successfully applied to various HPC applications, including materials science and deep learning. For example, the National Energy Research Scientific Computing Center (NERSC) has used roofline analysis to optimize HPC codes running on NVIDIA GPUs.

Benefits of Roofline Analysis

Roofline analysis provides several benefits for HPC application development:

  • Identifies performance bottlenecks: By visualizing where your kernel is on the roofline, you can quickly identify whether it is memory-bound or compute-bound.
  • Guides optimization efforts: Knowing where your kernel is on the roofline helps you focus on the most important optimization techniques.
  • Tracks progress: By using roofline analysis to track changes in your application’s performance, you can see the impact of your optimization efforts.
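One way to track progress is to record, for each version of a kernel, how close its achieved throughput comes to the roofline ceiling at its arithmetic intensity. The sketch below assumes hypothetical device limits and hypothetical measurements for two versions of a kernel; none of the numbers are real results.

```python
def roofline_fraction(achieved_flops, intensity, peak_flops, peak_bandwidth):
    """Achieved FLOP/s as a fraction of the roofline ceiling at this intensity."""
    ceiling = min(peak_flops, intensity * peak_bandwidth)
    return achieved_flops / ceiling


# Hypothetical device limits.
PEAK_FLOPS = 10e12   # FLOP/s
PEAK_BW = 1e12       # bytes/s

# (label, arithmetic intensity in FLOP/byte, achieved FLOP/s), all hypothetical.
runs = [
    ("baseline", 0.5, 1.0e11),
    ("tiled", 2.0, 1.0e12),
]

for label, ai, achieved in runs:
    frac = roofline_fraction(achieved, ai, PEAK_FLOPS, PEAK_BW)
    print(f"{label}: {frac:.0%} of roofline")
```

Logging both the intensity and the fraction matters, because an optimization can move the point to the right (higher intensity), upward (closer to the roof), or both.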

Table: Key Concepts in Roofline Analysis

Concept                            Description
---------------------------------  ------------------------------------------------------------
Arithmetic Intensity               Ratio between compute work (FLOPs) and data movement (bytes)
FLOP/s                             Floating-point operations per second
Roofline Chart                     Visual representation of application performance relative to hardware limitations
GPU Speed of Light Roofline Chart  Section in Nsight Compute that provides roofline analysis data
Hierarchical Roofline Model        Extension of the traditional roofline model that includes GPU L1 and L2 caches

Table: Benefits of Roofline Analysis

Benefit                             Description
----------------------------------  ------------------------------------------------------------
Identifies Performance Bottlenecks  Helps developers understand where their application is limited by hardware
Guides Optimization Efforts         Provides insight into the most important optimization techniques
Tracks Progress                     Allows developers to see the impact of their optimization efforts

Conclusion

Nsight Compute’s roofline analysis feature is a powerful tool for understanding and improving HPC application performance. By using roofline analysis, developers can identify performance bottlenecks, guide optimization efforts, and track progress. With its easy integration with other Nsight Compute features, roofline analysis is an essential tool for any HPC developer.