Understanding GPU Power States for Better Performance

Summary

This article explains why managing GPU power states matters for consistent, reproducible performance measurements. It describes how the SetStablePowerState function in DirectX 12 and the nvidia-smi utility can be used to stabilize GPU clock rates, making performance measurements more reliable.

Introduction

Modern processors, including GPUs, dynamically adjust their core and memory clock rates during application execution. This variability can introduce errors in performance measurements and make comparisons between different runs challenging. To address this issue, developers can use specific tools and functions to set stable power states on GPUs.

The Role of SetStablePowerState

The SetStablePowerState function in DirectX 12 (ID3D12Device::SetStablePowerState) puts the GPU into a predetermined stable power state, which lets developers read the stable clock rate that the board settles at. This rate can vary by board, so it has to be determined for each GPU model. The function is intended for profiling during development and requires Windows developer mode. Relying on SetStablePowerState alone is not recommended, because it does not lock the memory clock, which can affect the comparability of results.

Using nvidia-smi for Stable Clock Rates

The nvidia-smi utility provides a more comprehensive approach to managing GPU clock rates. It allows developers to set both the core and memory clock rates, ensuring consistent performance measurements. Here are some key commands:

  • Query Supported Clock Rates:
    nvidia-smi --query-supported-clocks=timestamp,gpu_name,gpu_uuid,memory,graphics --format=csv
    
  • Set Core and Memory Clock Rates:
    nvidia-smi --lock-gpu-clocks=<core_clock_rate>
    nvidia-smi --lock-memory-clocks=<memory_clock_rate>
    
  • Reset Core and Memory Clock Rates:
    nvidia-smi --reset-gpu-clocks
    nvidia-smi --reset-memory-clocks
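
The output of the query command can be turned into a list of clock pairs to choose from. The following is a minimal sketch, not a definitive implementation: the function names parse_supported_clocks and build_lock_commands are hypothetical, the sample text is made-up illustrative data rather than real nvidia-smi output, and the parser assumes only that the last two columns of each row hold the memory and graphics clocks in a form like "9501 MHz".

```python
import csv
import io
from collections import defaultdict


def parse_supported_clocks(csv_text):
    """Map each memory clock (MHz) to its sorted list of graphics clocks (MHz).

    Assumption: the last two columns of every data row are the memory and
    graphics clocks, formatted like "9501 MHz", as requested by the query.
    """
    clocks = defaultdict(set)
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        memory_mhz = int(row[-2].split()[0])
        graphics_mhz = int(row[-1].split()[0])
        clocks[memory_mhz].add(graphics_mhz)
    return {mem: sorted(gfx) for mem, gfx in clocks.items()}


def build_lock_commands(core_mhz, memory_mhz):
    """Build the two lock commands shown above as argument lists."""
    return [
        ["nvidia-smi", f"--lock-gpu-clocks={core_mhz}"],
        ["nvidia-smi", f"--lock-memory-clocks={memory_mhz}"],
    ]


# Illustrative sample only; real output depends on the GPU and driver.
SAMPLE_OUTPUT = """timestamp, name, uuid, memory [MHz], graphics [MHz]
2024/01/01 00:00:00.000, Example GPU, GPU-0000, 9501 MHz, 2100 MHz
2024/01/01 00:00:00.000, Example GPU, GPU-0000, 9501 MHz, 2085 MHz
2024/01/01 00:00:00.000, Example GPU, GPU-0000, 405 MHz, 420 MHz
"""

supported = parse_supported_clocks(SAMPLE_OUTPUT)
print(supported)
print(build_lock_commands(2085, 9501))
```

Parsing by column position rather than by header name keeps the sketch independent of the exact header text, which is treated here as an assumption.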
    

Best Practices

To ensure reliable performance measurements, it is crucial to follow these best practices:

  1. Use nvidia-smi to Set Stable Clock Rates: Before attempting measurements, use nvidia-smi to set the GPU core and memory clocks. This ensures that the clock rates remain consistent throughout the measurement process.

  2. Run Commands with Appropriate Permissions: On Windows, run the commands in an administrator console. On Linux and similar operating systems, prefix the commands with sudo.

  3. Scripting for Convenience: Writing a simple script to lock the clocks, launch the application, and reset the clocks after exit can streamline the process.

  4. Understand GPU Performance States: Familiarize yourself with GPU performance states (P-States), which range from P0 (highest performance/power state) to P15 (lowest performance/power state). Each P-State maps to a specific performance level.
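
The scripting practice above (lock the clocks, launch the application, reset the clocks after exit) can be sketched as follows. This is one possible shape under stated assumptions, not a definitive script: the function name run_with_locked_clocks, its run parameter, and the ./my_app path in the comment are hypothetical, and the script is assumed to be started with the permissions described in item 2.

```python
import subprocess


def run_with_locked_clocks(app_cmd, core_mhz, memory_mhz, run=subprocess.run):
    """Lock core and memory clocks, run the application, then reset the clocks.

    The reset commands sit in a `finally` block so they are issued even if
    locking or the application itself fails. `run` is injectable so the
    sequence can be exercised without a GPU (hypothetical design choice).
    """
    try:
        run(["nvidia-smi", f"--lock-gpu-clocks={core_mhz}"], check=True)
        run(["nvidia-smi", f"--lock-memory-clocks={memory_mhz}"], check=True)
        # The application's own exit status is returned to the caller.
        return run(app_cmd, check=False)
    finally:
        run(["nvidia-smi", "--reset-gpu-clocks"], check=False)
        run(["nvidia-smi", "--reset-memory-clocks"], check=False)


# Example call (hypothetical application path and clock values):
# run_with_locked_clocks(["./my_app"], core_mhz=1500, memory_mhz=5001)
```

The clock values should come from the supported rates reported by the query command; values outside that list are not expected to work.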

Additional Considerations

  • Dynamic Power States: Dynamic power states (DPS) conserve energy by scaling the GPU’s power state based on current demands and historical data. This is useful in normal operation, but it is the same behavior that makes clock rates vary between measurement runs.
  • Performance Testing: Comprehensive performance testing, including load testing and soak testing, is essential to understand how an application performs under various conditions. Vary both the intensity and the duration of the load to identify potential bottlenecks and resource leaks.

Table: GPU Performance States

P-State   Description
P0/P1     Maximum 3D performance
P2/P3     Balanced 3D performance-power
P8        Basic HD video playback
P10       DVD playback
P12       Minimum idle power consumption
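
To check which state a GPU is in during a run, nvidia-smi can report the current P-State (for example with nvidia-smi --query-gpu=pstate --format=csv,noheader, which prints a value such as P0). The sketch below maps such a value to the descriptions in the table above. The names parse_pstate and describe_pstate are hypothetical, and states that the table does not list are reported as such rather than guessed.

```python
import re

# Descriptions copied from the table above; unlisted states are left out.
PSTATE_DESCRIPTIONS = {
    0: "Maximum 3D performance",
    1: "Maximum 3D performance",
    2: "Balanced 3D performance-power",
    3: "Balanced 3D performance-power",
    8: "Basic HD video playback",
    10: "DVD playback",
    12: "Minimum idle power consumption",
}


def parse_pstate(text):
    """Convert a string such as "P8" into the integer 8 (valid range 0 to 15)."""
    match = re.fullmatch(r"P(\d{1,2})", text.strip())
    if match is None:
        raise ValueError(f"not a P-State: {text!r}")
    number = int(match.group(1))
    if not 0 <= number <= 15:
        raise ValueError(f"P-State out of range: {text!r}")
    return number


def describe_pstate(text):
    """Return the table description for a P-State string."""
    number = parse_pstate(text)
    return PSTATE_DESCRIPTIONS.get(number, "Not listed in the table")


print(describe_pstate("P0"))
print(describe_pstate("P8"))
```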

Table: Key Commands for nvidia-smi

Command                               Description
nvidia-smi --query-supported-clocks   Query supported clock rates
nvidia-smi --lock-gpu-clocks          Set core clock rate
nvidia-smi --lock-memory-clocks       Set memory clock rate
nvidia-smi --reset-gpu-clocks         Reset core clock rate
nvidia-smi --reset-memory-clocks      Reset memory clock rate

Table: Best Practices for Performance Measurements

Best Practice                      Description
Use nvidia-smi                     Set stable clock rates before measurements
Run with appropriate permissions   Use administrator console or prepend sudo
Scripting                          Lock clocks, launch application, and reset clocks
Understand P-States                Familiarize yourself with GPU performance states

Conclusion

Managing GPU power states is critical for obtaining consistent, comparable performance measurements. By using the SetStablePowerState function to find the stable clock rate and the nvidia-smi utility to lock both the core and memory clocks, developers can make their measurements reliable. Following the best practices above (setting stable clock rates, running commands with appropriate permissions, resetting the clocks afterwards, and understanding GPU performance states) further improves the accuracy of performance evaluations.