Understanding GPU Power States for Better Performance
Summary
This article explains why managing GPU power states matters for obtaining consistent, comparable performance measurements. It shows how the SetStablePowerState function in DirectX 12 and the nvidia-smi utility can help stabilize GPU clock rates, making performance measurements more reliable.
Introduction
Modern processors, including GPUs, dynamically adjust their core and memory clock rates during application execution. This variability can introduce errors in performance measurements and make comparisons between different runs challenging. To address this issue, developers can use specific tools and functions to set stable power states on GPUs.
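To make the effect of clock variability concrete, the sketch below computes the coefficient of variation (standard deviation divided by mean) across repeated runs of the same workload. The frame times are hypothetical values chosen for illustration, not measurements. The rule of treating changes smaller than twice the CV as noise is an assumed heuristic, not a standard threshold.

```python
import statistics


def coefficient_of_variation(samples):
    """Return stdev / mean for repeated measurements (e.g. frame times in ms)."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate variation")
    return statistics.stdev(samples) / statistics.mean(samples)


def smallest_detectable_change(samples, k=2.0):
    """Assumed heuristic: relative changes below k * CV are treated as noise."""
    return k * coefficient_of_variation(samples)


# Hypothetical frame times (ms) from four runs of one workload (illustrative only).
unlocked = [10.0, 12.0, 9.0, 11.0]
locked = [10.0, 10.1, 9.9, 10.0]

for label, runs in (("unlocked", unlocked), ("locked", locked)):
    print(f"{label}: CV = {coefficient_of_variation(runs):.1%}, "
          f"changes under {smallest_detectable_change(runs):.1%} look like noise")
```

The wider the spread between runs, the larger an optimization must be before a single before/after comparison can be trusted.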
The Role of SetStablePowerState
The SetStablePowerState function in DirectX 12 lets developers read the GPU's predetermined stable power clock rate. This rate varies by board, so it should be determined for each GPU rather than assumed. However, relying solely on SetStablePowerState is not recommended: it does not lock the memory clock, so memory clock changes can still make results from different runs incomparable.
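Because SetStablePowerState leaves the memory clock free to change, it helps to check which clock domains actually stayed fixed during a run. The following Python sketch samples the current graphics and memory clocks of GPU 0 through the nvidia-smi --query-gpu fields clocks.gr and clocks.mem. These fields are not mentioned in this article, so confirm them with nvidia-smi --help-query-gpu on your driver. The function names are hypothetical.

```python
import subprocess
import time

# Current graphics and memory clocks of GPU 0, as bare MHz values.
QUERY = ["nvidia-smi", "--id=0", "--query-gpu=clocks.gr,clocks.mem",
         "--format=csv,noheader,nounits"]


def parse_clock_sample(line):
    """Parse one 'graphics, memory' line (MHz) into a pair of ints."""
    graphics, memory = (int(field.strip()) for field in line.split(","))
    return graphics, memory


def sample_clocks(count=10, interval_s=0.5, run=subprocess.run):
    """Sample GPU 0's clocks; call this while the workload is running."""
    samples = []
    for _ in range(count):
        out = run(QUERY, capture_output=True, text=True, check=True).stdout
        samples.append(parse_clock_sample(out.splitlines()[0]))
        time.sleep(interval_s)
    return samples


def unlocked_domains(samples):
    """Return the clock domains whose value changed across the samples."""
    changed = []
    if len({g for g, _ in samples}) > 1:
        changed.append("graphics")
    if len({m for _, m in samples}) > 1:
        changed.append("memory")
    return changed
```

A result of ["memory"] after enabling only SetStablePowerState would match the limitation described above.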
Using nvidia-smi for Stable Clock Rates
The nvidia-smi utility provides a more comprehensive approach to managing GPU clock rates. It allows developers to set both the core and memory clock rates, ensuring consistent performance measurements. Here are some key commands:
- Query Supported Clock Rates:
nvidia-smi --query-supported-clocks=timestamp,gpu_name,gpu_uuid,memory,graphics --format=csv
- Set Core and Memory Clock Rates (values in MHz):
nvidia-smi --lock-gpu-clocks=<core_clock_rate>
nvidia-smi --lock-memory-clocks=<memory_clock_rate>
- Reset Core and Memory Clock Rates:
nvidia-smi --reset-gpu-clocks
nvidia-smi --reset-memory-clocks
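Choosing values for the lock commands starts with the supported clock list. This Python sketch uses a narrower variant of the query above (only the memory and graphics columns for GPU 0, with csv,noheader,nounits so each line is two bare MHz numbers) and picks the supported core clock closest to a target, such as the stable power clock rate read via SetStablePowerState. The helper names are hypothetical.

```python
import subprocess

# Narrower variant of the article's query: memory and graphics columns only,
# for GPU 0, as bare numbers in MHz (no header, no units).
QUERY = ["nvidia-smi", "--id=0", "--query-supported-clocks=memory,graphics",
         "--format=csv,noheader,nounits"]


def query_supported_clocks(run=subprocess.run):
    """Return the raw CSV text listing supported (memory, graphics) pairs."""
    return run(QUERY, capture_output=True, text=True, check=True).stdout


def parse_supported_clocks(csv_text):
    """Map each supported memory clock (MHz) to its supported graphics clocks."""
    table = {}
    for line in csv_text.strip().splitlines():
        memory, graphics = (int(field.strip()) for field in line.split(","))
        table.setdefault(memory, []).append(graphics)
    return table


def nearest_supported_core(table, memory_mhz, target_core_mhz):
    """Pick the supported graphics clock closest to a target core clock."""
    if memory_mhz not in table:
        raise ValueError(f"{memory_mhz} MHz is not a supported memory clock")
    return min(table[memory_mhz], key=lambda mhz: abs(mhz - target_core_mhz))
```

For example, with hypothetical output containing the pairs 9501/2100 and 9501/2085, nearest_supported_core(parse_supported_clocks("9501, 2100\n9501, 2085\n"), 9501, 2090) selects 2085, which would then be passed to --lock-gpu-clocks while 9501 goes to --lock-memory-clocks.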
Best Practices
To ensure reliable performance measurements, it is crucial to follow these best practices:
- Use nvidia-smi to Set Stable Clock Rates: Before taking measurements, use nvidia-smi to lock the GPU core and memory clocks. This keeps the clock rates consistent throughout the measurement process.
- Run Commands with Appropriate Permissions: On Windows, run the commands in an administrator console. On Linux and other Unix-like systems, prefix the commands with sudo.
- Scripting for Convenience: A simple script that locks the clocks, launches the application, and resets the clocks after the application exits can streamline the process.
- Understand GPU Performance States: Familiarize yourself with GPU performance states (P-States), which range from P0 (highest performance/power state) to P15 (lowest performance/power state). Each P-State corresponds to a specific performance level.
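The scripting and permissions practices above can be combined in a small wrapper. This Python sketch assumes nvidia-smi is on the PATH, prefixes sudo on non-Windows systems (on Windows it expects to be started from an administrator console), and uses try/finally so the clocks are reset even if a lock command or the application fails. run_locked and privileged are hypothetical helper names.

```python
import subprocess
import sys


def privileged(cmd, platform=sys.platform):
    """Prefix sudo outside Windows; on Windows, run from an admin console."""
    return cmd if platform.startswith("win") else ["sudo", *cmd]


def run_locked(app_cmd, core_mhz, memory_mhz, run=subprocess.run,
               platform=sys.platform):
    """Lock clocks, run the application, and always reset clocks afterwards."""
    lock = [
        ["nvidia-smi", f"--lock-gpu-clocks={core_mhz}"],
        ["nvidia-smi", f"--lock-memory-clocks={memory_mhz}"],
    ]
    reset = [
        ["nvidia-smi", "--reset-gpu-clocks"],
        ["nvidia-smi", "--reset-memory-clocks"],
    ]
    try:
        for cmd in lock:
            run(privileged(cmd, platform), check=True)  # stop if a lock fails
        return run(app_cmd).returncode
    finally:
        # Reset even if locking or the application failed; check=False so a
        # failed core-clock reset does not skip the memory-clock reset.
        for cmd in reset:
            run(privileged(cmd, platform), check=False)
```

A call such as run_locked(["./my_benchmark"], 1500, 5001) would wrap one benchmark run; the program name and clock values are placeholders to be replaced with values from --query-supported-clocks.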
Additional Considerations
- Dynamic Power States: Dynamic power states (DPS) can conserve energy without compromising performance by scaling the GPU's power state according to current demand and historical usage. This same scaling is what makes clock rates vary between runs when the clocks are not locked.
- Performance Testing: Comprehensive performance testing, including load testing and soak testing, is essential for understanding how APIs perform under various conditions. Load tests vary the intensity of the workload to expose bottlenecks, while soak tests sustain it over a long duration to reveal resource leaks.
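For the soak-testing point, one simple way to spot a resource leak is to sample GPU memory use over a long run and fit a trend line. This sketch assumes the samples were collected periodically (for example from nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits, which reports MiB) and flags the run when the least-squares slope exceeds a growth threshold. The 1 MiB per hour default is an arbitrary placeholder, not a recommended value.

```python
def least_squares_slope(times_s, values):
    """Ordinary least-squares slope of values against time (units per second)."""
    n = len(times_s)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (time, value) pairs")
    mean_t = sum(times_s) / n
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times_s, values))
    var = sum((t - mean_t) ** 2 for t in times_s)
    if var == 0:
        raise ValueError("sample times must not all be equal")
    return cov / var


def looks_like_leak(times_s, used_mib, max_growth_mib_per_hour=1.0):
    """Flag a soak run whose memory use trends upward faster than the threshold."""
    return least_squares_slope(times_s, used_mib) * 3600 > max_growth_mib_per_hour
```

Fitting a slope over the whole run, rather than comparing only the first and last samples, keeps a single transient allocation from being mistaken for a leak.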
Table: GPU Performance States
P-State | Description |
---|---|
P0/P1 | Maximum 3D performance |
P2/P3 | Balanced 3D performance-power |
P8 | Basic HD video playback |
P10 | DVD playback |
P12 | Minimum idle power consumption |
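To record which P-state the GPU was in during a measurement, the current state can be read with the nvidia-smi --query-gpu field pstate (not covered in this article) and mapped onto the table above. A minimal sketch, assuming GPU index 0 and covering only the states the table lists; the function names are hypothetical.

```python
import subprocess

# Descriptions copied from the table above; other P-states are not listed there.
PSTATE_DESCRIPTIONS = {
    0: "Maximum 3D performance",
    1: "Maximum 3D performance",
    2: "Balanced 3D performance-power",
    3: "Balanced 3D performance-power",
    8: "Basic HD video playback",
    10: "DVD playback",
    12: "Minimum idle power consumption",
}


def describe_pstate(text):
    """Turn a P-state string such as 'P0' into (number, description)."""
    text = text.strip().upper()
    if not (text.startswith("P") and text[1:].isdigit()):
        raise ValueError(f"unexpected P-state value: {text!r}")
    number = int(text[1:])
    return number, PSTATE_DESCRIPTIONS.get(number, "Not listed in the table")


def current_pstate(gpu_index=0, run=subprocess.run):
    """Query the current P-state of one GPU and describe it."""
    out = run(["nvidia-smi", f"--id={gpu_index}", "--query-gpu=pstate",
               "--format=csv,noheader"],
              capture_output=True, text=True, check=True).stdout
    return describe_pstate(out)
```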
Table: Key Commands for nvidia-smi
Command | Description |
---|---|
nvidia-smi --query-supported-clocks | Query supported clock rates |
nvidia-smi --lock-gpu-clocks | Set core clock rate |
nvidia-smi --lock-memory-clocks | Set memory clock rate |
nvidia-smi --reset-gpu-clocks | Reset core clock rate |
nvidia-smi --reset-memory-clocks | Reset memory clock rate |
Table: Best Practices for Performance Measurements
Best Practice | Description |
---|---|
Use nvidia-smi | Set stable clock rates before measurements |
Run with appropriate permissions | Use administrator console or prepend sudo |
Scripting | Lock clocks, launch application, and reset clocks |
Understand P-States | Familiarize yourself with GPU performance states |
Conclusion
Managing GPU power states is critical for obtaining consistent, reliable performance measurements. SetStablePowerState can be used to read a board's stable power clock rate, but because it does not lock the memory clock, developers should use the nvidia-smi utility to lock both the core and memory clocks before measuring. Following best practices such as setting stable clock rates, running commands with appropriate permissions, and understanding GPU performance states further improves the accuracy of performance evaluations.