Summary

Video quality assessment is crucial for ensuring that video content meets the required standards. Traditional metrics like PSNR and SSIM have limitations, leading to the development of more comprehensive metrics like VMAF. This article explores how NVIDIA GPUs can accelerate VMAF calculations using CUDA, significantly improving performance and efficiency.

Understanding Video Quality Metrics

Video quality metrics are essential tools for evaluating the fidelity of video content. They provide a quantitative measurement to assess the performance of encoders and ensure that video content meets the required standards. Traditional metrics like Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) have been widely used but have limitations.

  • PSNR: Compares the pixel values of a degraded image to those of the reference and expresses the error as a signal-to-noise ratio in decibels.
  • SSIM: Compares luminance, contrast, and structure of the degraded image to the original one, providing a more nuanced assessment.
  • VMAF: Introduced by Netflix, combines human vision modeling with machine learning to produce scores that align more closely with human visual perception than PSNR or SSIM.
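To make the simplest of these metrics concrete, here is a minimal sketch of PSNR in plain Python. It assumes 8-bit samples (peak value 255) and treats an image as a flat list of pixel values; the sample data and the function name psnr are illustrative and not taken from any library.

```python
import math

def psnr(reference, distorted, max_value=255):
    """PSNR in dB between two equally sized sequences of pixel values."""
    if len(reference) != len(distorted):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have no measurable noise
    return 10 * math.log10(max_value ** 2 / mse)

# Illustrative 8-pixel "images" (made-up values, assumed 8-bit).
reference = [52, 55, 61, 59, 79, 61, 76, 61]
distorted = [54, 55, 60, 59, 77, 63, 76, 60]
print(f"PSNR: {psnr(reference, distorted):.2f} dB")
```

Because PSNR depends only on per-pixel error, two distortions with the same mean squared error receive the same score even if one is far more visible to a viewer, which is the limitation the perceptual metrics aim to address.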

VMAF: A Comprehensive Metric

VMAF evaluates video quality using key elementary metrics from a reference and a distorted image:

  • Visual Information Fidelity (VIF): Quantifies the preservation of original content, reflecting perceived information loss.
  • Additive Distortion Measurement (ADM): Assesses structural changes and texture degradation, sensitive to additive distortions like noise.
  • Motion Features: Measure temporal differences between adjacent frames, which matter for appraising motion-rendering quality in dynamic scenes.

These metrics are used as input features for a support vector machine (SVM) regressor, which combines them into the final VMAF score. Because the regressor is trained to reproduce viewer ratings, the score reflects video quality as perceived by viewers.

Accelerating VMAF with CUDA

The CPU implementation of VMAF can distribute the computation of features over multiple threads for each image, so it benefits from a higher number of CPU cores. However, because the features are extracted in parallel, the latency of the VMAF score on the CPU is bounded by the slowest feature extractor.

In contrast, VMAF-CUDA allocates the entire GPU compute resources to one feature at a time and calculates the features sequentially. Each feature is computed faster, and the latency of the VMAF score becomes the sum of the latencies of all feature extractors.
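The difference between the two scheduling models can be sketched in a few lines of Python. The per-feature latencies below are hypothetical placeholders chosen only to illustrate the reasoning; they are not measurements, and the sketch ignores the SVM step and any transfer or launch overhead.

```python
# Hypothetical per-feature latencies in milliseconds (illustrative only).
cpu_feature_ms = {"vif": 40.0, "adm": 25.0, "motion": 10.0}
gpu_feature_ms = {"vif": 1.0, "adm": 0.8, "motion": 0.3}

def parallel_latency(feature_ms):
    """Features run concurrently, so the slowest one sets the frame latency."""
    return max(feature_ms.values())

def sequential_latency(feature_ms):
    """Features run one after another, so their latencies add up."""
    return sum(feature_ms.values())

cpu_ms = parallel_latency(cpu_feature_ms)    # CPU model: max over features
gpu_ms = sequential_latency(gpu_feature_ms)  # VMAF-CUDA model: sum over features

print(f"CPU frame latency (max of features): {cpu_ms:.1f} ms")
print(f"GPU frame latency (sum of features): {gpu_ms:.1f} ms")
```

The sequential model wins whenever the sum of the GPU feature latencies is smaller than the slowest CPU feature, which is plausible when each individual extractor runs much faster on the GPU.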

Advantages of VMAF-CUDA

VMAF-CUDA can be used during encoding and transcoding for quality monitoring. NVIDIA GPUs can run compute workloads on GPU cores independently of NVENC and NVDEC. As a result, both the reference frame and the distorted frame can stay in video memory and be passed directly to VMAF-CUDA.

  • Encoding: VMAF can be computed during encoding because NVENC does not require the GPU compute resources.
  • Transcoding: VMAF-CUDA can use idle GPU resources to calculate a score without interrupting transcoding and without additional memory transfers.

Evaluation Results

The evaluation of VMAF-CUDA showed significant performance improvements:

  • VMAF per-frame latency: Up to 37x lower at 4K.
  • Total throughput: Up to 4.4x higher in the open-source tool FFmpeg.

The measurements were conducted using a 56C/112T Dual Intel Xeon 8480 compute node and a single NVIDIA L4 GPU. The results demonstrated a 30–90x speedup in feature extractor latencies for one image on the NVIDIA L4 over the Intel Xeon 8480 CPUs.

FFmpeg Performance Improvement

FFmpeg enables the encoded video to be read into GPU or CPU RAM directly instead of reading the raw byte streams from disk. The evaluation used the libvmaf_cuda video filter for the GPU implementation and the libvmaf video filter for the CPU implementation.

  • NVIDIA L4: Achieved 178 FPS at 4K and 775 FPS at 1080p.
  • Dual Intel Xeon 8480: Reached 64 FPS at 4K and 176 FPS at 1080p.

This represents a speedup of 2.8x for the 4K sequence (178 / 64) and 4.4x for 1080p (775 / 176) when processing a single video stream.
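The single-stream speedups can be recomputed from the reported frame rates. The short Python check below uses only the FPS figures listed above; the dictionary layout and variable names are illustrative.

```python
# Reported single-stream FFmpeg throughput in frames per second.
fps = {
    "4K": {"L4": 178, "Xeon": 64},
    "1080p": {"L4": 775, "Xeon": 176},
}

# Speedup = GPU throughput divided by CPU throughput at the same resolution.
speedup = {res: v["L4"] / v["Xeon"] for res, v in fps.items()}

for res, ratio in speedup.items():
    print(f"{res}: {ratio:.1f}x")
```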

Cost Efficiency

The cost of calculating VMAF scores for the same workload was evaluated for a 2U server of each type:

Server Type               Max Total Throughput  Time to Calculate VMAF Scores  3-Year TCO  Cost to Calculate VMAF
NVIDIA L4                 1424 FPS              21.0 hrs                       $30K        $24
Dual Intel Platinum 8480  235 FPS               127.6 hrs                      $20K        $97
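The cost column appears consistent with prorating the 3-year TCO over the hours each server spends on the job. That proration rule is an inference from the table, not something the evaluation states, so treat the Python sketch below as one plausible reading under the assumption of continuous operation for three 365-day years.

```python
HOURS_IN_3_YEARS = 3 * 365 * 24  # assumption: continuous operation, 365-day years

def job_cost(tco_dollars, job_hours):
    """Share of the 3-year TCO consumed by a job of the given duration."""
    return tco_dollars * job_hours / HOURS_IN_3_YEARS

# TCO and job duration taken from the table above.
cost_l4 = job_cost(30_000, 21.0)
cost_xeon = job_cost(20_000, 127.6)

print(f"NVIDIA L4:  ${cost_l4:.0f}")
print(f"Dual Xeon:  ${cost_xeon:.0f}")
print(f"Cost ratio: {cost_xeon / cost_l4:.1f}x")
```

Under this reading the GPU server costs more to own but finishes the job so much sooner that the cost attributed to the workload is roughly a quarter of the CPU server's.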

Partner Success Stories

V-Nova explored the benefits of CUDA-accelerated VMAF calculation for several use cases, including offline metric calculation and real-time decision-making within the LCEVC (MPEG-5 Part 2) encoding process. The integration of VMAF-CUDA significantly accelerated offline metric calculations and demonstrated the potential for additional benefits in algorithmic enhancements.

Conclusion

VMAF-CUDA offers a significant performance boost in calculating video quality scores, making it a valuable tool for video content creators and distributors. By leveraging NVIDIA GPUs, VMAF-CUDA can accelerate the computation of VMAF scores, reducing latency and increasing throughput. This not only improves efficiency but also enables real-time quality monitoring during encoding and transcoding processes. As video quality continues to be a critical aspect of content delivery, VMAF-CUDA stands out as a powerful solution for ensuring high-quality video content.