Summary

Microsoft Bing Visual Search, a powerful tool that lets users find content using photographs as queries, has been significantly optimized with the help of NVIDIA accelerated libraries. The collaboration delivered a 5.13x speedup and substantial cost savings, increasing the system's throughput and reducing its energy usage. This article delves into the details of the optimization process and its benefits.

Microsoft Bing Visual Search operates on billions of images across the web, making performance critical. The core of this capability is Microsoft's TuringMM visual embedding model, which maps images and text into a shared high-dimensional space. The original implementation used ONNXRuntime with the CUDA Execution Provider for GPU acceleration but was bottlenecked by image decoding and preprocessing.
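Because images and text live in the same embedding space, similarity search reduces to a vector comparison. As an illustrative sketch (the actual TuringMM model and its embedding dimensionality are not shown in the article, and the toy vectors below are hypothetical), cosine similarity between two embeddings can be computed as:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional embeddings; real embedding models use hundreds of
# dimensions, but the comparison works identically.
image_emb = np.array([0.1, 0.8, 0.3, 0.4])
text_emb = np.array([0.1, 0.8, 0.3, 0.4])

print(cosine_similarity(image_emb, text_emb))  # identical vectors -> 1.0
```

A query image is embedded once, then compared against precomputed index embeddings, which is why embedding throughput dominates the serving cost.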

The Role of NVIDIA Accelerated Libraries

NVIDIA TensorRT, CV-CUDA, and nvImageCodec were introduced to improve both image decoding and model inference speeds. TensorRT, with its advanced optimizations like reduced precision and layer fusion, significantly enhanced deep learning inference on NVIDIA GPUs. CV-CUDA accelerated image preprocessing operations, while nvImageCodec handled image decoding, supporting batch decoding and maintaining compatibility across various formats.

Optimizing the Visual Embeddings Model Pipeline

The collaboration between the NVIDIA and Microsoft Bing Visual Search teams identified several optimization opportunities. The use of TensorRT was a key factor, leveraging its support for fused attention layers in transformer architectures. This improved the execution of the most computationally expensive layers, directly improving end-to-end inference performance.

The Impact of NVIDIA Libraries

  • TensorRT: Provided maximum performance and efficiency for deep learning inference, leveraging advanced optimizations.
  • nvImageCodec: Enabled GPU-accelerated image decoding, supporting batch decoding and maintaining compatibility across formats.
  • CV-CUDA: Accelerated image preprocessing operations, optimized for image batches and variable shape image batches.

Results

The introduction of NVIDIA accelerated libraries resulted in a 5.13x speedup and a significant reduction in total cost of ownership (TCO). The optimized pipeline offloads the bulk of the inference work to the GPU, enabling faster processing and greater power efficiency.

Technical Details

TensorRT Execution Provider

  • Key Features:
    • Maximum performance and efficiency for deep learning inference.
    • Advanced optimizations like reduced precision and layer fusion.
  • Benefits:
    • Ideal for deployment scenarios where performance, throughput, and low latency are critical.
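ONNX Runtime selects the TensorRT Execution Provider through its providers list. The sketch below shows a hypothetical configuration (the option names such as `trt_fp16_enable` follow ONNX Runtime's TensorRT EP documentation; the article does not disclose the team's exact settings):

```python
# Hypothetical provider configuration for ONNX Runtime's TensorRT EP.
# Passing this list as onnxruntime.InferenceSession(model_path,
# providers=providers) routes supported ops through TensorRT and falls
# back to the CUDA EP for the rest.
providers = [
    ("TensorrtExecutionProvider", {
        "trt_fp16_enable": True,          # reduced precision (FP16)
        "trt_engine_cache_enable": True,  # reuse built engines across runs
    }),
    "CUDAExecutionProvider",  # fallback for ops TensorRT does not support
]

print(providers[0][0])
```

Keeping the CUDA EP in the list is the usual pattern: any subgraph TensorRT cannot handle still runs on the GPU rather than falling back to the CPU.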

nvImageCodec

  • Key Features:
    • GPU-accelerated image decoding.
    • Supports batch decoding and maintains compatibility across formats.
  • Benefits:
    • Enables decoding multiple images simultaneously, maximizing GPU efficiency.
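Batch decoding amortizes per-call overhead by handing the GPU many images at once. A minimal CPU-side sketch of the batching step (the actual nvImageCodec decode call is not shown in the article; here each batch would be passed to the GPU decoder in a single invocation):

```python
from typing import List

def make_batches(paths: List[str], batch_size: int) -> List[List[str]]:
    """Split a list of image paths into fixed-size batches; each batch
    would then be handed to the GPU decoder in one call."""
    return [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]

paths = [f"img_{i}.jpg" for i in range(10)]
batches = make_batches(paths, batch_size=4)
print([len(b) for b in batches])  # [4, 4, 2]
```

Larger batches keep the GPU's decode hardware busy, though the last (ragged) batch is smaller whenever the image count is not a multiple of the batch size.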

CV-CUDA

  • Key Features:
    • GPU-accelerated implementation of common image processing operations.
    • Optimized for image batches and variable shape image batches.
  • Benefits:
    • Accelerates preprocessing of images, exploiting batch parallelization.
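The preprocessing that CV-CUDA accelerates typically includes scaling, normalization, and layout conversion. A NumPy sketch of those steps on the CPU (the mean/std constants below are illustrative placeholders, not the model's actual values):

```python
import numpy as np

def preprocess(image_hwc: np.ndarray,
               mean: float = 0.5, std: float = 0.25) -> np.ndarray:
    """Scale to [0, 1], normalize, and convert HWC -> NCHW for inference.
    CV-CUDA performs the equivalent operations on whole batches, on GPU."""
    x = image_hwc.astype(np.float32) / 255.0  # uint8 -> [0, 1]
    x = (x - mean) / std                      # normalize
    x = np.transpose(x, (2, 0, 1))            # HWC -> CHW
    return x[np.newaxis, ...]                 # add batch dimension

img = np.zeros((224, 224, 3), dtype=np.uint8)  # dummy black image
out = preprocess(img)
print(out.shape)  # (1, 3, 224, 224)
```

Running these steps on the GPU, right after GPU decoding, avoids the host-to-device copies that a CPU-based OpenCV pipeline incurs.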

The collaboration between NVIDIA and Microsoft Bing Visual Search teams highlights the potential of GPU acceleration in enhancing visual search capabilities. As visual search continues to evolve, the importance of efficient and powerful processing solutions will only grow, making NVIDIA accelerated libraries a crucial component in this field.

Table: Comparison of Original and Optimized Pipelines

| Feature | Original Pipeline | Optimized Pipeline |
| --- | --- | --- |
| GPU Acceleration | ONNXRuntime with CUDA Execution Provider | NVIDIA TensorRT, CV-CUDA, and nvImageCodec |
| Image Decoding | Limited by software decoding | Accelerated with nvImageCodec |
| Image Preprocessing | Limited by OpenCV | Accelerated with CV-CUDA |
| Performance Improvement | - | 5.13x speedup |
| TCO Reduction | - | Significant reduction |

The Power of Collaboration

The partnership between the NVIDIA and Microsoft Bing Visual Search teams shows how close engineering collaboration can yield substantial performance gains. By leveraging NVIDIA accelerated libraries, Microsoft Bing Visual Search now delivers faster, more efficient visual search experiences to its users.

Conclusion

NVIDIA accelerated libraries, including TensorRT, CV-CUDA, and nvImageCodec, significantly optimized Microsoft Bing Visual Search. This optimization not only enhanced the system’s throughput but also reduced energy usage and processing times, demonstrating the critical role of GPU acceleration in large-scale visual search applications.