How NVIDIA Accelerated Libraries Supercharged Microsoft Bing Visual Search
Summary
Microsoft Bing Visual Search, a tool that lets users search the web using photographs as queries, has been significantly optimized with the help of NVIDIA accelerated libraries. The collaboration delivered a 5.13x speedup and substantial cost savings, increasing the system's throughput while reducing energy usage. This article details the optimization process and its benefits.
The Challenge of Large-Scale Visual Search
Microsoft Bing Visual Search operates on billions of images across the web, making performance critical. At the core of this capability is Microsoft's TuringMM visual embedding model, which maps images and text into a shared high-dimensional space. The original implementation used ONNX Runtime with the CUDA Execution Provider for GPU acceleration, but was bottlenecked by CPU-side image decoding and preprocessing.
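To make the shared embedding space concrete, here is a minimal sketch of how retrieval in such a space works: embeddings that are close (by cosine similarity) are treated as semantically related. The 4-dimensional vectors and captions below are purely illustrative; real TuringMM embeddings are high-dimensional and produced by the model itself.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical low-dimensional embeddings for illustration only.
image_embedding = [0.8, 0.1, 0.5, 0.2]
text_embeddings = {
    "red sneaker": [0.7, 0.2, 0.6, 0.1],
    "blue teapot": [-0.3, 0.9, -0.2, 0.4],
}

# The text whose embedding lies closest to the image embedding "matches" it.
best = max(text_embeddings,
           key=lambda k: cosine_similarity(image_embedding, text_embeddings[k]))
print(best)
```

Because images and text live in the same space, the same nearest-neighbor comparison works for image-to-image and text-to-image queries alike.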
The Role of NVIDIA Accelerated Libraries
NVIDIA TensorRT, CV-CUDA, and nvImageCodec were introduced to accelerate image decoding, preprocessing, and model inference. TensorRT, with advanced optimizations such as reduced precision and layer fusion, significantly enhanced deep learning inference on NVIDIA GPUs. CV-CUDA accelerated image preprocessing operations, while nvImageCodec handled image decoding, supporting batch decoding while maintaining compatibility across formats.
Optimizing the Visual Embeddings Model Pipeline
The collaboration between the NVIDIA and Microsoft Bing Visual Search teams identified several optimization opportunities. Chief among them was TensorRT's support for fused attention layers in transformer architectures, which improved the execution of the model's most computationally expensive layers and, with them, end-to-end inference performance.
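For context on what a fused attention layer computes, here is a plain-Python reference of scaled dot-product attention, softmax(QK^T / sqrt(d)) V. A fused implementation such as TensorRT's runs the matmul, softmax, and second matmul in a single kernel rather than as the separate steps shown here; the tiny 2x2 matrices are illustrative and not taken from the Bing pipeline.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, V, K):
    """Scaled dot-product attention, computed as three separate steps.
    A fused kernel performs the same math in one pass over GPU memory."""
    d = len(Q[0])
    # Step 1: similarity scores QK^T, scaled by sqrt(d).
    scores = [[sum(q * k for q, k in zip(qr, kr)) / math.sqrt(d) for kr in K]
              for qr in Q]
    # Step 2: softmax turns scores into attention weights.
    weights = [softmax(r) for r in scores]
    # Step 3: weighted sum of the value vectors.
    return [[sum(w * V[j][c] for j, w in enumerate(wr)) for c in range(len(V[0]))]
            for wr in weights]

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out = attention(Q, V, K)
```

Each output row is a convex combination of the rows of V, so every output value stays within the range of the corresponding V column.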
The Impact of NVIDIA Libraries
- TensorRT: Provided maximum performance and efficiency for deep learning inference, leveraging advanced optimizations.
- nvImageCodec: Enabled GPU-accelerated image decoding, supporting batch decoding and maintaining compatibility across formats.
- CV-CUDA: Accelerated image preprocessing operations, optimized for image batches, including variable-shape batches.
Results
The introduction of NVIDIA accelerated libraries resulted in a 5.13x speedup and a significant TCO reduction. The optimized pipeline offloads the majority of the inference task to the GPU, enabling faster processing and greater power efficiency.
Technical Details
TensorRT Execution Provider
- Key Features:
  - Maximum performance and efficiency for deep learning inference.
  - Advanced optimizations such as reduced precision and layer fusion.
- Benefits:
  - Ideal for deployment scenarios where performance, throughput, and low latency are critical.
nvImageCodec
- Key Features:
  - GPU-accelerated image decoding.
  - Supports batch decoding and maintains compatibility across formats.
- Benefits:
  - Enables decoding multiple images simultaneously, maximizing GPU efficiency.
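The source does not show nvImageCodec's API, so as a library-agnostic sketch, the batching logic that feeds any GPU batch decoder looks like the following: encoded images are grouped into fixed-size batches so each device call amortizes its overhead over many images. The filenames are stand-ins for encoded image bytes.

```python
def make_batches(items, batch_size):
    """Group a stream of encoded images into fixed-size batches so a
    GPU batch decoder can process many images per call."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch

# Hypothetical work items; a real pipeline would carry encoded bytes.
encoded = [f"img_{i}.jpg" for i in range(10)]
batches = list(make_batches(encoded, batch_size=4))
```

Each batch would then be handed to the GPU decoder in one call instead of ten separate per-image calls.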
CV-CUDA
- Key Features:
  - GPU-accelerated implementations of common image processing operations.
  - Optimized for image batches, including variable-shape batches.
- Benefits:
  - Accelerates image preprocessing by exploiting batch parallelism.
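As a point of reference for what CV-CUDA moves onto the GPU, here is a CPU sketch of two typical preprocessing steps, resize and normalization, applied to a variable-shape batch. The nearest-neighbor interpolation, grayscale images, and normalization constants are assumptions for illustration, not details from the Bing pipeline.

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize of a 2-D grayscale image (list of rows)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[(r * in_h) // out_h][(c * in_w) // out_w] for c in range(out_w)]
            for r in range(out_h)]

def normalize(img, mean, std):
    """Map pixel values to (x/255 - mean) / std, a typical model input range."""
    return [[(p / 255.0 - mean) / std for p in row] for row in img]

def preprocess_batch(batch, size, mean=0.5, std=0.5):
    """Resize every image to size x size, then normalize.
    Accepts a variable-shape batch: images may differ in height/width."""
    return [normalize(resize_nearest(img, size, size), mean, std) for img in batch]

# Two tiny grayscale "images" with different shapes (a variable-shape batch).
batch = [
    [[0, 255], [255, 0]],                  # 2x2
    [[128, 128, 128]] * 3,                 # 3x3
]
out = preprocess_batch(batch, size=2)
```

A GPU library performs the same per-pixel math, but launches it once over the whole batch rather than looping image by image on the CPU.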
The Future of Visual Search
The collaboration between NVIDIA and Microsoft Bing Visual Search teams highlights the potential of GPU acceleration in enhancing visual search capabilities. As visual search continues to evolve, the importance of efficient and powerful processing solutions will only grow, making NVIDIA accelerated libraries a crucial component in this field.
Table: Comparison of Original and Optimized Pipelines
| Feature | Original Pipeline | Optimized Pipeline |
|---|---|---|
| GPU Acceleration | ONNX Runtime with CUDA Execution Provider | NVIDIA TensorRT, CV-CUDA, and nvImageCodec |
| Image Decoding | Limited by software decoding | Accelerated with nvImageCodec |
| Image Preprocessing | Limited by OpenCV | Accelerated with CV-CUDA |
| Performance Improvement | - | 5.13x speedup |
| TCO Reduction | - | Significant reduction |
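To put the 5.13x figure in perspective, a back-of-the-envelope calculation: if the throughput gain translates linearly into serving capacity (an assumption; the article itself quantifies only the speedup and calls the TCO reduction "significant"), the same query load needs roughly one fifth of the original GPU fleet.

```python
# Assumption: throughput speedup maps linearly onto fleet size.
speedup = 5.13
fleet_fraction = 1 / speedup        # fraction of the original fleet still needed
reduction = 1 - fleet_fraction      # fraction of GPUs freed up
print(f"{fleet_fraction:.1%} of original fleet, {reduction:.1%} fewer GPUs")
```

Real TCO depends on more than raw throughput (provisioning headroom, energy, amortization), so this is an upper-bound intuition rather than a reported figure.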
The Power of Collaboration
The partnership between NVIDIA and Microsoft Bing Visual Search teams demonstrates the power of collaboration in achieving significant performance improvements. By leveraging NVIDIA accelerated libraries, Microsoft Bing Visual Search was able to enhance its capabilities, providing faster and more efficient visual search experiences for users.
Conclusion
NVIDIA accelerated libraries, including TensorRT, CV-CUDA, and nvImageCodec, significantly optimized Microsoft Bing Visual Search. This optimization not only enhanced the system’s throughput but also reduced energy usage and processing times, demonstrating the critical role of GPU acceleration in large-scale visual search applications.