Summary
NVIDIA has released updates to its CUDA-X AI software, a deep learning software stack designed for researchers and software developers to build high-performance GPU-accelerated applications for conversational AI, recommendation systems, and computer vision. This update includes new features and improvements across various libraries and tools, enhancing the performance and usability of AI applications.
NVIDIA CUDA-X AI: Unlocking High-Performance AI Applications
NVIDIA CUDA-X AI is a comprehensive deep learning software stack that empowers researchers and software developers to create high-performance GPU-accelerated applications. The latest updates to CUDA-X AI bring significant enhancements to various libraries and tools, making it easier to build and deploy AI applications.
cuDNN 8.1: Enhanced Deep Neural Network Performance
The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. The latest version, cuDNN 8.1, includes several key improvements:
- Support for BFloat16: BFloat16 is now supported for CNNs on NVIDIA Ampere architecture GPUs, improving performance and efficiency.
- New C++ Front-End API: An easy-to-use C++ front-end API is now available in open source, wrapping the flexible v8 backend C API.
- Operator Fusion: cuDNN 8.1 allows for flexible fusion of operators such as convolutions, point-wise operations, and reductions to speed up CNNs.
- Optimizations for Computer Vision and Speech: New optimizations are included for computer vision, speech, and natural language understanding networks.
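The point of operator fusion is to compute several ops in one pass over the data instead of writing intermediate results to memory between kernels. The idea can be sketched in pure Python for a 1-D convolution; this is only a conceptual illustration of what cuDNN's runtime fusion does in GPU kernels, not its API:

```python
# Toy 1-D illustration of operator fusion: convolution, bias add, and ReLU
# computed in a single pass instead of three separate passes.

def conv_bias_relu_unfused(x, w, b):
    k = len(w)
    conv = [sum(x[i + j] * w[j] for j in range(k))      # pass 1: convolution
            for i in range(len(x) - k + 1)]
    biased = [v + b for v in conv]                      # pass 2: point-wise add
    return [max(v, 0.0) for v in biased]                # pass 3: point-wise ReLU

def conv_bias_relu_fused(x, w, b):
    k = len(w)
    out = []
    for i in range(len(x) - k + 1):                     # one pass, no
        acc = sum(x[i + j] * w[j] for j in range(k))    # intermediate buffers
        out.append(max(acc + b, 0.0))
    return out
```

Both functions produce identical results; the fused version never materializes the intermediate convolution and bias buffers, which is where the memory-bandwidth savings come from on a GPU.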
TensorRT 7.2: High-Performance Deep Learning Inference
NVIDIA TensorRT is a platform for high-performance deep learning inference. The latest version, TensorRT 7.2, includes:
- New Debugging Tools: ONNX GraphSurgeon, Polygraphy, and the PyTorch Quantization toolkit are added for inspecting, modifying, and debugging models.
- Support for Python 3.8: TensorRT 7.2 supports Python 3.8, ensuring compatibility with the latest Python versions.
- Bug Fixes and Documentation Upgrades: Several bug fixes and documentation upgrades are included for improved stability and usability.
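The PyTorch Quantization toolkit mentioned above targets INT8 inference. The arithmetic behind symmetric per-tensor INT8 quantization, the scheme TensorRT's INT8 mode is built on, can be sketched in plain Python; this is a conceptual illustration, not the toolkit's actual API:

```python
# Conceptual sketch of symmetric per-tensor INT8 quantization. A single
# scale maps the float range [-amax, amax] onto the int8 codes [-127, 127].

def quantize_int8(values):
    """Map floats to int8 codes using one scale derived from max |v|."""
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from int8 codes."""
    return [v * scale for v in q]
```

Within the clipping range, each dequantized value differs from the original by at most half a quantization step, which is why calibration on representative data is used to pick a good scale.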
Triton Inference Server 2.6: Simplified Production Deployment
Triton is an open-source inference serving software designed to maximize performance and simplify production deployment at scale. The latest version, Triton 2.6, includes:
- Alpha Version of Windows Build: An alpha version of the Windows build adds support for gRPC and the TensorRT backend.
- Model Analyzer: An initial release of Model Analyzer, a tool that helps users select the optimal model configuration for maximum performance in Triton.
- Support for Ubuntu 20.04: Triton provides support for the latest version of Ubuntu, including additional security updates.
- Native Support in DeepStream: Triton is natively supported in DeepStream, enabling inference in video analytics workflows at the edge or in the cloud with Kubernetes.
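Serving a model with Triton starts from a declarative model configuration (`config.pbtxt`). A minimal example for a TensorRT plan is shown below; the model name, tensor names, and shapes are purely illustrative:

```
name: "resnet50_trt"        # hypothetical model name
platform: "tensorrt_plan"
max_batch_size: 8
input [
  {
    name: "input_0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output_0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```

Finding good values for settings such as `max_batch_size` and instance counts is exactly the search the new Model Analyzer tool is meant to automate.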
NGC Container Registry: Simplified Software Development and Deployment
NGC is the hub for GPU-optimized AI/ML/HPC application containers, models, and SDKs that simplifies software development and deployment. The latest updates include:
- NGC Catalog in AWS Marketplace: Users can now pull software directly from the AWS portal.
- Containers for Latest NVIDIA AI Software: Containers are available for the latest versions of NVIDIA AI software, including Triton Inference Server, TensorRT, and deep learning frameworks such as PyTorch.
DALI 0.30: Enhanced Data Loading and Augmentation
The NVIDIA Data Loading Library (DALI) is a portable, open-source GPU-accelerated library for decoding and augmenting images and videos to accelerate deep learning applications. The latest version, DALI 0.30, includes:
- New Functional API: An experimental functional-style API that makes pipelines simpler to define.
- DALI Integration with Triton Inference Server: DALI pipelines can now be run within Triton on the server side to accelerate inference pipelines.
- New Jupyter Notebooks: Geometric Transform and Reductions notebooks are added for better learning and experimentation.
- Improved Operators for 3D/Volumetric Data and Video Processing: New and improved operators are included for 3D/volumetric data and video processing.
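A functional API like DALI's follows a define-then-run pattern: operator calls only record nodes in a pipeline graph, and no data is processed until the pipeline executes. The pattern can be sketched with a toy graph in pure Python; this illustrates the pattern only, and the operator names below are made up, not DALI's:

```python
# Toy define-then-run graph: calling the "operators" builds nodes; work
# happens only when run() is called on the pipeline's output node.

class Node:
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

    def run(self):
        # Execute dependencies first, then this node's operator.
        return self.op(*(n.run() for n in self.inputs))

def read_batch():
    # Stands in for a data reader (e.g., reading a batch of images).
    return Node(lambda: list(range(4)))

def scale(node, factor):
    # Stands in for a point-wise augmentation operator.
    return Node(lambda xs: [x * factor for x in xs], node)

pipeline = scale(read_batch(), 10)   # graph definition: nothing runs yet
batch = pipeline.run()               # execution happens here
```

Deferring execution this way is what lets a real framework schedule the whole graph at once, on CPU or GPU, instead of running each operator eagerly.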
nvJPEG2000 0.1: GPU-Accelerated JPEG2000 Image Decoding
nvJPEG2000 is a new library for GPU-accelerated JPEG2000 image decoding. The latest version, nvJPEG2000 0.1, includes:
- Support for Linux and Windows Operating Systems: nvJPEG2000 supports both Linux and Windows operating systems.
- Faster Decoding: Up to 4x faster lossless decoding with the 5/3 wavelet transform and up to 7x faster lossy decoding with the 9/7 wavelet transform.
- Support for Bitstreams with Multiple Tiles: Bitstreams with multiple tiles are now supported.
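The 5/3 transform is reversible in integer arithmetic, which is what makes lossless JPEG2000 decoding possible. A one-level, 1-D version of the standard 5/3 lifting scheme in pure Python (a reference sketch of the transform from the JPEG2000 standard, not nvJPEG2000's GPU implementation):

```python
# Reversible 5/3 integer lifting (JPEG2000), one level in 1-D. Every step
# uses integer arithmetic that can be undone exactly, so the round trip
# is lossless.

def _ext(seq, i):
    """Whole-sample symmetric boundary extension."""
    n = len(seq)
    if i < 0:
        return seq[-i]
    if i >= n:
        return seq[2 * n - 2 - i]
    return seq[i]

def dwt53_forward(x):
    y = list(x)
    for i in range(1, len(x), 2):   # predict: odd samples -> detail coeffs
        y[i] = x[i] - (_ext(x, i - 1) + _ext(x, i + 1)) // 2
    for i in range(0, len(x), 2):   # update: even samples -> approximation
        y[i] = x[i] + (_ext(y, i - 1) + _ext(y, i + 1) + 2) // 4
    return y

def dwt53_inverse(y):
    x = list(y)
    for i in range(0, len(y), 2):   # undo the update step
        x[i] = y[i] - (_ext(y, i - 1) + _ext(y, i + 1) + 2) // 4
    for i in range(1, len(y), 2):   # undo the predict step
        x[i] = y[i] + (_ext(x, i - 1) + _ext(x, i + 1)) // 2
    return x
```

The 9/7 transform used for lossy coding has the same lifting structure but irrational filter coefficients, so it cannot be inverted exactly in finite precision, hence the lossless/lossy split between the two.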
Conclusion
The latest updates to NVIDIA CUDA-X AI bring significant enhancements across cuDNN, TensorRT, Triton Inference Server, the NGC Container Registry, DALI, and nvJPEG2000, making it easier for developers and researchers to build and deploy high-performance AI applications. These advancements underscore NVIDIA's commitment to advancing AI technology and providing robust tools for the AI community.