Unlocking Faster Image Generation with NVIDIA TensorRT
Summary: NVIDIA TensorRT is changing how we generate images with AI. By leveraging TensorRT, developers can significantly accelerate Stable Diffusion models, enabling real-time image generation and saving precious time in workflows. This article explores how TensorRT boosts the efficiency and speed of Stable Diffusion, making it well suited to real-time applications and resource-intensive tasks.
The Power of TensorRT
TensorRT is a high-performance deep learning inference optimizer that excels at parallelized work, crucial for running generative AI models. It provides layer fusion, precision calibration, kernel auto-tuning, and other capabilities that significantly boost the efficiency and speed of deep learning models. This makes it indispensable for real-time applications and resource-intensive tasks like Stable Diffusion.
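To build intuition for one of these optimizations, here is a minimal, illustrative sketch of the idea behind layer fusion. It does not use the TensorRT API; it simply shows, with NumPy and hypothetical layer sizes, how two back-to-back linear layers can be collapsed into a single operation at inference time, which is the kind of transformation an inference optimizer performs automatically:

```python
import numpy as np

# Illustrative only: fuse two consecutive linear layers into one.
# All shapes and values are made up for the example.
rng = np.random.default_rng(0)
x = rng.standard_normal(64)          # input activations
W1 = rng.standard_normal((128, 64))  # first layer's weights
W2 = rng.standard_normal((32, 128))  # second layer's weights

# Unfused: two separate matrix multiplies (two "kernel launches")
y_unfused = W2 @ (W1 @ x)

# Fused: precompute the combined weight once, then a single multiply
W_fused = W2 @ W1
y_fused = W_fused @ x

# The fused path produces the same result with half the per-inference work
assert np.allclose(y_unfused, y_fused)
```

Real frameworks fuse far richer patterns (convolution + bias + activation, for example), but the principle is the same: fewer, larger operations keep the GPU busy and cut launch overhead.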
How TensorRT Accelerates Stable Diffusion
Stable Diffusion is an open-source generative AI model that lets users create images from simple text descriptions. The most popular distribution is the Automatic1111 Stable Diffusion Web UI. By integrating TensorRT into this UI, developers can double the performance of the model, enabling faster image generation.
- TensorRT Extension for Stable Diffusion Web UI: The TensorRT extension for Stable Diffusion Web UI boosts performance by up to two times, significantly streamlining Stable Diffusion workflows. This extension also supports ControlNets, tools that give users more control to refine generative outputs by adding other images as guidance.
- Real-World Performance: Internal tests using the UL Procyon AI Image Generation benchmark have shown that TensorRT delivers speedups of 50% on a GeForce RTX 4080 SUPER GPU compared with the fastest non-TensorRT implementation.
New Stable Diffusion Models Accelerated with TensorRT
At CES, NVIDIA shared that SDXL Turbo, LCM-LoRA, and Stable Video Diffusion are all being accelerated by NVIDIA TensorRT. These enhancements let GeForce RTX GPU owners generate images in real time and save minutes when generating videos, vastly improving workflows.
SDXL Turbo
SDXL Turbo achieves state-of-the-art performance with a new distillation technology, enabling single-step image generation. NVIDIA hardware, accelerated by Tensor Cores and TensorRT, can produce up to four images per second, giving you access to real-time SDXL image generation for the first time ever.
LCM-LoRA
Low-Rank Adaptation (LoRA) is a training technique for fine-tuning Stable Diffusion models. Combined with the latent consistency model (LCM), a LoRA checkpoint lets you drastically reduce the number of sampling steps needed to produce a Stable Diffusion image. This dramatically improves speed at the cost of some image quality. LCM-LoRA runs approximately nine times faster because it uses only four sampling steps (compared with the traditional 50) and is further accelerated by TensorRT optimizations.
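The arithmetic behind that figure is easy to check with a simple latency model. The per-step time and fixed per-image overhead below are illustrative numbers chosen for the example, not measurements; the point is that cutting 50 steps to 4 gives a large speedup even after accounting for fixed costs such as text encoding and VAE decode:

```python
# Hypothetical latency model (illustrative numbers, not benchmarks)
step_ms = 100.0      # assumed denoising time per sampling step
overhead_ms = 175.0  # assumed fixed cost per image (encode/decode, etc.)

baseline_ms = overhead_ms + 50 * step_ms  # traditional 50-step sampling
lcm_ms = overhead_ms + 4 * step_ms        # LCM-LoRA's 4-step sampling

speedup = baseline_ms / lcm_ms
print(f"Estimated speedup: {speedup:.1f}x")  # 9.0x with these numbers
```

With zero overhead the ideal ratio would be 50/4 = 12.5x; fixed costs pull the end-to-end figure down toward the ~9x the article cites.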
Stable Video Diffusion
Stable Video Diffusion is Stability AI's first foundation model for generative video, built on the image model Stable Diffusion. It runs up to 40 times faster with TensorRT, saving minutes per generation.
Getting Started with Stable Diffusion
To download the Stable Diffusion Web UI TensorRT extension, visit the NVIDIA/Stable-Diffusion-WebUI-TensorRT GitHub repo. The newly released update to this extension includes TensorRT acceleration for SDXL, SDXL Turbo, and LCM-LoRA.
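Assuming a standard Automatic1111 setup, one common way to install an extension is to clone its repo into the Web UI's extensions folder (the local path below is an assumption about your install location; the Web UI's built-in extension installer is an alternative):

```shell
# From the root of your stable-diffusion-webui checkout (path assumed)
cd stable-diffusion-webui/extensions
git clone https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT.git
# Restart the Web UI so the extension is picked up
```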
Table: Key Features of TensorRT Acceleration
Model | Performance with TensorRT | Key Benefit
---|---|---
SDXL Turbo | Up to 4 images per second | Real-time image generation
LCM-LoRA | ~9x faster | Fewer sampling steps (4 vs. 50)
Stable Video Diffusion | Up to 40x faster | Faster video generation
Conclusion
NVIDIA TensorRT is a game-changer for Stable Diffusion workflows, offering significant performance boosts that enable real-time image generation and save precious time. Whether you're working with SDXL Turbo, LCM-LoRA, or Stable Video Diffusion, TensorRT is the key to unlocking faster, more efficient AI image generation.