Summary

NVIDIA’s NIM microservices are designed to enhance speech and translation capabilities in applications. These microservices leverage NVIDIA Riva to provide automatic speech recognition (ASR), neural machine translation (NMT), and text-to-speech (TTS) functionalities. This article explores how developers can use these microservices to build customer service bots, interactive voice assistants, and multilingual content platforms with minimal development effort.

Voice Your Apps with NVIDIA NIM Microservices

NVIDIA NIM microservices are part of the NVIDIA AI Enterprise suite and offer advanced speech and translation features. These tools enable developers to self-host GPU-accelerated inferencing for both pretrained and customized AI models across clouds, data centers, and workstations.

Key Features of NIM Microservices

  • Automatic Speech Recognition (ASR): Transcribes spoken words into text.
  • Neural Machine Translation (NMT): Translates text from one language to another.
  • Text-to-Speech (TTS): Converts text into natural-sounding speech.

Interactive Browser Interface

The NVIDIA API catalog provides an interactive browser interface where users can perform basic inference tasks such as transcribing speech, translating text, and generating synthetic voices directly through their browsers. This feature offers a convenient starting point for exploring the capabilities of the speech and translation NIM microservices.

Running Microservices with NVIDIA Riva Python Clients

Developers can clone the nvidia-riva/python-clients GitHub repository and use the provided scripts to run simple inference tasks against the NVIDIA API catalog Riva endpoint. An NVIDIA API key is required to authenticate these requests; the sketch below shows the client-side setup the scripts perform.
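As a minimal sketch of that setup using the riva.client Python package: the gRPC URI shown is the usual API catalog gateway, while the function ID and API key are placeholders you obtain from the model's page in the catalog.

```python
import riva.client

# Hosted Riva endpoint on the NVIDIA API catalog. The URI is the usual
# gRPC gateway; the function ID and API key below are placeholders taken
# from the model's page in the catalog.
URI = "grpc.nvcf.nvidia.com:443"
FUNCTION_ID = "<function-id-from-the-model-page>"
NVIDIA_API_KEY = "nvapi-..."  # your NVIDIA API key

auth = riva.client.Auth(
    use_ssl=True,
    uri=URI,
    metadata_args=[
        ["function-id", FUNCTION_ID],
        ["authorization", f"Bearer {NVIDIA_API_KEY}"],
    ],
)
```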

Examples of Inference Tasks

  • Transcribing Audio Files: Users can transcribe audio files in streaming mode.
  • Translating Text: Text can be translated from English to other languages, such as German.
  • Generating Synthetic Speech: Text can be converted into speech and saved as an audio file. (All three tasks are sketched in the example after this list.)
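Reusing the auth object from the previous snippet, the sketch below runs all three tasks with the riva.client API. The file paths, voice name, and language codes are illustrative, and each hosted model has its own function ID, so in practice you would build a separate Auth object per service; the repository's scripts wrap the same calls.

```python
import wave

import riva.client

# 1. Transcribe an audio file. Offline recognition is shown for brevity;
#    the repository's scripts also demonstrate streaming mode.
asr = riva.client.ASRService(auth)
asr_config = riva.client.RecognitionConfig(
    language_code="en-US",
    max_alternatives=1,
    enable_automatic_punctuation=True,
)
with open("sample.wav", "rb") as fh:  # illustrative input file
    response = asr.offline_recognize(fh.read(), asr_config)
print(response.results[0].alternatives[0].transcript)

# 2. Translate text from English to German.
nmt = riva.client.NeuralMachineTranslationClient(auth)
translation = nmt.translate(
    ["How are you?"], model="", source_language="en", target_language="de"
)
print(translation.translations[0].text)

# 3. Generate synthetic speech and save it as a WAV file.
tts = riva.client.SpeechSynthesisService(auth)
tts_response = tts.synthesize(
    "Hello, world!",
    voice_name="English-US.Female-1",  # illustrative voice name
    language_code="en-US",
    sample_rate_hz=44100,
)
with wave.open("output.wav", "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)  # 16-bit linear PCM, the default encoding
    out.setframerate(44100)
    out.writeframes(tts_response.audio)
```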

Deploying Locally with Docker

For those with advanced NVIDIA data center GPUs, the microservices can be run locally using Docker. Detailed instructions are available for setting up ASR, NMT, and TTS services. An NGC API key is required to pull NIM microservices from NVIDIA’s container registry and run them on local systems. Once a container is running, existing client code only needs to point at the local endpoint, as sketched below.
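The client code barely changes between the hosted and the local deployment. A minimal sketch, assuming the ASR NIM container maps its gRPC port to localhost:50051 (the actual port depends on how the container was launched):

```python
import riva.client

# A locally deployed NIM needs no API catalog metadata: connect directly
# to the container's gRPC port (assumed mapped to localhost:50051 here).
local_auth = riva.client.Auth(uri="localhost:50051", use_ssl=False)
asr = riva.client.ASRService(local_auth)  # same calls as against the hosted endpoint
```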

Integrating with a RAG Pipeline

The microservices can be connected to a basic retrieval-augmented generation (RAG) pipeline. This setup enables users to upload documents into a knowledge base, ask questions verbally, and receive answers in synthesized voices.

Setting Up the RAG Pipeline

  • Environment Setup: Users need to set up the environment and launch the ASR and TTS NIMs.
  • Configuring the RAG Web App: The RAG web app needs to be configured to query large language models by text or voice. (The glue between ASR, the LLM, and TTS reduces to a short loop, sketched after this list.)
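The following sketch only illustrates the shape of that glue and is hypothetical: the RAG web app's URL and JSON schema are assumptions rather than part of NVIDIA's sample code, and the local NIM ports depend on how the containers were launched.

```python
import requests
import riva.client

# Local ASR and TTS NIM endpoints (ports are assumptions).
asr = riva.client.ASRService(riva.client.Auth(uri="localhost:50051"))
tts = riva.client.SpeechSynthesisService(riva.client.Auth(uri="localhost:50052"))

RAG_URL = "http://localhost:8081/query"  # hypothetical RAG web app endpoint

def voice_query(audio_bytes: bytes) -> bytes:
    """One end-to-end turn: spoken question in, synthesized answer out."""
    # 1. Speech -> text via the ASR NIM.
    config = riva.client.RecognitionConfig(language_code="en-US", max_alternatives=1)
    question = asr.offline_recognize(audio_bytes, config).results[0].alternatives[0].transcript
    # 2. Question -> answer via the RAG web app (endpoint and JSON shape assumed).
    answer = requests.post(RAG_URL, json={"question": question}).json()["answer"]
    # 3. Text -> speech via the TTS NIM.
    return tts.synthesize(
        answer, voice_name="English-US.Female-1", language_code="en-US"
    ).audio
```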

Benefits of NIM Microservices

  • Scalability: The microservices can be deployed in various environments, from local workstations to cloud and data center infrastructures.
  • Flexibility: Developers can access the microservices through APIs and integrate them into their applications.
  • High-Performance AI Inference: Optimized for serving AI models at scale with minimal development effort.

Getting Started with NIM Microservices

Developers interested in adding multilingual speech AI to their applications can start by exploring the speech NIM microservices. These tools offer a seamless way to integrate ASR, NMT, and TTS into various platforms, providing scalable, real-time voice services for a global audience.

Steps to Get Started

  1. Explore the NVIDIA API Catalog: Visit the NVIDIA API catalog to learn more about the speech and translation NIM microservices.
  2. Clone the NVIDIA Riva Python Clients Repository: Clone the nvidia-riva/python-clients GitHub repository to access scripts for running simple inference tasks.
  3. Obtain an NVIDIA API Key: Get an NVIDIA API key to access the commands for running the microservices.
  4. Run Inference Tasks: Use the provided scripts to run basic inference tasks such as transcribing speech, translating text, and generating synthetic voices.
  5. Deploy Locally with Docker: For those with advanced NVIDIA data center GPUs, deploy the microservices locally using Docker.

Conclusion

NVIDIA NIM microservices give developers a powerful way to add speech and translation capabilities to their applications. With self-hosted, GPU-accelerated inference and integrated ASR, NMT, and TTS functionality, they provide a scalable, flexible foundation for building customer service bots, interactive voice assistants, and multilingual content platforms. By following the steps outlined in this article, developers can quickly get started with voice-enabling their applications.