Unlocking the Power of Multi-GPU Support with NVTabular

Summary

NVTabular, a feature engineering and preprocessing library, revolutionizes the development of large-scale deep learning recommenders by offering multi-GPU support and new data loaders. This article delves into the core features of NVTabular, highlighting its ability to accelerate recommender workflows and provide significant speedups compared to traditional CPU-based processing.

Introduction

In the realm of deep learning recommenders, processing large datasets efficiently is crucial. NVIDIA’s NVTabular addresses this challenge by providing a high-level abstraction that accelerates computation on GPUs using the RAPIDS cuDF library. With its support for multi-GPU and multi-node scaling via Dask-CUDA and dask.distributed, NVTabular enables distributed parallelism, making it an indispensable tool for data scientists and machine learning engineers.

Core Features of NVTabular

Multi-GPU Support

NVTabular supports multi-GPU scaling with Dask-CUDA and dask.distributed. This allows users to deploy a cluster and connect to it to run the application, enabling distributed parallelism. The dask_cuda.LocalCUDACluster API is particularly useful for single machines with multiple GPUs, providing a convenient option for multi-GPU scaling.
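As a setup sketch, a single-machine multi-GPU workflow might look like the following. This assumes a host with multiple GPUs and NVTabular plus Dask-CUDA installed; the exact `Workflow` signature (in particular whether it accepts a `client` argument or picks up the active client globally) varies across NVTabular versions.

```python
# Sketch: multi-GPU scaling on one machine with dask_cuda.LocalCUDACluster.
# Assumes two GPUs are available; argument values are illustrative.
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

import nvtabular as nvt
from nvtabular import ops

# One Dask worker per GPU; cap device memory to leave headroom for spilling.
cluster = LocalCUDACluster(n_workers=2, device_memory_limit="24GB")
client = Client(cluster)

# With the client active, the workflow executes across the GPU workers.
features = ["user_id", "item_id"] >> ops.Categorify()
workflow = nvt.Workflow(features, client=client)

dataset = nvt.Dataset("train/*.parquet")
workflow.fit_transform(dataset).to_parquet("processed/")
```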

Multi-Node Support

Similar to multi-GPU support, NVTabular also supports multi-node scaling. This involves starting a Dask scheduler and a set of workers across the nodes, then running the NVTabular application with the Workflow connected to that scheduler, as described in the multi-GPU support section.
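A hedged sketch of the manual deployment steps, where the hostnames and port are illustrative placeholders:

```shell
# Sketch: manual multi-node Dask deployment for NVTabular.
# "scheduler-host" and port 8786 are placeholder values.

# On the scheduler node, start the scheduler:
dask-scheduler --port 8786

# On each worker node, start a Dask-CUDA worker process
# (it spawns one worker per visible GPU):
dask-cuda-worker scheduler-host:8786

# The NVTabular application then connects with:
#   from dask.distributed import Client
#   client = Client("scheduler-host:8786")
```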

Multi-Hot Encoding and Pre-Existing Embeddings

NVTabular supports the processing of datasets with multi-hot categorical columns and the passing of continuous vector features like pre-trained embeddings. This includes basic preprocessing and feature engineering, as well as full support in the dataloaders for training models with both TensorFlow and PyTorch.
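A minimal sketch of this, assuming a hypothetical dataset where "genres" is a list-valued (multi-hot) categorical column and "item_embedding" holds pre-trained embedding vectors; column names are illustrative:

```python
# Sketch: multi-hot categoricals and pre-existing embedding vectors.
import nvtabular as nvt
from nvtabular import ops

# Categorify encodes list-valued columns as multi-hot categorical features.
cats = ["user_id", "item_id", "genres"] >> ops.Categorify()

# Continuous vector columns (e.g. pre-trained embeddings) can be passed
# through so the dataloaders feed them to the model unchanged.
vectors = ["item_embedding"]

workflow = nvt.Workflow(cats + vectors)
```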

Shuffling Datasets

NVTabular allows for shuffling during dataset creation, creating a uniformly shuffled dataset that enables the dataloader to load large contiguous chunks of data already randomized across the entire dataset. This mechanism is critical for dealing with datasets that exceed CPU memory and require individual epoch shuffling during training.
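The idea behind a uniform shuffle can be illustrated in plain Python. This is a sketch of the concept only, not NVTabular's implementation: rows are scattered at random across output partitions and each partition is shuffled, so any contiguous chunk read later contains rows drawn from across the entire dataset.

```python
import random

def uniform_shuffle(records, n_partitions, seed=0):
    """Scatter records across partitions at random, then shuffle each
    partition, so reading any contiguous partition yields rows drawn
    from across the whole dataset."""
    rng = random.Random(seed)
    partitions = [[] for _ in range(n_partitions)]
    # Stage 1: assign every record to a random output partition.
    for rec in records:
        partitions[rng.randrange(n_partitions)].append(rec)
    # Stage 2: shuffle within each partition.
    for part in partitions:
        rng.shuffle(part)
    return partitions
```

Because each partition already mixes rows from the full dataset, a dataloader can read large contiguous chunks sequentially while still seeing randomized data.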

Cloud Integration

NVTabular offers cloud integration with Amazon Web Services (AWS) and Google Cloud Platform (GCP), enabling users to build, train, and deploy models in the cloud using datasets stored in cloud object storage such as Amazon S3 or Google Cloud Storage.
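For instance, a dataset can be read directly from a cloud path; the bucket name below is a hypothetical placeholder, and remote paths are resolved through fsspec-compatible filesystems:

```python
# Sketch: reading a dataset directly from cloud object storage.
import nvtabular as nvt

dataset = nvt.Dataset("s3://my-bucket/train/*.parquet", engine="parquet")
```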

Performance Benefits

NVTabular’s multi-GPU support provides significant speedups. For example, scaling from one to eight NVIDIA A100 GPUs reduces processing time from 10 minutes to 1.9 minutes, a 5.3x speedup, while a comparable CPU cluster is roughly 95x slower than the eight-GPU configuration. This performance advantage is crucial for large-scale deep learning recommenders.

Practical Use Cases

Switching Between Multi-GPU, Single GPU, and CPU

NVTabular allows for easy switching between multi-GPU, single GPU, and CPU with minimal changes to parameters. This flexibility is invaluable for developing locally on the CPU and then deploying the NVTabular workflow in the cloud on a multi-GPU cluster.
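A sketch of the switch, assuming a recent NVTabular release in which nvt.Dataset accepts a cpu flag (the exact spelling may differ across versions); the paths are illustrative:

```python
# Sketch: the same workflow code can target CPU or GPU.
import nvtabular as nvt
from nvtabular import ops

features = ["user_id", "item_id"] >> ops.Categorify()
workflow = nvt.Workflow(features)

# Local development on CPU: data is held in host (pandas) memory.
cpu_ds = nvt.Dataset("train/*.parquet", cpu=True)

# Same code on a GPU machine: drop the flag to use device memory.
gpu_ds = nvt.Dataset("train/*.parquet")

workflow.fit_transform(cpu_ds).to_parquet("processed/")
```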

Deploying a Cluster

Deploying a Dask cluster can be done in various ways. On a single machine with multiple GPUs, the dask_cuda.LocalCUDACluster API is typically the most convenient option. For multi-node scaling, start a scheduler and workers manually, then run the NVTabular application with the Workflow connected to that scheduler, as described in the multi-GPU support section.

Table: Performance Comparison

Hardware          Processing Time                 Speedup
1x A100 GPU       10 minutes                      -
8x A100 GPUs      1.9 minutes                     5.3x
CPU Cluster       95x slower than 8x A100 GPUs    -

Table: Key Features of NVTabular

Feature                    Description
Multi-GPU Support          Enables distributed parallelism with Dask-CUDA and dask.distributed.
Multi-Node Support         Supports multi-node scaling with Dask-CUDA and dask.distributed.
Multi-Hot Encoding         Supports processing of datasets with multi-hot categorical columns.
Pre-Existing Embeddings    Supports passing of continuous vector features such as pre-trained embeddings.
Shuffling Datasets         Allows shuffling during dataset creation.
Cloud Integration          Offers cloud integration with AWS and GCP.

Conclusion

NVTabular’s multi-GPU support and new data loaders significantly accelerate recommender workflows, providing crucial speedups compared to traditional CPU-based processing. With its flexibility in switching between multi-GPU, single GPU, and CPU, and its support for multi-node scaling and cloud integration, NVTabular is an indispensable tool for data scientists and machine learning engineers developing large-scale deep learning recommenders.