Simplifying MLOps with Google Kubernetes Engine and NVIDIA A100 Multi-Instance GPUs
Summary
Google Cloud and NVIDIA have collaborated to make Machine Learning Operations (MLOps) simpler, more powerful, and cost-effective by integrating Google Kubernetes Engine (GKE) with NVIDIA A100 Multi-Instance GPUs. This partnership enables the dynamic scaling of end-to-end ML pipelines with the right-sized GPU acceleration, maximizing infrastructure utilization and minimizing operational costs.
The Power of GKE and NVIDIA A100 MIG
Google Kubernetes Engine (GKE) now supports the Multi-Instance GPU (MIG) feature on NVIDIA A100 Tensor Core GPUs. This feature allows each A100 GPU to be partitioned into up to seven independent GPU instances, each with its own high-bandwidth memory, cache, and compute cores. GKE can then provision GPU resources for workloads with greater granularity, share a single GPU for multi-user, multi-model use-cases, and automatically scale up or down based on changing needs of ML pipelines.
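A MIG-partitioned node pool can be created with the `gcloud` CLI. The cluster, pool, and zone names below are illustrative; the `gpu-partition-size` flag selects the MIG profile (valid sizes for a 40 GB A100 include `1g.5gb`, `2g.10gb`, `3g.20gb`, and `7g.40gb`):

```shell
# Create an A100 node pool whose GPUs are partitioned into 1g.5gb MIG instances.
# Names (mig-pool, my-gke-cluster) and the zone are placeholders for illustration.
gcloud container node-pools create mig-pool \
  --cluster my-gke-cluster \
  --zone us-central1-a \
  --machine-type a2-highgpu-1g \
  --accelerator type=nvidia-tesla-a100,count=1,gpu-partition-size=1g.5gb
```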
Key Benefits
- Portability and Scalability: GKE with MIG enables the efficient use of GPU resources, allowing multiple models to be executed simultaneously on independent MIG partitions within a single A100 GPU.
- Productivity: The integration of GKE with NVIDIA A100 MIG simplifies the management of ML pipelines, reducing the complexity of infrastructure management and development challenges.
- Cost-Effectiveness: By dynamically scaling GPU resources, organizations can maximize infrastructure utilization and minimize operational costs.
How It Works
- MIG Feature: The MIG feature on NVIDIA A100 GPUs allows for the partitioning of a single GPU into multiple independent instances, each capable of running separate workloads.
- GKE Integration: GKE can provision and manage these MIG instances, ensuring efficient use of GPU resources and automatic scaling based on workload demands.
- NVIDIA Solution Stack: The combination of GKE’s managed Kubernetes services with NVIDIA’s GPU-optimized solution stack accelerates ML pipelines, addressing both development and infrastructure management challenges.
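Once a MIG-partitioned node pool exists, a workload can target it through the partition-size node label that GKE applies, and request a partition as an ordinary GPU resource. A minimal sketch, assuming a `1g.5gb` partition size (the pod name and container image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mig-partition-example   # hypothetical name
spec:
  nodeSelector:
    cloud.google.com/gke-gpu-partition-size: 1g.5gb   # schedule onto MIG-partitioned nodes
  containers:
  - name: cuda-container
    image: nvidia/cuda:11.0-base   # illustrative image
    command: ["nvidia-smi", "-L"]   # lists the visible GPU instance
    resources:
      limits:
        nvidia.com/gpu: 1   # one MIG partition, exposed as a GPU resource
```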
Example Use Case
A practical example of this technology in action can be seen in the GTC21 Session, “Gain Competitive Advantage using MLOps: Kubeflow and NVIDIA Merlin and Google Cloud.” This session demonstrates how GKE, NVIDIA A100 MIG, and NVIDIA’s GPU-optimized solution stack can be used to build and deploy an end-to-end recommender system.
Technical Requirements
- GKE Version: GKE version 1.18.6-gke.3504 or higher is required to support A100 GPUs.
- GPU Quota: A GPU quota must be set up in the relevant Compute Engine zone.
- NVIDIA GPU Drivers: The appropriate NVIDIA drivers must be installed on all nodes that run GPU workloads.
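On GKE, the drivers are typically installed by applying Google's driver-installer DaemonSet rather than by configuring each node manually. A sketch for Container-Optimized OS nodes (the URL reflects the manifest path published at the time of writing):

```shell
# Deploy the NVIDIA driver installer as a DaemonSet; it runs on every GPU node
# and installs the drivers automatically.
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
```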
MIG Strategies
Kubernetes support for the MIG feature on A100 GPUs comes with three strategies: none, single, and mixed. With the none strategy, MIG is ignored and whole GPUs are exposed under the familiar `nvidia.com/gpu` resource name; with the single strategy, nodes expose MIG partitions of a single size, also as `nvidia.com/gpu`, so existing job scripts keep working unchanged; with the mixed strategy, each MIG profile is exposed as its own resource type (for example, `nvidia.com/mig-1g.5gb`), letting workloads request a specific partition size. Together, these strategies allow flexible management of MIG devices while minimizing changes to well-known Kubernetes interfaces.
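Under the mixed strategy, a pod names the exact MIG profile it needs in its resource limits. A minimal sketch (the pod name, image, and chosen profile are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mig-mixed-example   # hypothetical name
spec:
  containers:
  - name: cuda-container
    image: nvidia/cuda:11.0-base   # illustrative image
    command: ["nvidia-smi", "-L"]
    resources:
      limits:
        nvidia.com/mig-1g.5gb: 1   # requests one 1g.5gb MIG slice (mixed strategy)
```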
Additional Resources
For more information on using GPUs with GKE and the benefits of MIG, refer to the following resources:
- NVIDIA Multi-Instance GPU User Guide: Provides detailed information on MIG management and deep learning use cases.
- Getting the Most Out of the NVIDIA A100 GPU with MIG: Offers insights into maximizing the utilization of A100 GPUs with MIG.
- Google Cloud’s GPU Instances: Provides information on the new A2 instance and how to leverage NVIDIA A100 GPUs on Google Cloud.
Technical Specifications
| Specification | Detail |
| --- | --- |
| GKE Version | 1.18.6-gke.3504 or higher |
| GPU Types | NVIDIA A100, Tesla K80, P4, V100, P100, T4 |
| MIG Instances | Up to 7 independent GPU instances per A100 GPU |
| NVIDIA Drivers | Required for all nodes running GPUs |
| GPU Quota | Must be set up in the relevant Compute Engine zone |
Frequently Asked Questions
What is the MIG feature on NVIDIA A100 GPUs?
- The MIG feature allows a single A100 GPU to be partitioned into up to seven independent GPU instances, each capable of running separate workloads.

How does GKE integrate with NVIDIA A100 MIG?
- GKE can provision and manage MIG instances, ensuring efficient use of GPU resources and automatic scaling based on workload demands.

What are the benefits of using GKE with NVIDIA A100 MIG?
- The integration simplifies MLOps, provides portability and scalability, enhances productivity, and offers cost-effectiveness by maximizing infrastructure utilization and minimizing operational costs.
Conclusion
The integration of Google Kubernetes Engine with NVIDIA A100 Multi-Instance GPUs revolutionizes MLOps by providing a simple, powerful, and cost-effective solution for building, serving, and dynamically scaling end-to-end ML pipelines. By leveraging the MIG feature, organizations can maximize GPU utilization, reduce operational costs, and focus on delivering the best value to their end customers.