NVIDIA DGX A100 DU-09821-001 _v01|76
Chapter 12. Multi-Instance GPU
Multi-Instance GPU (MIG) is a new capability of the NVIDIA A100 GPU. MIG uses spatial
partitioning to carve the physical resources of an A100 GPU into up to seven independent
GPU instances. These instances run simultaneously, each with its own memory, cache, and
compute streaming multiprocessors. MIG enables the A100 GPU to deliver guaranteed quality
of service at up to 7X higher utilization compared to non-MIG enabled GPUs.
MIG enables the following:
‣ GPU memory isolation among parallel GPU workloads.
‣ Physical allocation of resources used by parallel GPU workloads.
MIG instances are managed through the NVIDIA Management Library (NVML) APIs or through
the NVML command-line utility, nvidia-smi. Enabling MIG requires a GPU reset, so system
services that manage the GPUs should be stopped before you enable MIG.
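Before stopping anything, it can help to see which GPU-managing services are actually running. The following is a minimal sketch, not part of the documented procedure: the helper name active_gpu_services is hypothetical, and the service names nvsm and dcgm are taken from the procedure below and may differ on other software releases.

```shell
#!/bin/sh
# Sketch (assumption): print each named systemd service that is currently active.
# active_gpu_services is a hypothetical helper, not an NVIDIA tool.
active_gpu_services() {
    for svc in "$@"; do
        # "systemctl is-active --quiet" exits 0 only when the unit is active.
        if systemctl is-active --quiet "$svc"; then
            printf '%s\n' "$svc"
        fi
    done
}

# Run the check only where systemctl is available.
if command -v systemctl >/dev/null 2>&1; then
    active_gpu_services nvsm dcgm
fi
```

Any service name printed by the helper is one that would need to be stopped before the GPUs can be reset.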
To enable MIG on all eight GPUs in the system, perform the following steps.
1. Stop the NVSM and DCGM services.
$ sudo systemctl stop nvsm dcgm
2. Enable MIG on all eight GPUs.
$ sudo nvidia-smi -mig 1
If other services are running that prevent you from resetting the GPUs, then reboot the
system and skip the next step.
3. Restart the DCGM and NVSM services.
$ sudo systemctl start dcgm nvsm
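After the steps above, you may want to confirm that MIG mode actually took effect on every GPU. The sketch below is one possible check, under stated assumptions: the helper name mig_not_enabled is hypothetical, and the script assumes that the nvidia-smi query fields mig.mode.current and mig.mode.pending report the string Enabled for a MIG-enabled GPU on your driver version.

```shell
#!/bin/sh
# Sketch (assumption): list the index of every GPU whose current MIG mode
# is not "Enabled". Expects CSV lines "index, current, pending" on stdin.
# mig_not_enabled is a hypothetical helper, not an NVIDIA tool.
mig_not_enabled() {
    awk -F', *' '$2 != "Enabled" { print $1 }'
}

# On a system with the NVIDIA driver installed, feed the helper live data.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,mig.mode.current,mig.mode.pending \
               --format=csv,noheader | mig_not_enabled
fi
```

If the script prints nothing, every GPU reports MIG as enabled. A printed index whose pending mode is Enabled suggests that the GPU still needs a reset or a system reboot for the change to apply.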
To use MIG, see the MIG User Guide, which provides more detailed information about key
MIG concepts and deployment considerations and explains how to create MIG instances
and how to run Docker containers using MIG.