NVIDIA DGX H100 User Guide
4.7.2. Using the NVIDIA Container Runtime for Docker
If you need to use nvidia-docker2, install it and then restart Docker:
sudo apt install nvidia-docker2
sudo systemctl restart docker
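To confirm that Docker picked up the runtime after the restart, you can look for nvidia on the "Runtimes:" line that docker info prints. The shell function below is a minimal sketch. The helper name runtime_registered is hypothetical, and the function assumes the "Runtimes:" line format printed by recent Docker releases, which may differ between versions.

```shell
#!/bin/sh
# Sketch: succeed if the runtime named in $1 appears on the "Runtimes:" line
# of `docker info` output read from standard input.
# runtime_registered is a hypothetical helper, not part of Docker.
runtime_registered() {
  # Keep only the space-separated runtime list, put one name per line,
  # then require an exact whole-line match (so "nvidia" != "nvidia-foo").
  sed -n 's/^[[:space:]]*Runtimes:[[:space:]]*//p' | tr ' ' '\n' | grep -qx "$1"
}

# Usage on the DGX system (requires access to the Docker daemon):
#   docker info | runtime_registered nvidia && echo "nvidia runtime registered"
```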
The DGX OS also includes the NVIDIA Container Runtime for Docker (nvidia-docker2), which lets you
run GPU-accelerated containers in one of the following ways:
Use docker run and specify --runtime=nvidia.
docker run --runtime=nvidia ...
Use nvidia-docker run.
nvidia-docker run ...
The nvidia-docker2 package provides backward compatibility with the previous nvidia-docker package,
so you can still run GPU-accelerated containers with the nvidia-docker command, and the new runtime will be used.
Use docker run with nvidia as the default runtime.
You can set nvidia as the default runtime, for example, by adding the following line to the
/etc/docker/daemon.json configuration file as the first entry.
"default-runtime": "nvidia",
Here is an example of how the added line appears in the JSON file. Do not remove any pre-existing
content when making this change.
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "args": []
        }
    }
}
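After restarting Docker (for example, with sudo systemctl restart docker) so that it reads the updated
file, you can use docker run to run GPU-accelerated containers without specifying a runtime.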
docker run ...
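To check that the default-runtime setting took effect, a minimal sketch like the one below reads docker info output and prints the value of its "Default Runtime:" line. The helper name default_runtime is hypothetical, and the line format is assumed from recent Docker releases.

```shell
#!/bin/sh
# Sketch: print the value of the "Default Runtime:" line from `docker info`
# output read from standard input.
# default_runtime is a hypothetical helper, not part of Docker.
default_runtime() {
  sed -n 's/^[[:space:]]*Default Runtime:[[:space:]]*//p'
}

# Usage after editing /etc/docker/daemon.json and restarting Docker:
#   [ "$(docker info | default_runtime)" = "nvidia" ] && echo "nvidia is the default runtime"
```

If the check prints runc instead of nvidia, Docker is most likely still using the configuration it had before the edit.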
Caution: If you build Docker images while nvidia is set as the default runtime, make sure the
build scripts executed by the Dockerfile specify the GPU architectures that the container will need.
Failure to do so might result in the container being optimized only for the GPU architecture on
which it was built. Instructions for specifying the GPU architecture depend on the application and
are beyond the scope of this document. Consult the build documentation for the specific application.
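As one illustration, CUDA code compiled with nvcc can be built for an explicit list of target architectures instead of only the GPU visible at build time. The sketch below assumes an nvcc-based build. The helper gencode_flags and the file name my_kernel.cu are hypothetical. Compute capability 9.0 (sm_90) corresponds to H100 and 8.0 (sm_80) to A100. Other applications, such as those built with CMake or with framework-specific variables, use different mechanisms.

```shell
#!/bin/sh
# Sketch: build nvcc -gencode options for an explicit list of compute
# capabilities (for example 80 for A100, 90 for H100), so the compiled code
# does not depend on which GPU is visible during `docker build`.
# gencode_flags is a hypothetical helper, not an NVIDIA tool.
gencode_flags() {
  flags=""
  for cc in "$@"; do
    # Request native code (sm_XX) for each listed architecture.
    flags="$flags -gencode arch=compute_${cc},code=sm_${cc}"
  done
  # Drop the single leading space before printing.
  printf '%s\n' "${flags# }"
}

# Example use inside a build script run by the Dockerfile:
#   nvcc $(gencode_flags 80 90) -o my_kernel my_kernel.cu
```

Naming the architectures explicitly keeps the resulting image usable on every GPU generation in the list, whichever GPU the build host happens to have.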
For more information, refer to the NVIDIA DGX OS 6 User Guide.
