DGX-2 System DU-09224-001 _v09|34
Chapter 10. DIMM Replacement
10.1. DIMM Replacement Overview
This is a high-level overview of the procedure to replace a dual inline memory module (DIMM)
on the DGX-2 System.
1. Use the nvsm show commands to identify the failed DIMM.
2. Get a replacement DIMM from NVIDIA Enterprise Support.
3. Shut down the system.
4. Label all motherboard tray cables and unplug them.
5. Remove the motherboard tray and place it on a solid, flat surface.
6. Remove the motherboard tray lid.
7. Use the reference diagram on the lid of the motherboard tray to identify the failed DIMM.
8. Replace the bad DIMM with the new one.
9. Close the lid on the motherboard tray.
10. Insert the motherboard tray into the system.
11. Plug in all cables using the labels as a reference.
12. Power on the system.
13. Verify that all DIMMs are now healthy with nvsm.
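Steps 1 and 13 both come down to querying nvsm for memory alerts. As a minimal sketch, the helper below counts the alert entries in text captured from the memory alerts query used in this chapter. The function name count_memory_alerts is hypothetical and is not part of nvsm. The sketch assumes that each alert is listed on its own line as alert0, alert1, and so on. This chapter does not say whether resolved alerts remain listed, so treat a nonzero count as a prompt to inspect each alert rather than as proof of a fault.

```shell
#!/bin/sh
# Hypothetical helper (not part of nvsm): count the alert entries in
# nvsm output read from standard input.
# Assumption: each alert appears on its own line as alertN (alert0, alert1, ...).
count_memory_alerts() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the function's status at success.
    grep -Ec '^[[:space:]]*alert[0-9]+[[:space:]]*$' || true
}

# Possible use before and after the replacement (requires a DGX-2 with nvsm):
#   sudo nvsm show /systems/localhost/memory/alerts | count_memory_alerts
```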
10.2. Identifying the Failed DIMM
1. From the console, run the following nvsm command to identify memory alerts.
$ sudo nvsm show /systems/localhost/memory/alerts
Alerts appear under the Targets section. For example:
Targets:
alert0
2. Get specific information about the memory alert.
The following example obtains information for alert0.
$ sudo nvsm show /systems/localhost/memory/alerts/alert0
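When more than one alert is listed, the two steps above can be chained. The following is a sketch under stated assumptions, not an NVIDIA-provided tool. It assumes that the alert names appear indented under a Targets: line, as in the example above, and that any other section of the output starts with a single word followed by a colon. The function names list_alert_targets and show_all_memory_alerts are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read nvsm output on standard input and print the
# alert names found under the "Targets:" section, one per line.
list_alert_targets() {
    awk '
        # Start collecting after the "Targets:" heading.
        /^[[:space:]]*Targets:/ { in_targets = 1; next }
        # Any other "Word:" heading ends the Targets section (assumed layout).
        /^[[:space:]]*[A-Za-z]+:[[:space:]]*$/ { in_targets = 0 }
        # Inside the section, keep only entries named alertN.
        in_targets && $1 ~ /^alert[0-9]+$/ { print $1 }
    '
}

# Hypothetical wrapper: show the details of every listed memory alert.
# Both nvsm invocations are the ones given in the steps above.
show_all_memory_alerts() {
    sudo nvsm show /systems/localhost/memory/alerts |
        list_alert_targets |
        while read -r alert; do
            sudo nvsm show "/systems/localhost/memory/alerts/${alert}"
        done
}
```

Calling show_all_memory_alerts on the system console runs the per-alert query once for each listed alert. The parsing is kept in a separate function so that it can be checked against saved output without access to the system.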