Chapter 4. Continuous availability and manageability 105

Draft Document for Review September 2, 2008 5:05 pm (4405ch04 Continuous availability and manageability.fm)

4.3.1 Detecting errors

The first and most crucial component of a solid serviceability strategy is the ability to accurately and effectively detect errors when they occur. While not all errors are a guaranteed threat to system availability, those that go undetected can cause problems because the system does not have the opportunity to evaluate and act if necessary. POWER6 processor-based systems employ System z server-inspired error detection mechanisms that extend from processor cores and memory to power supplies and hard drives.

Service Processor

The Service Processor is a separately powered microprocessor, separate from the main instruction-processing complex. The Service Processor enables POWER Hypervisor and Hardware Management Console surveillance, selected remote power control, environmental monitoring, reset and boot features, and remote maintenance and diagnostic activities, including console mirroring. On systems without a Hardware Management Console, the Service Processor can place calls to report surveillance failures with the POWER Hypervisor, critical environmental faults, and critical processing faults even when the main processing unit is inoperable. The Service Processor provides services common to modern computers, such as:

► Environmental monitoring

– The Service Processor monitors the server’s built-in temperature sensors, sending instructions to the system fans to increase rotational speed when the ambient temperature is above the normal operating range.

– Using an architected operating system interface, the Service Processor notifies the operating system of potential environment-related problems (for example, air conditioning and air circulation around the system) so that the system administrator can take appropriate corrective actions before a critical failure threshold is reached.

– The Service Processor can also post a warning and initiate an orderly system shutdown for a variety of other conditions:

• When the operating temperature exceeds the critical level (for example, failure of air conditioning or air circulation around the system).

• When the system fan speed is out of operational specification (for example, due to a fan failure). In this case, the system can increase the speed of the redundant fans to compensate for the failure, or take other actions.

• When the server input voltages are out of operational specification.
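The environmental monitoring behavior described above can be pictured as a small decision function. The following Python sketch is illustrative only and is not Service Processor firmware: the names `Reading`, `Limits`, `Action`, and `evaluate` are hypothetical, every threshold value is a placeholder rather than a POWER6 specification, and mapping a fan fault to "post a warning and speed up the redundant fans" is one assumed choice among the "other actions" the text allows.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    INCREASE_FAN_SPEED = "increase_fan_speed"
    NOTIFY_OS = "notify_os"
    POST_WARNING = "post_warning"
    ORDERLY_SHUTDOWN = "orderly_shutdown"


@dataclass(frozen=True)
class Reading:
    ambient_temp_c: float
    fan_rpm: int
    input_voltage: float


@dataclass(frozen=True)
class Limits:
    # Hypothetical placeholder values, NOT taken from any IBM specification.
    temp_normal_max_c: float = 35.0
    temp_critical_c: float = 45.0
    fan_rpm_min: int = 2000
    voltage_min: float = 200.0
    voltage_max: float = 240.0


def evaluate(reading, limits=Limits()):
    """Return the ordered list of actions for one set of sensor readings."""
    # Critical temperature or out-of-spec input voltage: warn, then shut down.
    voltage_ok = limits.voltage_min <= reading.input_voltage <= limits.voltage_max
    if reading.ambient_temp_c > limits.temp_critical_c or not voltage_ok:
        return [Action.POST_WARNING, Action.ORDERLY_SHUTDOWN]

    # Fan out of specification: warn and compensate with the redundant fans
    # (assumed policy; the text also permits "other actions").
    if reading.fan_rpm < limits.fan_rpm_min:
        return [Action.POST_WARNING, Action.INCREASE_FAN_SPEED]

    # Above the normal range but below critical: cool harder and give the
    # operating system an early notification.
    if reading.ambient_temp_c > limits.temp_normal_max_c:
        return [Action.INCREASE_FAN_SPEED, Action.NOTIFY_OS]

    return []
```

For example, with the placeholder limits, `evaluate(Reading(38.0, 3000, 220.0))` yields the fan speed increase followed by the operating system notification, whereas a reading of 50.0 degrees yields the warning and orderly shutdown.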

► Mutual Surveillance

– The Service Processor monitors the operation of the POWER Hypervisor firmware during the boot process and watches for loss of control during system operation. It also allows the POWER Hypervisor to monitor Service Processor activity. The Service Processor can take appropriate action, including calling for service, when it detects that the POWER Hypervisor firmware has lost control. Likewise, the POWER Hypervisor can request a Service Processor repair action if necessary.
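Mutual surveillance of this kind is commonly built on heartbeats with timeouts. The source does not describe the actual mechanism, so the Python sketch below is an assumption made for illustration: the `Heartbeat` class, the `surveil` function, the timeout values, and the finding strings are all hypothetical. Timestamps are passed in explicitly so that the example stays deterministic.

```python
class Heartbeat:
    """Tracks the last time a monitored party checked in (hypothetical helper)."""

    def __init__(self, name, timeout_s):
        self.name = name
        self.timeout_s = timeout_s
        self.last_seen = None  # no heartbeat received yet

    def beat(self, now):
        """Record a heartbeat at time `now` (seconds)."""
        self.last_seen = now

    def is_alive(self, now):
        """True if a heartbeat arrived within the timeout window."""
        return self.last_seen is not None and (now - self.last_seen) <= self.timeout_s


def surveil(service_processor, hypervisor, now):
    """Each side checks the other's heartbeat and reports what it would do."""
    findings = []
    if not hypervisor.is_alive(now):
        # The Service Processor's view: the hypervisor appears to have lost control.
        findings.append("service processor: call for service")
    if not service_processor.is_alive(now):
        # The hypervisor's view: the Service Processor has stopped responding.
        findings.append("hypervisor: request service processor repair action")
    return findings


# Usage: both parties check in at t=0, then only the Service Processor keeps beating.
sp = Heartbeat("service processor", timeout_s=5.0)
hv = Heartbeat("hypervisor", timeout_s=5.0)
sp.beat(0.0)
hv.beat(0.0)
healthy = surveil(sp, hv, now=3.0)
sp.beat(8.0)
degraded = surveil(sp, hv, now=9.0)
```

In this run `healthy` is empty because both heartbeats are fresh at t=3, and `degraded` contains only the Service Processor's finding because the hypervisor's last heartbeat is 9 seconds old.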

► Availability

– The auto-restart (reboot) option, when enabled, can reboot the system automatically following an unrecoverable firmware error, firmware hang, hardware failure, or environmentally induced (AC power) failure.

► Fault Monitoring

– BIST (built-in self-test) checks processor, L3 cache, memory, and associated hardware required for proper booting of the operating system, when the system is powered on at the initial install or after a hardware configuration change (e.g., an upgrade). If a
