IBM Power 570 User Manual

Chapter 4. Continuous availability and manageability
Draft Document for Review September 2, 2008 5:05 pm
L3 Array Protection
In addition to protection through ECC and Special Uncorrectable Error handling, the L3 cache
also handles memory cell errors through a special cache line delete algorithm. During system
run time, a correctable error is reported as a recoverable error to the Service Processor. If an
individual cache line reaches its predictive error threshold, it is dynamically deleted. The
state of each L3 cache line delete is maintained in a deallocation record and persists through
future reboots. Cache lines that the server has varied offline therefore remain offline after a
reboot and do not need to be rediscovered each time, so these faulty lines cannot cause
further system operational problems.
A POWER6 processor-based system can dynamically delete up to 14 L3 cache lines. Again,
deleting a few cache lines is unlikely to adversely affect server performance. If the limit of
14 deleted lines is reached, the L3 is marked for persistent deconfiguration on subsequent
system reboots until it is repaired.
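
The line delete policy described above can be sketched as a small model. This is an illustration only, not IBM firmware: the class and method names are hypothetical, and the threshold of 3 errors is an assumed value, because the text does not state the actual predictive error threshold. Only the limit of 14 deleted lines comes from the text.

```python
from dataclasses import dataclass, field

MAX_DELETED_LINES = 14   # from the text: up to 14 L3 cache lines can be deleted
ERROR_THRESHOLD = 3      # assumed value; the text does not give the real threshold


@dataclass
class L3LineDeleteModel:
    """Toy model of predictive L3 cache line deletion (hypothetical names)."""
    error_counts: dict = field(default_factory=dict)       # run-time counters
    deallocation_record: set = field(default_factory=set)  # survives reboots
    deconfigure_on_reboot: bool = False

    def report_correctable_error(self, line: int) -> None:
        """Count a recoverable error and delete the line at the threshold."""
        if line in self.deallocation_record:
            return  # the line is already offline
        self.error_counts[line] = self.error_counts.get(line, 0) + 1
        if self.error_counts[line] >= ERROR_THRESHOLD:
            self._delete_line(line)

    def _delete_line(self, line: int) -> None:
        self.deallocation_record.add(line)
        del self.error_counts[line]
        if len(self.deallocation_record) >= MAX_DELETED_LINES:
            # Limit reached: mark the L3 for deconfiguration at the next reboot.
            self.deconfigure_on_reboot = True

    def reboot(self) -> None:
        """Run-time counters are lost; the deallocation record is kept."""
        self.error_counts.clear()
```

In this model a deleted line stays in `deallocation_record` after `reboot()`, which mirrors the statement that offline lines do not need to be rediscovered.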
While hardware scrubbing has been a feature in POWER main memory for many years,
POWER6 processor-based systems introduce a hardware-assisted L3 cache memory
scrubbing feature. All L3 cache memory is periodically addressed, and any address with an
ECC error is rewritten with the corrected data. In this way, soft errors are automatically
removed from L3 cache memory, which decreases the chance that single-bit errors
accumulate into multi-bit memory errors.
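
To make the scrubbing idea concrete, the following sketch walks a small simulated memory, detects single-bit errors, and writes the corrected word back. It uses a Hamming(7,4) code purely as a stand-in: the text does not specify the ECC scheme used by the L3 cache, and all function names here are hypothetical.

```python
DATA_POSITIONS = (3, 5, 6, 7)    # 1-based bit positions that hold data
PARITY_POSITIONS = (1, 2, 4)     # 1-based bit positions that hold parity


def syndrome(word: int) -> int:
    """XOR of the 1-based positions of all set bits; 0 for a valid codeword."""
    s = 0
    for pos in range(1, 8):
        if (word >> (pos - 1)) & 1:
            s ^= pos
    return s


def encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit Hamming codeword."""
    word = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (nibble >> i) & 1:
            word |= 1 << (pos - 1)
    s = syndrome(word)
    for p in PARITY_POSITIONS:
        if s & p:
            word |= 1 << (p - 1)   # set parity so the final syndrome is 0
    return word


def decode(word: int) -> int:
    """Extract the 4 data bits from a codeword."""
    nibble = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (word >> (pos - 1)) & 1:
            nibble |= 1 << i
    return nibble


def scrub(memory: list) -> int:
    """Visit every address, rewrite words with a single-bit error, and
    return the number of corrections made."""
    corrected = 0
    for addr, word in enumerate(memory):
        s = syndrome(word)
        if s:
            memory[addr] = word ^ (1 << (s - 1))   # flip the faulty bit back
            corrected += 1
    return corrected
```

This stand-in code corrects one flipped bit per word but cannot handle two, which illustrates why removing soft errors early matters: a scrub pass that runs before a second bit flips keeps the error correctable.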
4.2.4 PCI Error Recovery
IBM estimates that PCI adapters can account for a significant portion of the hardware-based
errors on a large server. While servers that rely on boot-time diagnostics can identify failing
components to be replaced by hot-swap and reconfiguration, run-time errors pose a more
significant problem.
PCI adapters are generally complex designs involving extensive on-board instruction
processing, often on embedded microcontrollers. They tend to use industry-standard grade
components of lower quality than other parts of the server. As a result, they may be more
likely to encounter internal microcode errors, as well as many of the hardware errors
described for the rest of the server.
The traditional means of handling these problems is through adapter-internal error reporting
and recovery techniques in combination with operating system device driver management
and diagnostics. In some cases, an error in the adapter may cause transmission of bad data
on the PCI bus itself, resulting in a hardware-detected parity error and causing a global
machine check interrupt, which eventually requires a system reboot to continue.
In 2001, IBM introduced a methodology that combines system firmware with Extended Error
Handling (EEH) device drivers to allow recovery from intermittent PCI bus errors. This
approach works by recovering and resetting the adapter, thereby initiating system recovery
for a permanent PCI bus error. Rather than failing immediately, the faulty device is frozen and
restarted, preventing a machine check. POWER6 technology extends this capability to PCIe
bus errors and also includes expanded Linux support for EEH.
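
The freeze-and-restart behavior can be pictured as a small state machine. The sketch below is a simplified illustration, not the firmware or device driver interface: the names `SlotState`, `Slot`, and `on_bus_error` are hypothetical, and the limit of three reset attempts is an assumed value that does not appear in the text.

```python
from enum import Enum
from typing import Callable


class SlotState(Enum):
    OPERATIONAL = "operational"
    FROZEN = "frozen"
    FAILED = "failed"


MAX_RESET_ATTEMPTS = 3   # assumed value; not stated in the text


class Slot:
    """Toy model of one PCI slot under EEH-style recovery (hypothetical)."""

    def __init__(self, reset_succeeds: Callable[[int], bool]):
        self.state = SlotState.OPERATIONAL
        # Callable that simulates the adapter: given the attempt number,
        # it returns True if the reset brings the adapter back.
        self._reset_succeeds = reset_succeeds

    def on_bus_error(self) -> SlotState:
        # Freeze the slot so the error is contained to this adapter
        # instead of escalating to a global machine check.
        self.state = SlotState.FROZEN
        for attempt in range(1, MAX_RESET_ATTEMPTS + 1):
            if self._reset_succeeds(attempt):
                self.state = SlotState.OPERATIONAL   # intermittent error
                return self.state
        # Resets did not help: treat the error as permanent for this slot.
        self.state = SlotState.FAILED
        return self.state
```

For example, `Slot(lambda attempt: True).on_bus_error()` models an intermittent error that clears on the first reset, while `Slot(lambda attempt: False).on_bus_error()` models a permanent error in which only the affected slot is marked failed.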


