RAS Differentiates IBM Power Systems Platform
Robustness as a core driver of the IBM Power Systems platform continues today—and the POWER9 processor further bolsters IBM’s position as a RAS leader.
By Sol Lederman, 10/01/2018
Nearly 50 years ago, IBM coined the term RAS, short for reliability, availability and serviceability, to explain the strategy behind the dependability of the System/370 (bit.ly/2mnqRsd). Robustness remains a core driver of the IBM Power Systems platform today, and the POWER9 processor further bolsters IBM’s position as a RAS leader.
More Power in POWER9
IBM POWER9 processor-based servers can deliver up to 50 percent more performance, 4x the memory and twice the bandwidth of IBM POWER8 servers. IBM began introducing the POWER9 series late last year with the launch of the AC922 (ibm.co/2nyrH5i), a performance-optimized 2-socket compute server engineered for high-performance computing, analytics, artificial intelligence (AI) and other data-intensive workloads. The POWER9 scale-out servers, introduced in February, integrate into a distributed cloud strategy (ibm.co/2vD3JKo). The third POWER9 offering, the E950 and E980 scale-up servers, completes the initial rollout (ibm.co/2nwl0R6).
In June, the U.S. Department of Energy’s Oak Ridge National Laboratory unveiled Summit, the world’s most powerful supercomputer (bit.ly/2l5q6Tn). Summit is an IBM AC922 system comprising 4,608 compute servers, each containing two 22-core IBM POWER9 processors, and more than 10 PB of memory paired with fast, high-bandwidth pathways for efficient data movement.
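As a back-of-the-envelope check on those figures, the short sketch below multiplies out the core count and averages the memory across servers. The variable and function names are illustrative, decimal units (1 PB = 1,000 TB) are an assumption, and because the article says "more than 10 PB," the per-server memory figure is only a lower bound.

```python
# Figures quoted in the article for Summit.
SERVERS = 4_608
PROCESSORS_PER_SERVER = 2
CORES_PER_PROCESSOR = 22
TOTAL_MEMORY_PB = 10  # "more than 10 PB", so this is treated as a lower bound


def total_cores(servers: int, processors_per_server: int, cores_per_processor: int) -> int:
    """Total POWER9 cores across all compute servers."""
    return servers * processors_per_server * cores_per_processor


def memory_per_server_tb(total_pb: float, servers: int) -> float:
    """Average memory per server in TB, assuming 1 PB = 1,000 TB (decimal units)."""
    return total_pb * 1_000 / servers


summit_cores = total_cores(SERVERS, PROCESSORS_PER_SERVER, CORES_PER_PROCESSOR)
summit_tb_per_server = memory_per_server_tb(TOTAL_MEMORY_PB, SERVERS)

print(f"POWER9 cores: {summit_cores:,}")
print(f"Memory per server: at least {summit_tb_per_server:.2f} TB")
```

That works out to 202,752 POWER9 cores and a little over 2 TB of memory per server on average.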
Even for clients with serious but more modest computing needs—such as the 80 percent of Fortune 100 companies that have IBM Power Systems servers in their data centers—the POWER9 servers are worth a closer look.
At the heart of the POWER9 enterprise-class processor are several key features:
- High memory capacity for in-memory databases
- Up to 12 cores per socket
- Reduced latency and improved throughput via PCIe Gen4 and integrated NVMe bootable Flash support
- High-bandwidth (25 Gbps) links for GPU/OpenCAPI acceleration
- Improved SMP topology
- Dual memory controllers
- Up to eight threads per core
- Dynamic frequency, up to 4 GHz
George Gaylord, offering manager, IBM Power Systems Enterprise Servers, explains dynamic frequency. “The POWER9 processor offers a new EnergyScale option called Maximum Performance Mode, which enables each processor module to deliver the highest frequency possible based upon the characteristics of the workload and system utilization at any given moment. A Power E980 can deliver POWER9 processor speeds of up to 4 GHz,” he says.
More RAS in POWER9
Daniel Henderson, senior technical staff member, IBM Systems, wrote an in-depth whitepaper on RAS in POWER9 (ibm.co/2MfXoze). The whitepaper includes a detailed comparison of the RAS features of the various POWER9 systems. IBM RAS philosophy is guided by six principles (see Figure 1).
The POWER9 processor extends RAS in four key areas: hardware RAS, including resilience against power failures; cloud features; advanced virtualization; and security. The latter three are discussed later in this article. POWER9 systems include unique RAS in the processors as well as in memory.
Beyond those two components, RAS elements are designed to address serial failures, load capacity, wear-out, power, cooling, system clocks, the I/O subsystem and planned outages. Henderson’s aforementioned RAS whitepaper discusses these features in depth and also details which RAS features are delivered in which of the POWER9 servers.
Scale Out or Scale Up?
In some cases, business and software architecture considerations, along with concerns about reliability and robustness, drive the decision to scale out or to scale up. In other cases, the decision isn’t as clear-cut (or a mix of scale-out and scale-up servers makes the most sense). Because scaling up avoids the many moving parts involved in deploying numerous scale-out servers, it is an attractive option for many compute-intensive workloads. Moreover, scale-up servers generally offer better RAS because they contain fewer parts that could fail without a redundancy option to keep the system running.
Scaling out introduces overhead in distributing applications across servers, which affects overall performance. Add to that the human and software resources needed to manage multiple servers and the effort required to secure them, and scaling up is worth serious consideration. Even distributed computing applications can benefit from being concentrated in fewer servers, especially if those servers are robust and can be virtualized with fine granularity into many virtual servers.
An IT organization may prefer to scale up but worry that doing so will leave it with excess compute power that strains its budget. Capacity on Demand (CoD) and Power Enterprise Pool configuration options help allay that concern. They offer consumption-based infrastructure by supplying processors and memory dynamically as needed, so clients can respond quickly to changing business and workload requirements while paying only for the capacity they need.
Regardless of which way an organization may choose to scale, one concern IBM clients don’t have is whether their POWER9 systems will interoperate in environments with older Power Systems servers. Gaylord explains, “One of the values of Power Systems to clients is that we make a variety of systems that all use a common base of chip technology and virtualization technology and that lets clients choose from among different systems, and they can be very confident that their applications can run across the whole entire family of portfolio products.”
According to Ian Robinson, IBM Virtualization product manager, cloud capabilities are key to integrating multiple systems. “The POWER9 processor, from the ground up, is designed for multicloud infrastructures. That includes firmware-based virtualization with secure mobility and IBM PowerVC Cloud Manager, which gives you the UI and the front end of a private cloud on POWER. All of this means that every workload deployed on POWER9 servers is fully cloud-enabled,” he says.
OS and virtualization security is central to RAS, but it’s easy to overlook in favor of RAS in the processor, memory and other system components. Securing servers, however, is key to keeping them available.
The firmware-based IBM PowerVM hypervisor delivers guest OS image protection. Chet Mehta, Distinguished Engineer, IBM Systems, explains how this security feature works (ibm.co/2nx9qVO). Central to this feature is that IBM builds all PowerVM firmware and cryptographically hashes and signs it with no third-party involvement. Thus, there is no opportunity for malicious non-IBM firmware to be installed and loaded onto POWER hardware. Hypervisor code, which is in firmware, is also protected. Guest OS images are similarly protected using cryptographic measures. Strong cryptography ensures that AIX and Linux kernels and other OS components are all known and trusted.
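To make the hashing idea concrete, here is a minimal sketch of checking an image against a trusted digest before loading it. It is not IBM's implementation. The image bytes and function names are hypothetical, and it covers only the hash comparison. A real signing scheme would also verify an asymmetric signature over the digest, which Python's standard library does not provide.

```python
import hashlib
import hmac


def sha256_digest(image: bytes) -> str:
    """Return the SHA-256 digest of an image as a hex string."""
    return hashlib.sha256(image).hexdigest()


def is_trusted(image: bytes, trusted_digest: str) -> bool:
    """Accept the image only if its digest matches the trusted value.

    hmac.compare_digest performs a constant-time comparison.
    """
    return hmac.compare_digest(sha256_digest(image), trusted_digest)


# Hypothetical firmware image; in practice the trusted digest would be
# recorded (and signed) by the vendor at build time.
firmware = b"hypothetical firmware image v1"
trusted = sha256_digest(firmware)

# Any modification to the image changes its digest.
tampered = firmware + b" with injected code"

print(is_trusted(firmware, trusted))
print(is_trusted(tampered, trusted))
```

The unmodified image passes the check and the altered one fails it. This is the property that lets a platform refuse to load code it does not recognize.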
Beyond OS image protection, POWER9 servers provide accelerated live VM mobility to securely and seamlessly move workloads between servers. “Every POWER9 processor includes built-in encryption and compression capabilities on the chip, so we use those to firstly compress and then encrypt a VM before moving it from one server to another, without downtime or disruption to users,” Robinson says. “So you’re protecting the data in motion and you’re also sending a much smaller image, which, as a result, moves between servers much quicker.”
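The order Robinson describes, compress first and then encrypt, matters because well-encrypted data looks random and barely compresses. The sketch below illustrates this with a toy XOR stream cipher built from SHA-256, used purely as a stand-in for the on-chip encryption. The cipher is not secure, the key and image contents are hypothetical, and none of this reflects how PowerVM actually performs the transfer.

```python
import hashlib
import zlib


def toy_keystream_xor(data: bytes, key: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHA-256 counter-mode keystream.

    For illustration only; NOT secure. Applying it twice with the same
    key restores the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


def compress_then_encrypt(image: bytes, key: bytes) -> bytes:
    return toy_keystream_xor(zlib.compress(image), key)


def encrypt_then_compress(image: bytes, key: bytes) -> bytes:
    return zlib.compress(toy_keystream_xor(image, key))


def decrypt_then_decompress(payload: bytes, key: bytes) -> bytes:
    return zlib.decompress(toy_keystream_xor(payload, key))


# Hypothetical, highly repetitive stand-in for a VM memory image.
vm_image = b"page of mostly repetitive VM memory " * 2_000
key = b"hypothetical session key"

good = compress_then_encrypt(vm_image, key)
bad = encrypt_then_compress(vm_image, key)

print(f"original:              {len(vm_image)} bytes")
print(f"compress then encrypt: {len(good)} bytes")
print(f"encrypt then compress: {len(bad)} bytes")
print("round trip OK:", decrypt_then_decompress(good, key) == vm_image)
```

With compressible input, the compress-then-encrypt payload is far smaller than the original. Reversing the order leaves the payload roughly the size of the original image.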
The POWER9 chip represents the latest evolution of a powerful processor and design architecture that continuously improves reliability, availability and serviceability while pushing the performance envelope. Not only does the POWER9 processor deliver more power and more RAS, it also delivers more security, along with binary compatibility between different generations of POWER to protect investment in software.
It also offers a more seamless cloud experience, and the peace of mind that comes from running a more scalable, flexible and integrated hardware environment and software stack versus piecing systems together with components from different vendors. And, with each passing generation of POWER, the economics are more appealing.
Sol Lederman is a freelance technology writer based in Santa Fe, New Mexico.