This content was made possible by our sponsor. It was not written by, and does not reflect the views of, MSP TechMedia or IBM Systems Magazine.

John Baker

Compuware (formerly MVS Solutions Inc.) - ThruPut Manager

John has been analyzing and tuning MVS systems for over 20 years. He never trusts a computer he can lift.


Today, we seem to be reaching the upper limits of CPU power in terms of raw clock speed. The latest IBM z Systems* machine (z13*) has a lower clock speed than its predecessor, the EC12*—yet the z13 achieves a 12 percent average improvement in total capacity according to the Large System Performance Reference (LSPR). How is this possible?
The primary capacity bottleneck today is not in the capability of the CPU to execute instructions; it’s in the capability of the supporting infrastructure to keep the CPU supplied with a steady stream of data and instructions to process. This critical task falls to the CPU caches.
All modern CPU designs use multilevel caches to achieve this important goal. In the z13, the small but fast L1 and L2 caches are dedicated to each CPU core. Further out are the larger shared L3 and L4 caches, and finally main memory. In IBM z Systems architecture, these shared cache areas are referred to as the nest.
The z13 CPUs “spin” at a rate of 5 GHz, or 5 billion cycles per second. In traditional capacity planning terms, this could be thought of as 5,000 MIPS. Your actual achieved MIPS rate, however, depends largely on how many cycles are wasted waiting for data or instructions to be fetched from the processor caches. A fetch from the local L1 or L2 caches costs only a few cycles; the deeper and more often you must reach into the nest, the more cycles are lost. The measure of how heavily a workload relies on the shared nest caches is known as Relative Nest Intensity (RNI).
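The effect of nest trips on delivered capacity can be sketched with a simple cycles-per-instruction model. This is an illustrative back-of-the-envelope calculation, not IBM’s LSPR methodology; the function name, parameters, and penalty values are all hypothetical.

```python
# Illustrative sketch (not IBM's LSPR methodology): delivered throughput
# depends on how many cycles each instruction spends waiting on the caches.

CLOCK_HZ = 5.0e9  # z13 clock rate: 5 GHz, i.e. 5 billion cycles per second


def effective_mips(base_cpi, miss_penalty_cycles, misses_per_instruction):
    """Estimate delivered MIPS from a simple cycles-per-instruction model.

    base_cpi: cycles per instruction when everything resolves in L1/L2
    miss_penalty_cycles: average extra cycles for each trip into the nest
    misses_per_instruction: how often an instruction must go into the nest
    (all values here are hypothetical, chosen only to show the shape)
    """
    cpi = base_cpi + miss_penalty_cycles * misses_per_instruction
    instructions_per_second = CLOCK_HZ / cpi
    return instructions_per_second / 1e6  # millions of instructions/second


# A cache-friendly workload: nearly everything resolves in L1/L2.
print(effective_mips(base_cpi=1.0, miss_penalty_cycles=50,
                     misses_per_instruction=0.001))

# A nest-heavy workload: frequent deep trips cost dearly, even at 5 GHz.
print(effective_mips(base_cpi=1.0, miss_penalty_cycles=200,
                     misses_per_instruction=0.02))
```

Note how the second workload delivers a fraction of the first’s throughput on the identical processor: the clock never changed, only the cache behavior did.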
How do you optimize access to processor caches? Many factors are beyond practical control, but several actions can be taken:
  • Study your SMF 113 records to understand your processor cache efficiency
  • Limit processor sharing among LPARs by setting PR/SM weights carefully and balancing logical to physical CPUs; with HiperDispatch, avoid “Vertical Low” polarized CPs
  • Reuse data in caches by minimizing the number of diverse applications and limiting concurrency within LPARs (automate batch initiators and CICS*/IMS* regions)
  • Automate input queues to control CPU utilization (to avoid processor overload)
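To see whether such actions are paying off, the cache-sourcing counters from SMF 113 can be rolled up into a single nest-intensity score. The sketch below shows the general shape of such a weighted calculation; the sourcing percentages and the relative-cost weights are purely illustrative placeholders, not IBM’s published RNI coefficients, which vary by machine generation and are defined in the LSPR documentation.

```python
# Hedged sketch of an RNI-style score: weight each level of the nest by a
# relative cost and sum. Weights and percentages below are ILLUSTRATIVE
# placeholders only, not IBM's published LSPR formula for any machine.

# Hypothetical sourcing percentages: the share of L1 misses resolved at
# each level of the nest (real values come from SMF 113 counter data).
sourcing = {"L3": 60.0, "L4_local": 25.0, "L4_remote": 10.0, "memory": 5.0}

# Illustrative relative-cost weights: deeper levels cost more cycles.
weights = {"L3": 1.0, "L4_local": 3.0, "L4_remote": 6.0, "memory": 10.0}

# Weighted sum, scaled down by 100 since sourcing is in percent.
nest_score = sum(sourcing[lvl] * weights[lvl] for lvl in sourcing) / 100.0
print(f"Illustrative nest-intensity score: {nest_score:.2f}")
```

Tracking a score like this over time, before and after tuning changes such as the LPAR weight and concurrency adjustments above, shows whether a workload is reaching less deeply into the nest.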
It’s not about the engine; it’s about the fuel. If you’re not keeping the engine well fed, you’re just spinning your wheels.