Recognizing the Efficiency Benefits of CPU Threading
This is part of an ongoing series on improving AIX performance by emphasizing CPU threading efficiency. The introductory article provides some useful background, so please read it if you have not already.
In the first article in this series, I offered that “attention must be paid to keeping L2/L3 cache content undiluted by configuring to maintain fewer virtual CPUs of different LPARs on a given CPU core.” For this installment, I'll illustrate this point with a quick true-to-life tactical case history.
Imagine a given POWER7/POWER8 system with four shared pool LPARs (SPLPARs). Each SPLPAR is configured with 2.00 CPU entitlement (or 2.0eCPU), eight virtual CPUs (or 8vCPUs) and 48GB RAM, and runs in the default SMT-4 mode. Each SPLPAR also supports a database-on-AIX running the same batch-type workload, with each accessing data in different LVM volume groups. Batch-type workloads are generally not thread response-time sensitive. That is, their threads do not demand immediate time on-CPU as often. In contrast, online transaction processing (OLTP) workloads are thread response-time sensitive. OLTP workloads are generally composed of threads demanding immediate time on-CPU.
After booting these four LPARs, the PowerVP utility definitively shows they are all assigned to share the same eight active CPU cores of the same POWER7/POWER8 physical CPU “chip” or “wafer” (aka, an SRAD) on a POWER7/POWER8 system with four SRADs. PowerVP also shows that the memory of all four SPLPARs resides on DIMMs immediately adjacent to this SRAD. This is a common and realistic system configuration that is found throughout the IBM POWERverse:
LPAR 0: 2.0eCPU/8vCPU/48gbRAM [SMT-4 mode] on REF1:0 SRAD:0 by lssrad -av
LPAR 1: 2.0eCPU/8vCPU/48gbRAM [SMT-4 mode] on REF1:0 SRAD:0 by lssrad -av
LPAR 2: 2.0eCPU/8vCPU/48gbRAM [SMT-4 mode] on REF1:0 SRAD:0 by lssrad -av
LPAR 3: 2.0eCPU/8vCPU/48gbRAM [SMT-4 mode] on REF1:0 SRAD:0 by lssrad -av
I'll make note of one particular case, but keep in mind that, as an IBM performance specialist, I've dealt with literally hundreds of customers in this same situation. After 90 days of inexplicable performance inconsistencies and workload throughput concerns, the customer sent an email requesting my attention. I was soon on a video chat viewing the customer's PuTTY login sessions. I provided them with some seemingly nonsensical recommendations, which they implemented with much reluctance.
The performance issues abated, and in the 90 days since implementation, the customer hasn't had any new issues. So what did I tell them? Basically, I suggested making a few AIX performance-tuning changes. Then I also told them to do this:
LPAR 0: 2.0eCPU/3vCPU/48gbRAM [SMT-4 mode] on REF1:0 SRAD:0 by lssrad -av
LPAR 1: 2.0eCPU/3vCPU/48gbRAM [SMT-4 mode] on REF1:0 SRAD:0 by lssrad -av
LPAR 2: 2.0eCPU/3vCPU/48gbRAM [SMT-4 mode] on REF1:0 SRAD:0 by lssrad -av
LPAR 3: 2.0eCPU/3vCPU/48gbRAM [SMT-4 mode] on REF1:0 SRAD:0 by lssrad -av
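To make the difference between the two configurations concrete, here is a minimal Python sketch of the arithmetic. It assumes the figures given above (four LPARs, 2.0eCPU each, eight shared cores in one SRAD, SMT-4); the `PoolConfig` class and its property names are hypothetical and exist only for illustration. They are not part of any AIX or PowerVP tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolConfig:
    """Hypothetical summary of identical SPLPARs sharing one set of cores."""
    lpars: int            # number of SPLPARs sharing the cores
    ecpu_per_lpar: float  # CPU entitlement per LPAR
    vcpus_per_lpar: int   # virtual CPUs per LPAR
    cores: int            # physical cores shared by all LPARs
    smt: int              # SMT mode (hardware threads per vCPU)

    @property
    def total_vcpus(self) -> int:
        return self.lpars * self.vcpus_per_lpar

    @property
    def total_entitlement(self) -> float:
        return self.lpars * self.ecpu_per_lpar

    @property
    def vcpus_per_core(self) -> float:
        # How many vCPUs (from any LPAR) compete for each physical core.
        return self.total_vcpus / self.cores

    @property
    def logical_cpus_per_lpar(self) -> int:
        # Logical CPUs that AIX sees inside one LPAR.
        return self.vcpus_per_lpar * self.smt


# Figures taken from the case described in the article.
before = PoolConfig(lpars=4, ecpu_per_lpar=2.0, vcpus_per_lpar=8, cores=8, smt=4)
after = PoolConfig(lpars=4, ecpu_per_lpar=2.0, vcpus_per_lpar=3, cores=8, smt=4)

for label, cfg in (("8vCPU", before), ("3vCPU", after)):
    print(
        f"{label}: {cfg.total_vcpus} vCPUs over {cfg.cores} cores "
        f"({cfg.vcpus_per_core:.2f} per core), "
        f"{cfg.total_entitlement:.1f} eCPU total, "
        f"{cfg.logical_cpus_per_lpar} logical CPUs per LPAR"
    )
```

In both cases the total entitlement equals the eight shared cores; what changes is how many vCPUs from different LPARs are spread across those cores, dropping from 32 to 12.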
The old YMMV (your mileage may vary) disclaimer applies here. But it's true: reducing vCPUs (from 8vCPU to 3vCPU per LPAR in this case) can improve CPU efficiency and provide more consistent workload processing. Why? For these reasons:
Paraphrasing from the introductory piece, attention was paid to keeping L2/L3 cache content undiluted by configuring to maintain fewer vCPUs of different LPARs on a given set of CPU cores.
3vCPUs were more often running in SMT-2 or SMT-4 (a general dispatch of 2:1:1 and 4:1:1) versus 8vCPUs more often running in ST/SMT-1 (a general dispatch of 1:1:1). Configuring 3vCPUs per LPAR changed the thread dispatching: the customer's LPARs were no longer under-threaded, as they had been with 8vCPUs.
Across all four LPARs, 3vCPUs were executing with 2.0eCPU versus 8vCPUs with 2.0eCPU. Put another way, the workload of 1-of-3vCPUs was running beyond 2.0eCPU versus the workload of 6-of-8vCPUs running beyond 2.0eCPU.
Across all four LPARs, 3vCPUs showed lower AIX:vmstat:cpu:idle percentages versus 8vCPUs with higher AIX:vmstat:cpu:idle percentages.
Across all four LPARs, 3vCPUs were migrating to other SRADs less often versus 8vCPUs migrating to other SRADs more often.
Across all four LPARs, 3vCPUs were folding up and down less often versus 8vCPUs folding up and down more often.
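The entitlement and threading points in the list above can be sketched with a little arithmetic. This is a deliberately simplified model, not a description of how the hypervisor or the AIX dispatcher actually behaves. It assumes each whole unit of entitlement backs one fully busy vCPU, and that runnable threads spread evenly across vCPUs. The function names and the count of eight runnable threads are hypothetical, chosen only for illustration.

```python
import math


def vcpus_beyond_entitlement(ecpu: float, vcpus: int) -> int:
    """vCPUs left without entitlement backing when every vCPU is busy.

    Simplifying assumption: each whole unit of entitlement backs one
    fully busy vCPU; the rest must draw on spare shared-pool capacity.
    """
    backed = min(vcpus, math.floor(ecpu))
    return vcpus - backed


def entitlement_per_vcpu(ecpu: float, vcpus: int) -> float:
    """Entitlement available to each vCPU if divided evenly."""
    return ecpu / vcpus


def threads_per_vcpu(runnable_threads: int, vcpus: int, smt: int = 4) -> int:
    """Rough hardware-thread density per vCPU under an even spread.

    Simplifying assumption: runnable threads are spread evenly over all
    vCPUs and capped at the SMT mode. Real dispatching is more involved.
    """
    if runnable_threads <= 0:
        return 0
    return min(smt, math.ceil(runnable_threads / vcpus))


ECPU = 2.0            # entitlement per LPAR, from the article
RUNNABLE_THREADS = 8  # hypothetical load, for illustration only

for vcpus in (8, 3):
    print(
        f"{vcpus}vCPU: "
        f"{vcpus_beyond_entitlement(ECPU, vcpus)} vCPU(s) beyond entitlement, "
        f"{entitlement_per_vcpu(ECPU, vcpus):.2f} eCPU per vCPU, "
        f"about {threads_per_vcpu(RUNNABLE_THREADS, vcpus)} thread(s) per vCPU"
    )
```

Under these assumptions, the 8vCPU configuration leaves six vCPUs beyond entitlement and runs roughly one thread per vCPU, while the 3vCPU configuration leaves one vCPU beyond entitlement and packs more threads onto each vCPU, which matches the 6-of-8 versus 1-of-3 comparison above.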