Don’t Fall Into the Internal Throughput Rate Trap
An examination of compute power, underlying performance and capacity fundamentals.
By Joe Temple | 05/08/2017
Many people can explain that the z Systems platform has the power to do things that other machines can’t. Others express proof of their belief that such assertions aren’t true by quoting benchmark results, or showing the power and scalability of aggregated clusters of smaller machines. This has been going on for a long time, and it’s largely because the participants fall into the Internal Throughput Rate (ITR) trap.
To effectively resolve the issue, we need to understand what is meant by compute power. This article will explain compute power by defining underlying performance and capacity fundamentals:
- Internal Throughput Rate (ITR)
- External Throughput Rate (ETR)
- Response time
Mainframes, enterprise servers, cloud, converged and supercomputer infrastructures are described as possessing massive compute power. These IT solutions are massive when compared to cell phones, tablets, laptops or devices in the Internet of Things. However, to usefully describe this power, we need to quantify what these solutions can do beyond using descriptive language.
Inevitably, we ask “How much work can this thing do?” This is the question that leads to the notion of system throughput. The most obvious and simple quantification of power is to count cores, the processing elements of an IT solution. Solutions with massive compute power contain dozens, hundreds or even thousands of cores, thus earning the description.
Counting cores makes the bad assumption that all cores are created equal. By this logic, we could hypothetically calculate work per unit time by simply multiplying the core count by clock frequency. However, there are three problems with this. It assumes that:
- All cores do the same amount of work per clock cycle
- All cores execute one thread of work at a time
- N cores can do N times the work of one core
None of these are true. Luckily, the following definition of throughput avoids these pitfalls while retaining a relatively simple model related to machine characteristics.
Throughput Rate (TR) = Thread Count (TC) x Thread Speed (TS)
Using thread count rather than core count allows us to include modern multi-threaded cores. Using thread speed rather than clock rate allows us to account for compiler, hardware design and scaling effects, all of which clock rate ignores.
This model also has the advantage of lumping all of the adjustments into thread speed, for which proxy measurements can be readily found and successively refined. For example, measuring a single thread of transactions on one core establishes a baseline speed. Other cases involving more threads and/or more cores establish the effect of scaling.
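The measurement approach above can be sketched numerically. The figures below are purely illustrative, not measurements of any real machine; the point is how a single-thread baseline and a multi-thread run together expose the scaling effect folded into thread speed.

```python
# Sketch of the throughput model TR = TC x TS, with illustrative numbers.

def throughput(thread_count, thread_speed):
    """Throughput Rate = Thread Count x Thread Speed."""
    return thread_count * thread_speed

# Baseline: one thread on one core completes, say, 500 transactions/sec.
baseline_ts = 500.0

# Scaling effect: effective thread speed typically drops as threads are added
# (cache contention, SMT sharing, interconnect overhead). Suppose a 16-thread
# run measures 6,400 transactions/sec in aggregate:
measured_tr_16 = 6400.0
effective_ts_16 = measured_tr_16 / 16            # 400 tx/sec per thread
scaling_efficiency = effective_ts_16 / baseline_ts

print(throughput(16, effective_ts_16))  # 6400.0
print(scaling_efficiency)               # 0.8
```

Here the refined thread speed (400 tx/sec) is 80 percent of the baseline, which is exactly the kind of adjustment a raw clock-rate model cannot capture.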
There are many throughput metrics in use today, including, but not limited to, MIPS, rPerfs, tpmCs, SAPs, SPECints and SPECfps. All of these are created by running performance tests in steady state—the load is relatively constant, driving the machine at a nearly constant, high utilization. The tests are designed to avoid I/O bottlenecks so that utilization levels and/or throughput are not limited by I/O wait states.
Because I/O has no impact on these metrics, we call them ITR metrics. The ITR of a machine represents its rated capacity: the maximum amount of work the system can do, if high levels of utilization can be sustained.
Unfortunately, business processes don’t always achieve steady state and, unless we’re willing to use huge amounts of memory, I/O waits can’t always be eliminated. We need to understand more than the maximum rated capacity to fully understand performance.
There are three perspectives on performance that must be examined:
- The Vendor Perspective: How much work can the system do?
- The Provider Perspective: How much work can the users drive?
- The User Experience: How fast does each user’s work get done?
As stated above, the vendor perspective is represented by the rated capacity in the form of ITR metrics. Each vendor will naturally use a metric that’s favorable to its product. There are common metrics that attempt to allow cross-platform comparisons, such as Gartner RPE2s and IDG QPIs. However, these metrics have all the shortfalls of ITR metrics in that they don’t accurately represent user workloads. They suffer from an additional disadvantage: they are averages or composites of measurements, which leads to issues in determining scalability. However, they do represent a quantified rated capacity of the machines.
The provider measures business performance in terms of dollars per day. The maximum instantaneous throughput rate is less important than the average over time. This is because business processes often fail to reach steady state, and on average, the system isn’t run at near 100 percent utilization. Also, I/O wait states occur at least in some loads, and must be accounted for. Thus, we define ETR as the rate that actual work can be pushed through the IT solution:
ETR = ITR x a Function of Utilization
A useful approximation is:
ETRavg = ITR x Uavg
This gives us insight into the average amount of work we can push through the system over time, which can more readily be related to Dollar/Day business metrics compared to the rated throughput, ITR.
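The approximation is simple enough to compute directly. The numbers below are illustrative assumptions, not figures from the article:

```python
# Sketch of the approximation ETRavg = ITR x Uavg, with illustrative numbers.

def etr_avg(itr, avg_utilization):
    """Average External Throughput Rate from rated capacity and utilization."""
    return itr * avg_utilization

itr = 10_000.0   # rated capacity, e.g. transactions/sec (hypothetical)
u_avg = 0.45     # average utilization over the business day (hypothetical)

print(etr_avg(itr, u_avg))  # 4500.0
```

A machine rated at 10,000 transactions/sec that averages 45 percent utilization pushes through 4,500 transactions/sec over time—less than half its rated capacity, which is why rated ITR alone misleads.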
The user experiences ITR if he or she can use all the threads supplied by the system in completing a unit of work. It’s more likely that the user experience is tied to Response Time, which is defined as:
Response Time (Tr) = Service Time (Ts) + Wait Time (Tw)
Service time can be near constant, or it can be a function of utilization. It’s bounded by:
1/ITR <= Ts <= 1/TS
At one extreme, the user can drive all the systems threads, and service time approaches 1/ITR; at the other extreme, the user can drive just one thread, and the service time approaches 1/TS.
Wait time is a function of service time, load variability and utilization. There’s an entire body of knowledge called queueing theory, and the models it produces show that response time versus utilization follows hockey-stick curves (see Figure 1). In this model, response time gets very large when the utilization approaches 100 percent.
Recall ETR is a function of utilization; the Response Time versus ETR curve will have a similar shape to one of these curves given the usage pattern of the load. Note that ETR can approach ITR only if the variability is low, otherwise attempting to push ETR toward ITR will result in poor response time.
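The hockey-stick shape can be illustrated with the simplest queueing model, M/M/1, where Tr = Ts / (1 - U). The article does not specify which queueing model underlies Figure 1, so this is a stand-in chosen only to show the characteristic shape; the service time is a hypothetical 10 ms:

```python
# M/M/1 response time vs. utilization: Tr = Ts / (1 - U).
# Illustrative only; the article's Figure 1 may use a different model.

def mm1_response_time(service_time, utilization):
    """Response time for an M/M/1 queue at the given utilization."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)

ts = 0.010  # 10 ms service time (hypothetical)
for u in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"U={u:.2f}  Tr={mm1_response_time(ts, u) * 1000:.1f} ms")
```

At 50 percent utilization the response time merely doubles the service time, but at 99 percent it is a hundred times the service time—the blade of the hockey stick.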
The ITR Trap
Using ITR metrics to indicate machine performance is misleading, particularly when making assertions about the power of disparate infrastructures. We need to understand ETR and Response Time to make good decisions about IT solution design and deployment.
Usage patterns have a profound effect on the ETR and Response Time. Failure to take this into consideration can have severe consequences. IBM z, enterprise servers, cloud, converged and supercomputer infrastructures have massive compute power, but they’re optimized in different ways. Even if we had a reliable common metric for ITR, the ETR and Response Time ratios among them would be very different from what ITR alone would indicate.
ITR is being widely used to compare IT architectures. This includes comparison of enterprise servers to converged architecture and to various internal and external cloud infrastructures. The metrics are also used to compare Intel versus UNIX servers, and Arm versus Intel processors. The results of these ITR comparisons are misleading, and sometimes wrong. The real comparison depends on the work to be done, which very often is entirely misrepresented by the ITR metrics used. The misleading assertions are even affecting technology investment and tech stock prices.
This is the ITR trap. Don’t get caught in it.
Joe Temple is a retired IBM Distinguished Engineer and principal consultant of Low Country North Shore Consulting.