Linux Performance Tools at a Glance


There are a number of reasons to collect and analyze performance data for the workloads that clients run on Linux (e.g., planning, reporting, optimization, resolving issues). When optimizing a system, it's natural to collect performance data before and after making a change and compare the two. Performance problems, however, come up unplanned, so you can't gather data right before an issue occurs. What, then, do you compare your data to?

Most people begin collecting data only after they have a performance issue and try to figure out what the problem is by looking at that data alone. That makes it hard to find the cause. To solve a performance issue accurately, you need something to compare against.

Performance analysis is done by comparing data from the system in a good state with data from the same system in a bad state. This helps you find out what has changed, and what caused the change.

Monitor Regularly

To prepare for future performance issues, set up regular monitoring and keep the data at least back to the last major change, so you always have an example of what a good case looks like.

For analyzing changes over time, diagnosing gradually degrading problems, or finding out when a problem first occurred, you need historical data (e.g., daily for the past week, weekly for the past month or monthly for the past year).
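With sysstat, the retention period for the daily data files is a configuration setting; a minimal sketch, assuming a common location for the configuration file (the path and defaults vary by distribution):

```shell
# /etc/sysconfig/sysstat (on some distributions: /etc/sysstat/sysstat)
HISTORY=28        # keep daily sar data files for 28 days
COMPRESSAFTER=10  # compress data files older than 10 days
```

Raising HISTORY is an easy way to keep enough history to cover the period back to the last major change.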

For this, the sysstat utilities package provides the most important monitoring tool, sar/sadc, which can be used both for data gathering and for regular monitoring. For starters, you can collect performance data in 10-minute intervals.
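To automate the 10-minute collection, distributions typically ship a cron entry that invokes sa1, a small wrapper around sadc; a sketch of such an entry (the exact path and flags may differ on your system):

```shell
# /etc/cron.d/sysstat -- collect one sample every 10 minutes
*/10 * * * * root /usr/lib64/sa/sa1 1 1
```

The resulting daily data files land under /var/log/sa/ (or /var/log/sysstat/, depending on the distribution), ready to be replayed with sar.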

An example of using sadc directly, sampling every 600 seconds for 144 samples (i.e., one full day of data) written to outfile:

/usr/lib64/sa/sadc 600 144 outfile

The sampling rate—the rate at which data points are collected—needs to be high enough to spot a problem. For further analysis of a performance issue, it’s a good idea to collect data with a higher resolution once you know where to look.
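For such a closer look, you could collect one-second samples over a short window into a separate file; a minimal sketch, reusing the sadc path from the example above and a hypothetical file name:

```shell
# Collect 1-second samples for 5 minutes into a high-resolution file
/usr/lib64/sa/sadc 1 300 highres_outfile

# Replay it afterwards with sar; -u selects CPU utilization
sar -u -f highres_outfile
```

Keeping the high-resolution file separate from the regular 10-minute data makes it easy to discard once the investigation is done.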

Performance monitoring itself consumes system resources, so a high sampling rate can impact the system. You don't want to gather too many samples in short intervals, because that costs processor time and you have to store the data. On the other hand, each sample is an average over its interval, which flattens peaks: your sampling rate defines your resolution in time, and events that occur on a time scale shorter than your sampling interval can't be seen. Set the sampling rate just high enough to spot the peaks.
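As a quick sanity check when choosing an interval, the arithmetic is simple; this sketch computes the samples per day for a given interval (the numbers match the sadc example above):

```shell
interval=600                      # seconds between samples (10 minutes)
samples=$(( 86400 / interval ))   # samples needed to cover 24 hours
echo "$samples samples/day at ${interval}s intervals"
# prints "144 samples/day at 600s intervals"
```

Dropping the interval to 1 second raises that to 86,400 records per day, which illustrates why high-resolution collection is best limited to short, targeted windows.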

Create a description of your system and workload so you can determine whether there's a problem. If you know what your system is supposed to look like and accomplish, you can identify deviations. To find out what's going wrong, determine what has changed by comparing actual performance data to historical data from a healthy system. With this, you can quantify the problem.

It's best to be practiced at gathering, processing and evaluating the data, so it's a good idea to use your chosen performance tool set once in a while even when there's no problem. When a problem does occur, time is precious and things are urgent. Work with the tools on the system before problems occur to learn how they behave and what a normal state on your system looks like.

Find the Problem

Before you can start analyzing a performance issue, make a clear statement of what the problem is. First, describe the issue properly: Which indicators show it? What range is considered good, and what is bad? State clearly how the behavior deviates from expected or historical behavior.

Sometimes problems occur at specific times of day, or they're triggered by certain events; that means you need to gather data during a specific timeframe. Some problems appear only for a very short time, so you need a sampling rate high enough to see them. It's important to capture the bad state in as short a time frame as possible.

It's not enough to send in monitoring data that contains the performance issue somewhere. Solving performance issues is like finding a needle in a haystack when you don't know what the needle looks like. The less data you have, the easier it is to analyze; but more data is more likely to contain a snapshot (i.e., data of an occurrence) of the problem.

After the first analysis you might need to gather data a second time. Over time you will gain experience that helps you gather better data. Keep data for good cases at hand for comparison.

Analyze and Solve

The analysis starts with the data provided. Which tools are best for the analysis depends on the problem. The first analysis leads to a better understanding of the problem, so we know better what to look for. That usually doesn't solve the problem, but it leads to a second round of gathering data.

Once the problem is solved, it's important to understand it so you can improve your preparation. This can mean preventing the issue from occurring again by improving the system, monitoring so you can react early before it recurs, or at least having better data available if it comes up again.

In performance analysis you work forward through the problem. You start by asking questions and forming hypotheses. These hypotheses can be verified or falsified. Falsifying is usually a lot stronger, because it can rule a hypothesis out.

Answer the questions one at a time and try to narrow down the problem. What data do you need to answer each question? Trust your data and your conclusions. Once a question is answered, move on to the next; there's no reason to gather new data to answer the same question. There are only two reasons to gather new data: you have a new question, or you discover you made a mistake.

A multi-staged approach that narrows down the part of the system where the root cause lies saves a lot of work and a lot of time. First use a general tool to isolate the area of the issue. Find the deviations from the good case in your data. Create theories as to how the observed data is produced, then verify or falsify them, starting with the easy ones. Falsifying is usually stronger. Remember: ravens are black. One more black raven proves nothing, but one white raven shatters the rule.
