Diagnosing and Tuning with PerfPMR: Setup and Execution
A look at different methods of running PerfPMR and what you can learn about the data it produces.
By Mark J. Ray, 08/10/2016
Everyone uses PerfPMR, a suite of performance and configuration diagnostic programs. But do you really know how to use it? There are dozens of ways to run all or part of PerfPMR, yet little documentation explains everything you can do with it. In this four-part series of articles, you'll learn how to get the most from this comprehensive tool.
If you've ever opened a PMR with IBM for a performance issue, it’s likely you have been asked to run PerfPMR for diagnosis. Once you upload the data to IBM, performance experts analyze it and make remediation suggestions based on the data it presents.
This passive method of PerfPMR usage works well for many busy administrators. However, plenty of admins would like to explore different methods of running PerfPMR and learn about the data it produces.
Download and Setup
So you’ve discovered a performance problem on one of your AIX systems. Maybe a database is running slowly, or an application has ceased communicating with users. Or perhaps you’ve noticed your CPU usage has climbed precipitously overnight for no obvious reason, or network traffic has slowed to a crawl. You try a few familiar tricks, but they don't eliminate the problem. What now? Try the PerfPMR utility.
Before you download PerfPMR and run it on your suspect system(s), some prep work is in order. For starters, placement matters. You can't (or shouldn't) install the PerfPMR executables just anywhere. I always tell customers to stay away from / (the root filesystem) and /var, because of the obvious problems that arise if PerfPMR data fills these filesystems to capacity. Also, do not install PerfPMR into, or run it from, an NFS-mounted filesystem. Doing so produces data that mixes activity from the NFS server and the client, and you certainly don't want that. I recommend installing PerfPMR in either /tmp or /home. These locations typically have enough space to run PerfPMR, and they're easy to remember.
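If you want to check a candidate location before installing, you can script a quick check. The following is a minimal sketch and is not part of PerfPMR. The function names placement_ok and check_perfpmr_dir are hypothetical. The sketch assumes that df -P (POSIX output format) prints the filesystem in the first field and the mount point in the last field. It also assumes that NFS mounts appear in the host:/path form, so treat the NFS test as a heuristic only.

```shell
# Hypothetical pre-flight check, not part of PerfPMR: is a directory a
# sensible place to install or run PerfPMR?

# placement_ok FILESYSTEM MOUNT_POINT
# Returns 0 if the pair looks suitable; otherwise prints a reason and returns 1.
placement_ok() {
  fs=$1
  mnt=$2
  case $mnt in
    / | /var)
      echo "$mnt: filling this filesystem with PerfPMR data is risky"
      return 1 ;;
  esac
  case $fs in
    *:*)  # heuristic: NFS mounts are usually listed as host:/export
      echo "$fs looks NFS-mounted"
      return 1 ;;
  esac
  return 0
}

# check_perfpmr_dir DIR
# Looks up the filesystem holding DIR with df -P and classifies it.
check_perfpmr_dir() {
  # df -P: one header line, then "filesystem ... mount-point" (last field)
  line=$(df -P "$1" 2>/dev/null | awk 'NR == 2 { print $1, $NF }')
  if [ -z "$line" ]; then
    echo "cannot determine the filesystem for $1"
    return 1
  fi
  set -- $line               # $1 = filesystem, $2 = mount point
  placement_ok "$1" "$2"
}
```

For example, check_perfpmr_dir /tmp || echo "pick another location" flags /tmp if it turns out to live on the root filesystem or on an NFS mount.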
Let's say you've settled on /tmp for your PerfPMR installation. In the /tmp directory, create two subdirectories: PerfPMR and perfdata. cd into /tmp/PerfPMR. Then, using either a web browser or FTP from the command line, download the PerfPMR archive and the README into /tmp/PerfPMR. For convenience, here's the FTP link for use from within a browser:
Once you arrive at this URL, you'll find many folders. Each holds a PerfPMR version tailored to a specific AIX release, from the current v7.2 all the way back to v3.2. Go into the folder that matches the AIX version your problem system is running. Download both the README and the tar archive of PerfPMR.
Next, follow the unpacking and installation instructions in the README; they take about a minute to execute. When the installation is complete, you'll have about 50 distinct executables in your /tmp/PerfPMR directory, including shell and awk scripts and a few C routines. The installation also creates a link in /usr/bin that points back to the perfpmr.sh file in your /tmp/PerfPMR directory. This file is the "master control" for a typical PerfPMR run. Scan the contents of this directory and familiarize yourself with the files it contains; we'll cover some of them in future articles.
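To confirm the installation, you can check that the link in /usr/bin resolves to your copy of perfpmr.sh. This is a minimal sketch that assumes the installer creates a symbolic link named /usr/bin/perfpmr.sh. The README for your version is the authority on what the installer actually creates. The function names are hypothetical.

```shell
# Hypothetical check, assuming the installer creates a symbolic link
# /usr/bin/perfpmr.sh that points to perfpmr.sh in the install directory.

# link_target PATH: print where a symbolic link points; return 1 if PATH
# is not a symbolic link.
link_target() {
  [ -L "$1" ] || return 1
  # ls -l shows "... NAME -> TARGET"; the target is the last field
  # (this assumes the paths contain no spaces)
  ls -l "$1" | awk '{ print $NF }'
}

# check_perfpmr_link [LINK] [INSTALL_DIR]
check_perfpmr_link() {
  link=${1:-/usr/bin/perfpmr.sh}
  install_dir=${2:-/tmp/PerfPMR}
  if ! target=$(link_target "$link"); then
    echo "$link is not a symbolic link"
    return 1
  fi
  if [ "$target" != "$install_dir/perfpmr.sh" ]; then
    echo "$link -> $target (expected $install_dir/perfpmr.sh)"
    return 1
  fi
  echo "$link -> $target"
}
```

Run with no arguments, check_perfpmr_link tests the /usr/bin/perfpmr.sh and /tmp/PerfPMR locations used in this article.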
First, check whether you have enough space to store all of the reports PerfPMR will generate; there will be at least a few dozen. PerfPMR cannot be run from its installation directory, so cd into your /tmp/perfdata directory and issue this command:
This produces a screen full of information. In the last two lines of this text, you’ll see something like this:
PERFPMR: disk space needed is at least : <490> Mbytes
PERFPMR: free space in this directory : <4112> Mbytes
If you have enough free space to accommodate the PerfPMR data, you're good to go. This step is very important, because the last thing you want is for your PerfPMR run to abort because it ran out of space for its data.
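If you script your preparation, you can test the two PERFPMR lines shown above automatically. This sketch assumes the output format shown in the example: numbers in angle brackets, in Mbytes. perfpmr_space_ok is a hypothetical name. You would feed it the saved output of the space-check command from the previous step.

```shell
# Hypothetical helper: read PerfPMR's space-check output on stdin.
# Returns 0 if free space >= space needed, 1 if not,
# and 2 if the two PERFPMR lines were not found.
perfpmr_space_ok() {
  awk -F'[<>]' '
    /disk space needed/            { need = $2 }   # the value between < and >
    /free space in this directory/ { free = $2 }
    END {
      if (need == "" || free == "") exit 2
      if (free + 0 >= need + 0) exit 0
      exit 1
    }'
}
```

For example, redirect the space-check output to a file such as space.txt, then run perfpmr_space_ok < space.txt || echo "not enough space for PerfPMR data".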
Now that you've installed PerfPMR and verified that you have the space needed for all of its reports, you're ready to run the utility. Running it with the default settings is the preferred option, at least initially. As the root user, issue this command:
perfpmr.sh 600 <enter>
This tells PerfPMR to run for approximately 600 seconds, or 10 minutes. 600 is the default, so if you omit the number, PerfPMR runs for the same amount of time. I say approximately because the actual runtime depends heavily on the number of logical processors (LPs) in your system. PerfPMR runs several types of traces for 5 seconds each on every LP, including an AIX kernel trace, a lock trace and a virtual CPU pre-emption trace. It also runs pprof against every LP. So on a POWER7 system with 20 CPUs of four logical processors each (80 LPs in all), the total PerfPMR runtime will far exceed ten minutes. Aside from the traces, PerfPMR runs many other types of system monitors, and the timing number controls how long those monitors run. To adjust the time allotted to your run, change the number you pass to perfpmr.sh. If you want the monitors to run longer and capture more data (say, 20 minutes), raise it to 1200. You can also opt for a quicker run; the minimum setting is 60 (60 seconds).
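To get a feel for how the LP count stretches the run, here is a back-of-envelope sketch. It takes the description above literally. Each trace type runs for 5 seconds on every LP, and that trace time is added to the monitor time. By default it assumes three trace types (the kernel, lock and vCPU pre-emption traces named above). It ignores pprof and configuration collection, and PerfPMR's real scheduling may differ. The function name is hypothetical.

```shell
# Back-of-envelope estimate only, NOT PerfPMR's actual scheduling.
# Model (an assumption): total = monitor time + LPs * trace types * seconds per trace.
# estimate_perfpmr_seconds TIME LPS [TRACE_TYPES] [TRACE_SECS]
estimate_perfpmr_seconds() {
  monitor_secs=${1:-600}   # the number passed to perfpmr.sh (default 600)
  lps=$2                   # logical processors in the system
  trace_types=${3:-3}      # kernel, lock and vCPU pre-emption traces
  trace_secs=${4:-5}       # seconds per trace type per LP
  if [ -z "$lps" ]; then
    echo "usage: estimate_perfpmr_seconds TIME LPS [TRACE_TYPES] [TRACE_SECS]" >&2
    return 1
  fi
  if [ "$monitor_secs" -lt 60 ]; then
    echo "the minimum time setting is 60 seconds" >&2
    return 1
  fi
  echo $((monitor_secs + lps * trace_types * trace_secs))
}
```

For the POWER7 example above (20 CPUs with four LPs each), estimate_perfpmr_seconds 600 80 prints 1800. That is about half an hour under these assumptions, and even then it is only a lower bound because pprof is not counted.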
Now let's look at some of the flags you can use to run PerfPMR. Yes, there are flags. Not a lot of users are aware of them, but these flags are extremely useful. They allow you to customize PerfPMR’s output, and -- even more importantly -- tell it when to run. Here are some of my favorite flags:
-W: In my opinion, this is the most useful PerfPMR flag. The -W flag makes PerfPMR wait until a particular program has started before it begins collecting data. A program counts as started once it enters the process table with a valid PID. This matters because if PerfPMR isn't running while a performance issue is occurring, it won't capture the conditions that cause the issue. A good 30 percent of the performance calls I get involve diagnosing network connection problems, so let's use that as an example. Say you have an FTP connection that times out or fails completely, and you want to collect diagnostic data so you can identify the problem. You can set up PerfPMR to start its data collection only when an FTP connection, here to the server used for downloading PerfPMR, is attempted. Here's the syntax (enclose any argument that contains multiple words or spaces in quotes):
perfpmr.sh -W "/usr/bin/ftp"
Hit the enter key, and you'll see PerfPMR try to start, but then go into a holding pattern with this message:
PERFPMR: waiting for <usr/bin/ftp> to be in the process table
Now start another terminal session and try to connect to the download server using FTP from the command line:
PerfPMR will wait until the connection is attempted, and then start its data collections in our first terminal window:
17:59:50-07/21/16 : wait for process completed
PERFPMR: Parameters passed to perfpmr.sh: -W /usr/bin/ftp
PERFPMR: Data collection started in foreground (renice -n -20)
This is pure gold. Again, PerfPMR data is useless unless it captures the specific performance issue you're trying to diagnose. The -W flag makes it easy to run PerfPMR only when you need to. This way you can effectively diagnose not just network connections but also database starts and stops, application processing, and even backup and restore failures.
-d: The -d flag delays the start of PerfPMR by the specified number of seconds. The syntax is, for example, "perfpmr.sh -d 60", where 60 is the number of seconds PerfPMR waits before starting. Choose a value that gives you time to set up and start whatever database, application or utility is giving you problems. Likewise, -d gives you time to stop any executable, or the system itself, so you can catch problems that occur during shutdown.
-c: If you don't need to gather configuration information for your system, use the -c flag (perfpmr.sh -c). Very often, prtconf output is all you need as a reference for how your system is put together. If that's sufficient, there's no need to collect all of the configuration files PerfPMR generates.
-n: Don't gather netstat or nfsstat data.
-Q: Don't gather lsattr, lslv or lspv data. These commands can take a long time to run, so use the -Q flag if you're certain you don't need the data they produce.
-p: The lowercase -p flag tells PerfPMR not to gather pprof data during a run. pprof is a utility that reports CPU usage by kernel threads. I usually get all the kernel thread and CPU information I need from a CURT report, so I routinely use the -p flag like this: perfpmr.sh -p 600.
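The -W and -d flags both time data collection to match the problem. The sketch below shows the two underlying ideas in plain shell; it is not PerfPMR's implementation. wait_for_command polls the process table (via ps -e -o comm=) until a program appears, much as -W waits for one. seconds_until computes a -d value that lines up the start with a known clock time. The function names, polling interval, timeout and basename matching are all assumptions of this sketch. seconds_until also ignores daylight-saving changes.

```shell
# Generic illustrations of the timing ideas behind -W and -d.
# Neither function is part of PerfPMR.

# wait_for_command NAME [INTERVAL] [TIMEOUT]   (seconds; defaults 1 and 3600)
# Returns 0 once a process whose command name matches NAME is in the
# process table, or 1 if TIMEOUT seconds pass first.
wait_for_command() {
  name=${1##*/}              # compare basenames: /usr/bin/ftp -> ftp
  interval=${2:-1}
  timeout=${3:-3600}
  waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # comm= prints just each process's command name, with no header line
    if ps -e -o comm= | awk -v p="$name" '
         { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "") }
         $0 == p { found = 1 }
         END { exit(found ? 0 : 1) }'; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 1
}

# seconds_until HH:MM NOW_HH:MM:SS
# Prints the seconds from NOW until the next occurrence of HH:MM,
# rolling over to tomorrow if that time has already passed today.
seconds_until() {
  printf '%s %s\n' "$1" "$2" | awk '{
    split($1, t, ":"); split($2, n, ":")
    target = t[1] * 3600 + t[2] * 60
    now = n[1] * 3600 + n[2] * 60 + n[3]
    d = target - now
    if (d <= 0) d += 86400     # already passed: use the same time tomorrow
    print d
  }'
}
```

For example, wait_for_command /usr/bin/ftp 1 600 && perfpmr.sh 600 would start a run once an ftp process appears, or give up after ten minutes. The -W flag does the waiting for you without a script. If a batch job starts at 02:00, perfpmr.sh -d "$(seconds_until 02:00 "$(date +%H:%M:%S)")" should defer the start until then. Subtract a minute or two from the value if you want collection already running when the job begins.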
There are lots more PerfPMR flags. Want to see them all? Just page through the perfpmr.sh script itself in your PerfPMR installation directory; they're listed near the top. These flags can make PerfPMR far more useful than its vanilla invocation.
Speaking of scripts, do a long listing of the directory where you installed the PerfPMR executables (e.g., /tmp/PerfPMR). See all the files with the .sh extension? Each one is a shell script that can be run independently of the full PerfPMR suite. Running portions of PerfPMR on their own is extremely valuable when you want to focus on just one area of your system's configuration or performance.
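If you want just the names of those scripts, a short loop will list them. This sketch assumes the /tmp/PerfPMR install location used throughout this article; the function name is hypothetical.

```shell
# Hypothetical helper: list the shell scripts in a PerfPMR install directory.
# Usage: list_perfpmr_scripts [DIR]   (DIR defaults to /tmp/PerfPMR)
list_perfpmr_scripts() {
  dir=${1:-/tmp/PerfPMR}
  for f in "$dir"/*.sh; do
    [ -f "$f" ] || continue      # no match: the pattern is left unexpanded
    printf '%s\n' "${f##*/}"     # print the file name without the directory
  done
}
```

The shell expands the *.sh pattern in sorted order, so the scripts are listed alphabetically.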
In next month’s installment, I’ll show you how to run these scripts and interpret their data.