AIX > Administrator > Performance

PerfPMR Part 3: The Value of Tracing

kernel trace

This is the third installment in a four-part series on using PerfPMR. Read part one here 

and part two here. 

In parts one and two, we examined how to setup and run PerfPMR as a whole, and also looked at how to run individual scripts selectively. Most of the diagnostic scripts run by PerfPMR are based on the AIX kernel trace utility: this is what we will focus on here.

Kernel trace output is difficult to interpret and requires extensive experience to master, so in this context, I can only give you the basics. Fortunately, when it comes to making good sense of what trace data – and its extracts – are telling you, just knowing the basics is a great help.

The Shell Format

As with most of the scripts contained within PerfPMR, the one that kicks off the various traces is in shell format. This script is called, and like the other .sh scripts, paging through it reveals a host of execution options. As you dig deeper into, you'll realize how versatile it can be, as evidenced by the flags listed at the top of the file. In addition, you can tell to run different hooks (or events; more on that in a bit) than it does by default, sort hooks into groups, zero in on certain CPUs in your system or even use the script to extract some of the more useful performance reports like CURT and SPLAT. As with the other shell scripts, you're not limited to only running when there's a performance problem. For example, if you want to understand how an application runs on your system and interacts with hardware and software components, there's no greater aid to this understanding – short of a code review – than trace output.

PerfPMR runs numerous forms of the AIX trace: the kernel trace itself, lock traces, virtual CPU-preemption traces, several different system profilers, and other assorted utilities. Page through the output of a typical PerfPMR run, and you’ll see that these traces have been run on every logical processor in your system. PerfPMR runs these traces for 5 seconds each, and writes its output in binary format. Focusing on the kernel trace output (all the different types of trace data are written with similar filenames), we see our reports listed as trace.raw, trace.raw-0, trace.raw-1, etc. The numerals represent the LP the trace was run on. This output needs to be converted from binary to ASCII format, so in this example, we'll run the trcrpt converter on the trace.raw file:

trcrpt -Oexec=on,cpuid=on,pid=on,tid=on,svc=on trace.raw > trace.formatted

The trcrpt example above displays exec pathnames, the process and thread that ran on any given LP, the LP number these processes and threads ran upon, and the C programming language subroutine they invoked . All of the data is then written to the trace.formatted file. Page through this file to see every event that occurred on an LP for the 5 seconds the trace was run. Every line in a kernel trace displays its data the same way; events that occurred on LPs, the process and thread that were running, and so on. It’s the actual content of that data that varies from line to line. Let’s look at one such line and break down what’s happening:

200 ksh          0 11206728 13828219           0.001368154 0.000471         resume ksh    iar=100115E0 cpuid=00

Here we see what a Korn shell process was doing in the system, just after the trace was started. One of the most important pieces of information in kernel trace output is found in the first column on the far left. See where it says “ID,” and then the number below that (in this case, 200)? ID stands for an event – or in AIX speak, a hook – that occurred on a particular LP. Rather than spell out in text that a process was resumed, put to sleep or took any of dozens of different actions, we simply see a numerical identifier, which, practically, saves a lot of space and reading time in the trace. Beyond running the trace and formatting the resulting output, your first step in learning the ins and outs of kernel traces should be to learn the meanings of all of the available numerical identifiers. This is no small feat, given that there are nearly 700 such identifiers as of AIX version 7.1. But if you don't do this, you’ll never become competent at trace interpretation.

Again, our example identifier is 200. If you want to know the description of the event for which 200 is a label, run it through trcrpt, which is the command that converts a trace file from binary to ASCII. It also holds the key to deciphering the identifiers. Use the -j flag with trcrpt like this:

trcrpt -j|grep 200

The trailing 0 is dropped, revealing that a resume indeed correlates with what we see in the rest of the trace data.

I strongly suggest creating a table of all 700 trace hooks and their meanings. Then paste it into a text document and print it out. That way, when you plow through trace data, you’ll be able to translate event (hook) IDs rapidly. Create your event file like this:

trcrpt -j > trace_events

Now that we’ve created our event reference table, let’s have another look at our sample trace data:

200      ksh     0  11206728 13828219             0.001368154 0.000471        resume ksh iar=100115E0 cpuid=00

Traces are always read from left to right. The column headers are printed only once, at the top of the output. Every other line in the trace falls under these headers. So what’s happening here? We know from the trcrpt -j command that we’re dealing with a resume. Resumes happen when a process/thread that's previously been put to sleep is awoken to continue doing its work. The second column, PROCESS NAME, indicates the process/thread that's been resumed (in this case, a Korn shell process). The CPU column shows the logical processor on which this Korn shell process was resumed. The next two columns display the process and thread IDs of that Korn shell process. There's no data in the SYSTEM CALL column that would tell us the routine that was invoked to do the resume, so we move on. The ELAPSED_SEC column tells us how far into our 5-second trace this event occurred (0.001368154 seconds in) and the next column, DELTA_MSEC, tells us the time, in milliseconds, the resume took.

Almost Finished

We’re almost done. The output of the last few columns can be rolled into one concept: basically, they tell us in verbose terms what the prior columns hint at. In other words, the right-most columns provide a fuller description of what's in the left-hand columns. So our trace data breaks down like this: well within the first second, the kernel trace was run on CPU 0, and a Korn shell process with ID 11206728 spawned a thread with ID 13828219. This thread, which had been put to sleep, was resumed to once again do its work. The time it took to do this resume was a fraction of a millisecond. That’s it. Congratulations, you’ve just interpreted your first AIX kernel trace!

Now start paging through the rest of your trace data, keeping your event table handy. After awhile you’ll get a feel for what happened in your system when the trace was executed. When interpreting a trace, context is king. You need to look at each event – each line in a trace – within the context of every other line, because chances are most of the events in your trace influence every other event. When starting out though, it’s very helpful to zero in on a single process in your trace data and follow its actions from start to finish. Using the Korn shell process in our data as a sample, let’s grep out all of its activity:

cat trace.formatted | grep ksh > ksh_trace

The resulting ksh_trace file displays every Korn shell event that happened in the system during the PerfPMR run. You can do this with any process; similarly, you can sort on CPU IDs, system calls or any other column in your formatted trace report. But I recommend using the process name to start. Using the name makes contextualizing trace data much easier, because you’re focused on a single running program. A benefit of zeroing in on particular processes is you’ll be able to see exactly how any executable works and behaves. Do you want to know if an Oracle database does random or sequential I/O? Look at your trace data. Do you suspect your security software is hampering system performance? Confirm that in trace data. Are mallocs for network buffers being denied? Look at your trace data.

Mark J. Ray has been working with AIX for 23 years, 18 of which have been spent in performance. His mission is to make the diagnosis and remediation of the most difficult and complex performance issues easy to understand and implement. Mark can be reached at

Like what you just read? To receive technical tips and articles directly in your inbox twice per month, sign up for the EXTRA e-newsletter here.

comments powered by Disqus



2018 Solutions Edition

A Comprehensive Online Buyer's Guide to Solutions, Services and Education.

Achieving a Resilient Data Center

Implement these techniques to improve data-center resiliency.


AIO: The Fast Path to Great Performance

AIX Enhancements -- Workload Partitioning

The most exciting POWER6 enhancement, live partition mobility, allows one to migrate a running LPAR to another physical box and is designed to move running partitions from one POWER6 processor-based server to another without any application downtime whatsoever.

IBM Systems Magazine Subscribe Box Read Now Link Subscribe Now Link iPad App Google Play Store
IBMi News Sign Up Today! Past News Letters