
The Power Hypervisor and the AIX Kernel Trace


In part one, I told you about a simple form of the lparstat command that provides a detailed look at hypervisor activity. Now that you're familiar with lparstat -H and have a basic understanding of hypervisor calls, let's drill down and examine AIX kernel traces. Kernel traces can be run in hundreds of ways, with dozens of flags available for use in nearly unlimited combinations.

Let's start with something basic. We need to generate—and then format—a kernel trace. So, working as root, enter this trace syntax at the command prompt of a busy LPAR:

	trace -a -fn -o trace.out -T40M -L80M ; sleep 5 ;trcstop

This form of the utility says to run trace asynchronously in the background (-a), stop collecting trace data as soon as the in-memory buffer has filled (-f), add detailed information about hardware, locks, symbols and more to the output's header (-n), write the raw data to trace.out (-o), and create a trace buffer of 40 MB (-T) with a log size of 80 MB (-L). The trace collects data while sleep runs for 5 seconds, and trcstop then ends it. The resulting trace.out file is recorded in a format that must be converted to ASCII to be readable; we do this with the trcrpt command:

	trcrpt -Oexec=on,pid=on,tid=on,cpuid=on,svc=on trace.out > trace.formatted

This form of the trace reporting command makes the following information visible in the formatted report: the exec pathname, process ID, thread ID, the logical CPU number a process and thread ran on, and the C language subroutine, if any, that executed in any particular event. It takes as input the raw trace.out file and writes the formatted trace.formatted file.

Now, call up the trace.formatted file in the editor of your choice. To search successfully on hypervisor functions, we must know their precise names. The general format of a hypervisor function is H_xxx, where the H denotes that the function is a hypervisor call. This is followed with an underscore and then the actual name of the function (which we introduced in part one). So cede in lparstat -H output becomes H_CEDE in the kernel trace. Likewise, prod becomes H_PROD, get_ppp becomes H_GET_PPP and so on for each hypervisor function. Each line in an AIX kernel trace that begins with an ID number (001, 104, 419, etc.) is called an event. All of this amounts to AIX-speak for "what happened on a particular logical CPU."
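The naming rule is mechanical, so it can be scripted. Here is a minimal sketch, assuming the call names are supplied one per line in the lowercase form that lparstat -H prints. The function name to_hcall is hypothetical, not an AIX utility:

```shell
# Convert lparstat -H call names (cede, prod, get_ppp, ...) read from
# standard input into the H_XXX form used in the formatted kernel trace.
# to_hcall is a hypothetical helper name chosen for this illustration.
to_hcall() {
    # Uppercase every letter, then put H_ at the front of each line.
    tr '[:lower:]' '[:upper:]' | sed 's/^/H_/'
}

# Example: turn three names from part one into trace search strings.
printf '%s\n' cede prod get_ppp | to_hcall
```

For the three names given, this prints H_CEDE, H_PROD and H_GET_PPP, one per line.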

So let's search on one of the most common hypervisor events: a cede. As noted in part one, a cede is the hypervisor function that tells a virtual CPU with no useful work to do to enter a wait state and give its unused capacity to another virtual CPU. We search for it like this (the leading / is the search command in vi):

	/H_CEDE

This provides the following information:

492  wait           88      394391            0.000038531       0.000039                   
h_call: start H_CEDE iar=60497 p1=D7A64F447FAE6 p2=0058 p3=203 5CA7E4A0

Here, from left to right, we have an event ID (492), which marks the initiation of a hypervisor call; the process name (wait); the logical CPU (#88); and the ID of the thread that was running there (394391). Two timings follow: the first, in seconds, shows how far into our 5-second trace the event occurred, and the second, in milliseconds, is the time elapsed since the previous event in the report. We then see the hypervisor call detail. The most important information here is the event itself and its timing. Here's another trace entry, detailing an H_PROD, or an instance where the hypervisor makes a virtual CPU runnable:

492  wait           17  7208969                   0.000323642       0.000219                   
h_call: start H_PROD iar=891B4 p1=0058 p2=00FF p3=F1000A5C5016

Here's one more entry, where the partition's performance information is returned by the lparstat command:

492  lparstat       12  16253645   lpar_get_inf   0.429847138       0.000062                   
h_call: start H_GET_PPP iar=1FF330 p1=F00000002FF45DA0 p2=3F31
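To pull these fields out of many entries at once, something like the following could work. It is only a sketch and rests on several assumptions: that each event is laid out as in the samples above, with a summary line that starts with 492 and carries the process name, logical CPU and thread ID in fields two through four and the two timings in the last two fields, followed by a separate "h_call: start" line. The function name hcall_events is hypothetical, and the column layout of your own report may differ depending on the -O options passed to trcrpt.

```shell
# Print one line per hypervisor call: name, logical CPU, thread ID and timings.
# hcall_events is a hypothetical helper; the field positions are assumptions
# taken from the sample entries in this article, not from a trcrpt specification.
hcall_events() {
    awk '
        # Summary line of a hypervisor-call event: remember its fields.
        $1 == "492" && NF >= 6 {
            cpu = $3; tid = $4; sec = $(NF - 1); msec = $NF
            pending = 1
            next
        }
        # Detail line directly after a summary line: report the call.
        pending && $1 == "h_call:" && $2 == "start" {
            print $3, "cpu=" cpu, "tid=" tid, "sec=" sec, "msec=" msec
        }
        # Any other line ends the pairing.
        { pending = 0 }
    ' "$1"
}
```

To list only the cedes in the formatted report, for example:

	hcall_events trace.formatted | grep -w H_CEDE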

Grep out each hypervisor call individually and run that grep through word count to see how many times it occurs in your 5-second trace. Do this on several systems with similar workloads (database, middleware or application servers, for example). Get a feel for the frequency and duration of each call; also look at the events immediately prior to and after the hypervisor function on the same logical CPU.
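The counting step might look like this. It is a minimal sketch, assuming every hypervisor call appears in the formatted report as "h_call: start H_XXX", as in the entries above. The helper names count_hcall and tally_hcalls are hypothetical:

```shell
# Count the lines that mention a single hypervisor call, e.g. H_CEDE.
# grep -c gives the same number as piping grep into wc -l; -w keeps
# H_PROD from also matching a longer name that merely starts with it.
# Usage: count_hcall H_CEDE trace.formatted
count_hcall() {
    grep -cw "$1" "$2"
}

# Tally every hypervisor call in the report, most frequent first.
# Usage: tally_hcalls trace.formatted
tally_hcalls() {
    awk '$1 == "h_call:" && $2 == "start" { n[$3]++ }
         END { for (c in n) print n[c], c }' "$1" | sort -rn
}
```

Running tally_hcalls against the reports from several similar LPARs gives a quick side-by-side view of which calls dominate on each.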

Helpful as it is, the lparstat -H command takes you only so far in understanding hypervisor activity. To truly learn about what the hypervisor does, look at trace data and put each call in the context of where it's happening in the report. Don't worry if you haven't had any exposure to AIX kernel trace interpretation. A lot of it is intuitive and self-explanatory. Learning these explanations simply takes time.

In future articles, I'll give more pointers on trace interpretation. For now, take the list of hypervisor calls that show up in lparstat -H output, prefix each with H_ and capitalize everything. Then search your trace data for each one, looking up anything unfamiliar as you go, and repeat the exercise a few dozen times. With a little effort, you'll gain considerable insight into how the hypervisor functions in your environment.

Mark J. Ray has been working with AIX for 23 years, 18 of which have been spent in performance. His mission is to make the diagnosis and remediation of the most difficult and complex performance issues easy to understand and implement. Mark can be reached at mjray@optonline.net


