SPLAT: The Simple Performance Lock Analysis Tool
I don’t know about you, but locks give me a headache. The way locking is implemented and the myriad things locks do and affect are, quite simply, mind-boggling.
However, locks are something we'll likely always have to deal with, which makes understanding locking activity and how it affects the performance of databases, applications and the operating system itself an essential part of every AIX specialist's diagnostic arsenal. Fortunately, the complex topic of locking can be boiled down to a few easy-to-understand concepts.
So what is a lock? In computer programming, a lock is a synchronization mechanism. Locks enforce limits on access to shared resources when lots of threads are seeking that access. Think about a database. The whole reason any database exists, basically, is so data can be read from or written to it. Now what happens when you have a whole bunch of working threads that simultaneously require read or write access to your database? That's simple: Without a robust locking mechanism, those threads would trample each other's reads and writes, and your database would crash or corrupt its data so often that you'd have a hard time getting anything useful from it. Locks allow some threads to access data while making the others wait for that access. Locks serialize access to data in an orderly fashion, parceling out access according to the type of request each thread makes.
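To make the idea of serialized access concrete, here is a minimal sketch in plain shell. It is only an analogy: it uses separate processes rather than threads, and a directory as the lock (mkdir either creates the directory or fails, with nothing in between), which is not how AIX kernel locks are implemented. The names workdir, counter, lockdir and increment are made up for the illustration.

```shell
#!/bin/sh
# Analogy only: a directory acts as an exclusive lock around a shared counter.
# All names here are hypothetical; none of this is an AIX kernel interface.
workdir="${TMPDIR:-/tmp}/lockdemo.$$"
mkdir "$workdir" || exit 1
counter="$workdir/counter"
lockdir="$workdir/lock"
echo 0 > "$counter"

increment() {
    # Spin until this process is the one that creates the lock directory.
    until mkdir "$lockdir" 2>/dev/null; do
        :
    done
    # Critical section: read, modify and write the shared counter.
    n=$(cat "$counter")
    echo $((n + 1)) > "$counter"
    # Release the lock so a waiting process can get in.
    rmdir "$lockdir"
}

# Five workers run at the same time, each incrementing four times.
for worker in 1 2 3 4 5; do
    ( for pass in 1 2 3 4; do increment; done ) &
done
wait

final=$(cat "$counter")
echo "final count: $final"
rm -rf "$workdir"
```

With the lock in place, every one of the 20 increments lands. Take the mkdir loop and the rmdir out, and the workers can read the same old value and overwrite each other, so the final count can come up short.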
As you read on, please understand that I'm purposefully omitting a great deal of information, mostly because a course on programmatic locking is beyond the scope of this article. Also, I want to do something different. You've no doubt read a few locking articles previously. They typically list dozens of terms, but too often fail to explain any of them. With this in mind, I’ll share the basics of implementing the Simple Performance Lock Analysis Tool, or SPLAT. It's really the only game in town when you want to examine locking activity in your AIX system. To present its information, SPLAT uses only about a dozen different terms. And this is a good thing. Once you get the hang of these oft-repeated terms, you’ll be able to quickly do a basic read of lock activity in any AIX system.
Examining Lock Data
In most operating systems, locking can be categorized into two main types: simple and complex. There are sub-types and hybrids, but these are the two you will see the most.
Complex locks are read/write locks that protect shared resources like data structures, peripheral devices and network connections. The code that accesses these shared resources is commonly referred to as a critical section, and bad things can happen if strict rules about entering critical sections aren’t observed. Complex locks are preemptible, meaning a thread can be preempted while it holds one, and non-recursive by default, meaning they cannot be acquired in exclusive-write mode multiple times by a single thread. However, complex locks can become recursive in AIX by way of the lock_set_recursive kernel service. Complex locks are not spin locks: a thread waiting on one will not loop indefinitely until the lock is acquired. At some point, it will be put to sleep.
Then we have the simple lock. Simple locks are exclusive-write, non-recursive locks that are also preemptible and protect critical sections. Simple locks are spin locks; a thread will wait, or spin, until it either acquires the lock or crosses some threshold. In AIX, this threshold can be controlled with the schedo tunable maxspin.
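If you want to see where that spin threshold currently sits, a quick look with schedo should do it. This is a minimal sketch that only displays the tunable and changes nothing; it assumes the stock AIX schedo command is on your path, and the variable name tunable is just for the example.

```shell
# Display-only sketch; run on AIX. Nothing here modifies the system.
tunable=maxspin

if command -v schedo >/dev/null 2>&1; then
    schedo -o "$tunable"    # show the current value of the tunable
    schedo -h "$tunable"    # show the help text describing the tunable
else
    echo "schedo not found: this sketch is meant for an AIX system"
fi
```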
Now let’s look at some lock data. SPLAT data is extracted from a raw lock trace file. But if you want to use SPLAT, some prep work is needed. First, you need to tell the trace utility to examine locking activity, and only locking activity. At one time, lock tracing could only be enabled by doing a special form of bosboot and then rebooting the system. In fact, some purists still recommend this. But these days, all you really have to do is enable lock tracing from a root command prompt.
(A quick aside: You can get some lock data from a kernel trace... but only some. If you use SPLAT to extract locking activity from a regular kernel trace file, you'll wind up with only about 10 percent of what you need to diagnose locking problems with any degree of reliability.)
Here's the command to enable lock tracing. Again, do this as root:
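On AIX, lock tracing is switched on and off with the locktrace command. The sketch below assumes that command is available and that you want tracing enabled for every lock class (the -S flag); the -l flag then lists the current lock tracing state so you can confirm it took effect. The variable name lock_classes is only a label for the example.

```shell
# Run as root on AIX.
lock_classes=all

if command -v locktrace >/dev/null 2>&1; then
    locktrace -S    # turn lock tracing on for all lock classes
    locktrace -l    # list the current lock tracing status
else
    echo "locktrace not found: this sketch is meant for an AIX system"
fi
```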
Now run a trace. You can use this syntax, or any other trace syntax that you're comfortable with:
trace -a -fn -o locktrace.raw -T20M -L40M ; sleep 5 ; trcstop
Really, the syntax doesn't matter – though I would recommend that you don't omit any hooks. Incidentally, during its usual course of execution, PerfPMR will run a locktrace on every logical processor in your system. PerfPMR will also handle turning lock tracing on and off, greatly simplifying lock activity data collection.
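Once the raw trace exists, the remaining steps are to hand it to SPLAT and then turn lock tracing back off. The following is a sketch under a few assumptions rather than a prescribed procedure: it assumes the raw file is the locktrace.raw written by the trace command above, that gensyms is used to produce the symbol file SPLAT reads with -n, and that you want all report detail (-da) sorted by acquisitions (-sa). The file names gensyms.out and splat.out are made up for the example.

```shell
# Run as root on AIX, after the trace has been stopped.
RAW=locktrace.raw       # raw trace file from the trace command above
SYMS=gensyms.out        # hypothetical name for the symbol file
REPORT=splat.out        # hypothetical name for the SPLAT report

if command -v splat >/dev/null 2>&1; then
    gensyms > "$SYMS"                                  # symbols so SPLAT can name locks and functions
    splat -i "$RAW" -n "$SYMS" -da -sa -o "$REPORT"    # build the lock report
    locktrace -R                                       # turn lock tracing back off
else
    echo "splat not found: this sketch is meant for an AIX system"
fi
```

Turning lock tracing off when you're done matters because it adds overhead for as long as it stays enabled.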