
Analyzing System Dumps

Mark J. Ray

I previously showed you how to read the contents of a core dump to determine the cause of a fault in an executable. An executable is any program (an application, database, middleware or utility, for example) that runs on a UNIX system. A core dump simply indicates that either the environment an executable runs in or a piece of code within that executable has faulted. In these instances, a snapshot of the memory area servicing the executable is written to a file that can be examined using several tools.

In this article, we'll look at another type of dump: the system dump. In contrast to the largely benign core dumps, a system dump indicates a severe problem with an AIX system. While a core dump can be created and written to a file while an AIX system is up and running, system dumps usually halt the system altogether, necessitating a reboot. When a system dump occurs, AIX will attempt to capture the entire contents of memory and write that data to a file. As such, it’s imperative to have a dump device defined with enough space to contain the dump. (It’s possible to gain some insight diagnosing a partial dump, but you may miss critical data that could completely change the results of your dump analysis.)

I’ll assume your system is configured with adequate dump devices, and further, that it's set up to capture a complete dump. With that in mind, let’s begin.

The Savecore Command

As with most system diagnoses, a little prep work is required. Let's presume that your system dumped and is back up and running. (Of course, there are many reasons why it may not be back up. However, those issues are beyond the scope of this article.) Log into your system and run the error report (the errpt command); one of the entries will look like this:

67145A39 0413095315    U    S    SYSDUMP    SYSTEM DUMP
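
The second column of that entry is the errpt timestamp, which is packed as MMDDhhmmYY. As a minimal sketch, the function below unpacks it into a readable date. The function name is my own invention, and it assumes two-digit years fall in the 2000s.

```shell
# Decode the TIMESTAMP column of an errpt summary line.
# errpt prints it as MMDDhhmmYY, two digits per field.
# errpt_timestamp is a hypothetical helper name, not an AIX command.
errpt_timestamp() (
    ts=$1
    # Reject anything that is not exactly ten digits.
    case $ts in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "unexpected timestamp: $ts" >&2; exit 1 ;;
    esac
    mm=$(printf '%s' "$ts" | cut -c1-2)    # month
    dd=$(printf '%s' "$ts" | cut -c3-4)    # day
    hh=$(printf '%s' "$ts" | cut -c5-6)    # hour
    mi=$(printf '%s' "$ts" | cut -c7-8)    # minute
    yy=$(printf '%s' "$ts" | cut -c9-10)   # two-digit year
    # Assumption: two-digit years belong to the 2000s.
    printf '20%s-%s-%s %s:%s\n' "$yy" "$mm" "$dd" "$hh" "$mi"
)
```

Running errpt_timestamp 0413095315 prints 2015-04-13 09:53 for the entry above. For the full detail of the entry, errpt -a -j 67145A39 should expand just that error identifier.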

The SYSDUMP entry flags the system dump; expanding this entry will tell you the time and date of the dump, among other things. To begin your diagnosis, copy the dump from the dump device to a file using the “savecore” command:

savecore .

Yes, the period is necessary. It indicates you want the dump copied to your current directory. As a general rule, make sure the free space in whatever file system you copy the dump to is roughly equal to double the size of your system's memory. You'll need the space for both the original compressed dump and the uncompressed copy. Using this form of the savecore command, you’ll get not only the dump itself, but also a copy of the UNIX kernel that was running at the time of the dump. If you find you don’t have enough space on your box for the uncompressed dump, you can FTP the dump and the kernel file to another system and do your analysis there.
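
The double-the-memory rule of thumb is easy to turn into a quick check before you run savecore. The sketch below is only arithmetic: the function name is hypothetical, and it assumes you supply both figures in KB yourself.

```shell
# Succeed when free space is at least twice the memory size.
# Both arguments are in KB. enough_space_for_dump is a hypothetical
# helper name used for illustration only.
enough_space_for_dump() (
    mem_kb=$1     # physical memory in KB
    free_kb=$2    # free space in the target file system in KB
    needed_kb=$((mem_kb * 2))
    if [ "$free_kb" -ge "$needed_kb" ]; then
        echo "ok: need ${needed_kb} KB, have ${free_kb} KB"
    else
        echo "short by $((needed_kb - free_kb)) KB" >&2
        exit 1
    fi
)
```

On AIX, one way to gather the inputs would be lsattr -El sys0 -a realmem for memory (reported in KB) and df -k . for free space in the current file system; how you parse their output is left to you.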

savecore will copy the dump to your current directory, and name it:

	vmcore.0.BZ

BZ stands for bzip, the dump utilities' preferred compression method.

Next, uncompress the dump using the dmpuncompress command:

dmpuncompress  vmcore.0.BZ

dmpuncompress will run the bzip utility and do the decompression. You're now left with a file called "vmcore.0". Dumps, as well as the kernel files, are suffixed with a sequence number indicating the order in which they were created; if dmpuncompress finds a dump that already carries the .0 suffix, it will label successive dumps .1, .2, .3, and so on.
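
To make the numbering scheme concrete, here is a minimal sketch that finds the first unused sequence number for a given prefix. It only illustrates the naming convention described above; it is not how the AIX tools implement it, and the function name is hypothetical.

```shell
# Print the first unused "<prefix>.N" name in a directory, counting from 0.
# next_sequence_name is a hypothetical helper, not part of AIX.
next_sequence_name() (
    dir=$1
    prefix=$2
    n=0
    # Treat both the uncompressed file and its compressed .BZ form as taken.
    while [ -e "$dir/$prefix.$n" ] || [ -e "$dir/$prefix.$n.BZ" ]; do
        n=$((n + 1))
    done
    echo "$prefix.$n"
)
```

For example, next_sequence_name . vmcore tells you which name the next dump copied into the current directory would be expected to receive.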

Lastly, format the dump:

/usr/lib/ras/dmprtns/dmpfmt  -c  vmcore.0

You should get a message that says the dump was completed properly and can be read:

" This dump appears complete - The end-of-dump component was found. "

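If you script these steps, you can test for that message rather than eyeballing it. A minimal sketch, assuming dmpfmt writes the message shown above to its output stream (the function name is hypothetical):

```shell
# Read dmpfmt output on stdin and succeed only if the
# end-of-dump message quoted in the article is present.
# dump_is_complete is a hypothetical helper name.
dump_is_complete() {
    grep -q 'This dump appears complete'
}
```

You might use it as /usr/lib/ras/dmprtns/dmpfmt -c vmcore.0 2>&1 | dump_is_complete, and proceed to analysis only when it succeeds.
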
Reading a Dump

Just as core dumps are read with the dbx utility, system dumps have their own tool for this purpose: the kernel debugger (kdb). kdb has many uses beyond reading dumps. I'll cover some of them in future articles, but let’s focus on the task at hand. Read the contents of your dump and its corresponding kernel file into kdb with this command:

kdb  vmcore.0  vmunix.0

You'll now enter into the kdb context, with a screen that looks like this:

[lpar_name:/datafs] # kdb vmcore.0 vmunix.0
	./vmcore.0 mapped from @ a00000000000000 to @ a000000393998ee
	START              END <name>
	0000000000001000 00000000058A0000 start+000FD8
	… lines omitted …
	Dump analysis on CHRP_SMP_PCI POWER_PC POWER_6 machine with 2 available CPU(s) (64-bit registers)
	Processing symbol table...
	.......................done
	read vscsi_scsi_ptrs OK, ptr = 0xF1000000C01E1380
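
Because the dump and its kernel file share a sequence number, you can derive one name from the other instead of typing both. A minimal sketch, assuming the vmcore.N and vmunix.N naming shown above (the function name is hypothetical):

```shell
# Map a dump file name to the kernel file saved alongside it.
# kernel_for_dump is a hypothetical helper, not an AIX command.
kernel_for_dump() (
    dump=$1
    dir=$(dirname "$dump")
    base=$(basename "$dump")
    case $base in
        *.BZ)     echo "uncompress the dump first: $dump" >&2; exit 1 ;;
        vmcore.*) ;;
        *)        echo "not a vmcore file: $dump" >&2; exit 1 ;;
    esac
    seq=${base#vmcore.}    # strip the prefix, leaving the sequence number
    echo "$dir/vmunix.$seq"
)
```

With that in place, kdb vmcore.0 "$(kernel_for_dump vmcore.0)" starts the same session as the command above.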

Mark J. Ray has been working with AIX for 23 years, 18 of which have been spent in performance. His mission is to make the diagnosis and remediation of the most difficult and complex performance issues easy to understand and implement. Mark can be reached at mjray@optonline.net


