
Insufficient Evidence

What to do after a system failure when no root cause is found


In the middle of an IT disaster, it’s all hands on deck. Quick thinking—assisted by a healthy rush of adrenalin—can get you through the crisis. Once the system is up again, hopefully you’ll have time to conduct a post-mortem investigation to uncover the root cause and prevent future problems.

When disaster strikes, our natural inclination is to look for a culprit to blame: the system, the vendor, the network, the storage-area network (SAN), the DBAs, and so on. In many cases, however, you might uncover a few inconsistencies but never identify the real cause: "Case dismissed due to insufficient evidence." If the systems are your responsibility, the words "no root cause found" hang over you like a dark cloud.

No Smoking Gun

But without a root cause to focus on, how do you prevent a recurrence of "The Great IT Disaster"? Even if you don't know exactly what went wrong, you can take steps to stop those gremlins from returning. Here are a few ideas:

Keep a level head. The more objective you can be in looking at the symptoms and the possible factors that brought the system down, the better chance you have of unraveling the mystery and preventing “Disaster Part II.” Level heads don’t roll.

Get a second opinion. You might not want to hear that advice after being swamped with opinions on what went wrong and whose fault it is. But going to someone who isn't a stakeholder can provide new insights. Check forums and blogs, log support calls, and talk with someone who has seen this (or something even worse) before. Then combine their experience with your local knowledge.

Apply change management. After a disaster, ask what might have changed in the lead-up to it. Maybe the change was perfectly legitimate, but did it really have to be done 30 minutes before the payroll run? Proper change-management procedures stop people from working in silos, giving them a better feel for business and user needs. They also ensure you have a back-out plan and help break configuration changes into bite-sized elements. And they give you more confidence about what works, what doesn't, and what to do about it.

Anthony English is an AIX specialist based in Sydney, Australia.



