
Disaster Recovery Levels


The rapid rise of e-business and the Internet has required some organizations to be available 24 hours a day, seven days a week. In many cases, companies develop new applications in response to growing e-business requirements. As part of a move to e-business, however, enterprises should evaluate the recovery requirements of their various applications and weigh the options available to them through today's technology.

Recovery Time and Recovery Point Objectives

A company developing a power-outage disaster plan must weigh the need to recover quickly and completely against the cost of implementing that recovery. It should consider the impact on the I/O performance of the installation's primary business application(s), as well as the installation's recovery time objective (RTO): how much time is available to fully recover the applications and have all critical operations up and running again? Another important factor is the installation's recovery point objective (RPO): how much data can be lost, or, put another way, at what recovery point in time (PiT) is all data current?

Determining the RTO and RPO involves examining and comparing the following:

  • The cost of some data loss while maintaining cross-volume/cross-subsystem data consistency. Maintaining data consistency allows a database restart (typically seconds to minutes in duration).
  • The cost of no data loss. Guaranteeing no data loss either impacts production on every operational error, not only in genuine disaster-recovery failures, or, because cross-volume/cross-subsystem data consistency isn't maintained during the failing period, forces a database recovery (typically hours to days in duration) instead of a restart.
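The RTO/RPO comparison can be sketched as a simple check. The option names, timestamps, and durations below are hypothetical illustrations loosely following the ranges in the list (a restart in minutes, a recovery in hours), not measured figures. An option meets its objectives only if it restores service within the RTO and its data-loss window, the gap between the failure and the recovery point, is within the RPO.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryOutcome:
    """Hypothetical summary of one recovery option after a failure."""
    name: str
    failure_time: datetime        # when the outage hit production
    recovery_point: datetime      # PiT at which all recovered data is current
    recovery_duration: timedelta  # time until critical operations run again

    @property
    def data_loss_window(self) -> timedelta:
        # Updates made between the recovery point and the failure are lost.
        return self.failure_time - self.recovery_point


def meets_objectives(o: RecoveryOutcome, rto: timedelta, rpo: timedelta) -> bool:
    """True if service is restored within the RTO and data loss is within the RPO."""
    return o.recovery_duration <= rto and o.data_loss_window <= rpo


failure = datetime(2024, 1, 1, 12, 0, 0)
# Illustrative: consistent remote copy, a few seconds lost, restart in minutes.
restart = RecoveryOutcome("consistent restart", failure,
                          failure - timedelta(seconds=5), timedelta(minutes=10))
# Illustrative: no data lost, but consistency broken, so recovery takes hours.
recover = RecoveryOutcome("database recovery", failure,
                          failure, timedelta(hours=20))

rto, rpo = timedelta(hours=1), timedelta(minutes=1)
for option in (restart, recover):
    print(option.name, option.data_loss_window, meets_objectives(option, rto, rpo))
# consistent restart 0:00:05 True
# database recovery 0:00:00 False
```

With these assumed objectives, the option that accepts a few seconds of data loss is the only one that fits, which is the trade-off the list describes.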

Cross-Volume Data Integrity and Consistency Groups

It's crucial that computers write data to disks with full integrity, even in the event of hardware failures and power failures. To accomplish this, system designers employ many techniques, such as:

  • Mirrored storage subsystem cache to prevent data loss in the event of a cache hardware failure
  • Battery backup to prevent cache data loss in the event of a power failure
  • Mirrored disk or parity-based RAID schemes for protecting against hard-disk drive failures
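Parity-based RAID can be illustrated with XOR parity, the scheme commonly used by single-parity RAID levels. This is a minimal sketch assuming one parity block per stripe; real storage subsystems do this in controller hardware or firmware, and the block contents below are arbitrary example values.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data drives, each holding one block of the same stripe (example values).
data = [b"\x10\x20\x30\x40", b"\x01\x02\x03\x04", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)  # written to a separate drive

# Simulate losing drive 1: XOR of the surviving blocks and parity rebuilds it.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt.hex())  # 01020304
```

Because XOR is its own inverse, any single missing block in the stripe can be rebuilt from the others plus parity; losing two drives at once cannot be recovered this way.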

Another, more subtle, requirement for preserving the integrity of data being written is making sure that "dependent writes" are executed in the application's intended sequence. Application developers devised such dependent-write sequences many years ago to preserve data integrity and data consistency for data written to disk across power failures. Consider this typical sequence of writes for a database update transaction:

  1. Execute a write to update the database log, indicating that a database update is about to take place.
  2. Execute a second write to update the database.
  3. Execute a third write to update the database log, indicating that the database update has completed successfully.
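The dependency among these three writes can be made concrete with a small model. The volume names, keys, and values are hypothetical; the point is that a disk image cut off after any in-order prefix of the sequence is one the log describes truthfully, while an image containing writes (1) and (3) but not (2) is not.

```python
# Hypothetical model of the three dependent writes: (volume, key, value).
SEQUENCE = [
    ("log", "txn", "update starting"),  # (1) log: an update is about to take place
    ("db", "row", "new value"),         # (2) the database update itself
    ("log", "txn", "update complete"),  # (3) log: the update completed successfully
]


def replay(writes):
    """Build the disk image left behind by a list of writes, applied in list order."""
    image = {"log": {}, "db": {}}
    for volume, key, value in writes:
        image[volume][key] = value
    return image


def log_is_truthful(image) -> bool:
    """The log may report completion only if the database update is on disk."""
    if image["log"].get("txn") == "update complete":
        return image["db"].get("row") == "new value"
    return True


# Stopping after any in-order prefix leaves a state the log describes correctly.
print([log_is_truthful(replay(SEQUENCE[:n])) for n in range(4)])  # [True, True, True, True]
# Applying (1) and (3) without (2) does not.
print(log_is_truthful(replay([SEQUENCE[0], SEQUENCE[2]])))  # False
```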

It's imperative that these "dependent writes" are written to the remote mirrored disk in the same sequence in which the application issued them. In the previous example, there's no guarantee that the database log and the database reside on the same storage subsystem. If the sequence isn't preserved, writes (1) and (3) may reach the mirror without write (2), followed immediately by a system failure. When it's time to recover the database, the database log would incorrectly indicate that the transaction completed successfully. The transaction would be lost, and the integrity of the database would be questionable.
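The consistency-group idea named in this section's heading can be sketched as follows. This is a simplified model, not how any particular IBM product forms consistency groups. It assumes every write carries a global sequence number (a hypothetical stand-in for a common timestamp) and that each subsystem reports how far its stream of writes has completely arrived at the remote site. The remote copy applies only writes at or below the smallest such point across all subsystems, so write (3) is held back until write (2) has arrived, even though they come from different subsystems.

```python
from dataclasses import dataclass, field


@dataclass
class Subsystem:
    """Hypothetical subsystem: writes that reached the remote site, and how far its stream is complete."""
    name: str
    arrived: list = field(default_factory=list)  # (seq, key, value) tuples received remotely
    complete_through: int = 0  # every write of this subsystem with seq <= this has arrived


def consistency_group(subsystems):
    """Apply, in sequence order, only the writes at or below the common cut."""
    cut = min(s.complete_through for s in subsystems)
    image = {}
    for seq, key, value in sorted(w for s in subsystems for w in s.arrived):
        if seq <= cut:
            image[key] = value
    return cut, image


# Log and database live on different subsystems; write (2) is delayed in transit.
log = Subsystem("log-subsystem", [(1, "log", "started"), (3, "log", "completed")], complete_through=3)
db = Subsystem("db-subsystem", [], complete_through=1)
print(consistency_group([log, db]))  # (1, {'log': 'started'})

# Once write (2) arrives, the cut advances and the whole transaction is applied.
db.arrived.append((2, "db", "new value"))
db.complete_through = 3
print(consistency_group([log, db]))  # (3, {'log': 'completed', 'db': 'new value'})
```

The cost of this design is that the remote image lags slightly behind production (a small, nonzero RPO), in exchange for an image that always allows a restart rather than a recovery.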

As pressure for availability mounts, effective disaster-recovery techniques become vital to businesses.

Robert Kern works in Disk Storage Architecture for IBM. Robert can be reached at bobkern@us.ibm.com.

Victor Peltz works in Business Line Management for the IBM Systems and Technology Group in San Jose, Calif. Victor can be reached at vpeltz@us.ibm.com.

