Continuous Availability for AIX
IT administrators know all too well that protecting critical business applications, data and networks from failures is vital to the success of their business. Several technologies now provide high availability (HA) for applications and data. One is clustering, which provides continuous availability for applications and servers. Another is replication, which mirrors data to a remote storage system so that if the primary storage system becomes unavailable, the data can still be retrieved and accessed through the remote subsystem.
The most desirable solution integrates these technologies to automate data replication, monitor application status and respond quickly to failures across systems. With such a solution, if an application, server, storage device or network fails, all of the resources related to the application automatically "fail over" (switch) to a backup server or, if necessary, a remote site, and operations continue with minimal disruption to the business. Such resilience is critical because downtime is costly in terms of both revenue and customer perception. Automated failover solutions recover from failures more quickly, reduce the chance of service outages and decrease both administrator intervention and the associated risk of error.
IBM offers solutions that address the need for continuous availability among customers running critical business applications on pSeries servers with the AIX operating system. The solution integrates High Availability Cluster Multiprocessing (HACMP), the Metro Mirror technology (synchronous peer-to-peer remote copy, or PPRC) of IBM TotalStorage disk systems (Enterprise Storage Server or ESS, DS8000 and DS6000), and the TotalStorage SAN Volume Controller (SVC) to provide a robust, fully automated continuous availability and disaster-recovery solution for business-critical applications. It handles both failover and failback of applications and data without customer intervention.
HACMP/XD (Extended Distance) is an optional feature of HACMP that provides several disaster-recovery options, including support for storage-based Metro Mirror. In combination with Metro Mirror (PPRC), HACMP/XD provides HA and disaster recovery across geographically dispersed HACMP clusters, protecting business-critical applications and data against disasters that can affect an entire datacenter. IBM TotalStorage systems such as ESS, DS8000, DS6000 and SVC provide the remote-mirroring technology that maintains identical copies of application data on two separate storage subsystems, one at each site.
HACMP provides rapid recovery of application services by automatically moving a workload from its host server to a backup server after a failure. In a single-site HACMP environment, all cluster nodes sharing volume groups have physical connections to the same set of disks. In an HACMP/XD environment, the cluster nodes access the same shared volume groups, but the nodes at each site access them through different physical volumes (i.e., separate storage systems). While the application is active on a server at the primary site, all updates to the application data are automatically replicated to the disk system at the secondary site. When a failure occurs and the application moves to a backup server at the secondary site, operations continue on the mirrored data in the secondary disk system. When the primary site returns to service, the direction of replication can be reversed so that all updates on the now-active secondary disks are replicated back to the disks at the primary site.
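The replication behavior described above can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions: the names `Site`, `MirroredVolumeGroup`, `fail_over` and `reverse_replication` are hypothetical, each disk system is reduced to a list of written records, and the resynchronization step copies the whole volume, whereas a real resynchronization would typically be more selective. It is not HACMP/XD or Metro Mirror code and does not reflect their interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One site's disk system, reduced to a list of written records (toy model)."""
    name: str
    disk: list = field(default_factory=list)
    available: bool = True


class MirroredVolumeGroup:
    """Hypothetical model of a volume group mirrored synchronously between two sites.

    Illustration only: not HACMP/XD or Metro Mirror code.
    """

    def __init__(self, primary: Site, secondary: Site) -> None:
        self.source = primary     # site whose disks the application writes to
        self.target = secondary   # site receiving the mirrored updates
        self.mirroring = True

    def write(self, record: str) -> None:
        self.source.disk.append(record)
        # Synchronous mirroring: the update is also applied at the target
        # before the write is considered complete.
        if self.mirroring and self.target.available:
            self.target.disk.append(record)

    def fail_over(self) -> None:
        # The source site is lost; the application continues on the mirrored copy.
        self.source.available = False
        self.source, self.target = self.target, self.source
        self.mirroring = False  # no partner to replicate to until the old source returns

    def reverse_replication(self) -> None:
        # The former source site is back: resynchronize it from the active copy
        # (simplified here to a full copy), then mirror new updates to it.
        self.target.available = True
        self.target.disk = list(self.source.disk)
        self.mirroring = True


primary, secondary = Site("primary"), Site("secondary")
vg = MirroredVolumeGroup(primary, secondary)
vg.write("txn-1")          # replicated primary -> secondary
vg.fail_over()             # primary site lost
vg.write("txn-2")          # lands only on the secondary disks
vg.reverse_replication()   # primary returns; updates now flow secondary -> primary
vg.write("txn-3")
print(primary.disk)    # ['txn-1', 'txn-2', 'txn-3']
print(secondary.disk)  # ['txn-1', 'txn-2', 'txn-3']
```

The key step is the role swap in `fail_over`: the volume group keeps its identity, but the site supplying its physical volumes changes, and replication resumes in the opposite direction only after the returning site has been brought back in sync.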
A typical customer environment running Metro Mirror and HACMP/XD is a four-node, wide-area HA cluster. Two local servers (Server A and Server B) share the primary ESS at the primary site, and two remote servers (Server C and Server D) share the secondary ESS at the recovery site; the two sites can be up to 300 km apart. Server A is the primary server running the application, Server B is the primary backup at the primary site, and Servers C and D are configured as backup servers at the secondary site. The application data is stored on the shared primary ESS and replicated through Metro Mirror over Fibre Channel links to the disk system at the secondary site.
If Server A fails and Server B remains healthy, HACMP migrates the application(s) to Server B, and all application data updates made on Server B continue to be replicated from the primary disk system to the secondary disk system. If Server B then fails as well, or if a local disaster disables Servers A and B simultaneously, HACMP automatically migrates the application(s) to Server C with minimal service interruption to users. HACMP/XD, combined with Metro Mirror, automates the failover of the mirrored volumes between sites and starts the application at the remote site, where Server C accesses the mirrored data on the secondary storage system. As part of the recovery process, once the primary disk system returns to service, HACMP reverses the direction of replication so that updates flow from the secondary disks back to the primary disks.
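The takeover order in this example can be sketched as a priority list. The sketch below assumes the node order A, B, C, D described above and uses hypothetical names (`Node`, `PRIORITY`, `choose_takeover`); it illustrates only the decision of which node hosts the application and whether the mirrored volumes must switch sites, not how HACMP is actually configured or how it makes that decision.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Node:
    name: str
    site: str


# Hypothetical takeover order for the four-node example: Server A is the home
# node, Server B the local backup, then Servers C and D at the recovery site.
PRIORITY: List[Node] = [
    Node("A", "primary"),
    Node("B", "primary"),
    Node("C", "secondary"),
    Node("D", "secondary"),
]


def choose_takeover(healthy: Set[str], active_site: str) -> Tuple[str, bool]:
    """Pick the first healthy node in priority order.

    Returns the node name and whether the mirrored volumes must fail over to
    the other site (i.e., the chosen node is not at the currently active site).
    """
    for node in PRIORITY:
        if node.name in healthy:
            return node.name, node.site != active_site
    raise RuntimeError("no healthy node left in the cluster")


print(choose_takeover({"B", "C", "D"}, "primary"))  # ('B', False): local takeover
print(choose_takeover({"C", "D"}, "primary"))       # ('C', True): site failover
```

Returning the site switch as a separate flag reflects the distinction in the scenario: moving from Server A to Server B leaves the replication relationship unchanged, while moving to Server C also requires failing over the mirrored volumes.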