GDR for Power Systems Offers Automated Recovery Management

Geographically Dispersed Resiliency


IBM defines IT resilience as the ability to rapidly adapt and respond to any internal or external disruption, demand or threat and continue business operations without significant impact. IT resilience includes both high availability (HA) and disaster recovery (DR). HA can be achieved with redundant IT infrastructure or with technology that restarts failed components during an outage. DR requires a proven capability to recover an organization’s business operations from either a planned or unplanned outage of an operating location.

Major market drivers behind the need for IT resilience are financial loss due to revenue loss, productivity loss and fines for breached service-level agreements, as well as the business impact of damage to market reputation and brand image caused by the disruption and downtime of critical business services. As the business environment becomes more competitive and government regulations more stringent, organizations need a comprehensive business continuity plan to help keep business operations running 24-7.

Business continuity is an integral part of any business operation. Business continuity plans include not only the necessary technology for recovering IT infrastructure, business applications and data, but also the people and processes. It’s crucial that all aspects of the DR strategy are considered and implemented, and that they’re tested and verified on a regular basis.

Commonly Used DR Technologies

Two commonly used technologies in HA and DR solutions are cluster-based technology and VM restart-based technology. Cluster-based technology relies on redundant standby nodes in the cluster that are ready to take over a workload when the primary node fails. This is an internally managed solution: each node in the cluster monitors the health of its partner nodes and of the cluster environment. It offers a faster restart time; however, it’s more expensive due to redundant hardware and software requirements.

VM restart-based technology, on the other hand, offers simpler DR operations. In this model, data for the entire VM is replicated to a backup site using storage replication methods. The replicated data is used to start the VMs on the backup site in the event of a disaster. Once the OSes are booted, the previously failed business services can restart and recover to resume normal operations. This model provides an out-of-band managed solution, which can scale to manage the entire data center and is better suited for the cloud environment. Figure 1 depicts the major differences between these two models.

The GDR Solution for Power Systems

IBM Geographically Dispersed Resiliency (GDR) for IBM Power Systems*, first released in November 2016, is a DR solution based on storage replication and VM restart technologies. GDR processing is tightly integrated with the Power HMC and VIOS infrastructure. It provides easy-to-deploy, automated recovery management across the primary site and the backup site. Figure 2 provides a high-level view of the components of the GDR solution. Features include:

  • Performs DR processing for hundreds of VMs based on PowerVM* virtualization
  • Supports IBM POWER7* and later servers, and the AIX*, Linux* and IBM i OSes
  • Supports multivendor storage systems, including IBM System Storage DS8000*, IBM System Storage SAN Volume Controller, EMC Symmetrix Remote Data Facility-capable storage systems and Hitachi Universal Replicator
  • Single point of control from a control system (KSYS) to provide centralized status reporting and DR operations orchestration
  • Administrator-initiated planned and unplanned failover processing
  • Fully automated end-to-end recovery operations to produce reliable and consistent recovery time and reduce or eliminate manual intervention and human errors
  • Daily enabled or user-initiated verification checks across sites to ensure successful failover and facilitate regular testing for repeatable results
  • Email/text alerts to notify administrative personnel of critical events
  • Supports custom user plugin scripts for verification or event handling
  • Acquires and releases capacity on demand to activate resources required on the backup site to perform a takeover, such as On/Off Capacity on Demand management or enterprise pool exploitation
  • Can coexist with other Power Systems features and products including LPM, PowerVC* cloud manager and remote restart operations on the primary site

GDR Advanced Features

GDR has also rolled out advanced features that enable more flexible DR configuration and recovery prioritization. One such feature is DR testing with a tertiary copy of data, which allows clients to perform a DR rehearsal without impacting production systems or replication. This helps clients satisfy government regulations that mandate successful DR testing on a regular basis.

GDR also facilitates organizing hosts/servers into different host groups and performing failover at the host group level. This can facilitate rolling upgrades by moving one or more host groups to the backup site, performing the upgrade and then moving the host group(s) back when the upgrade is completed.
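The rolling-upgrade pattern described above can be sketched in a few lines of Python. This is only an illustrative model, not GDR code: the `rolling_upgrade` function, the host group names and the `upgrade` callback are all hypothetical, and a site move is reduced to updating a dictionary entry.

```python
def rolling_upgrade(host_groups, upgrade):
    """Upgrade host groups one at a time and return a log of the steps taken.

    host_groups: dict mapping a (hypothetical) group name to its current
                 site, either "primary" or "backup".
    upgrade:     callable invoked with the group name while that group's
                 workload is running on the backup site.
    """
    log = []
    for group in host_groups:
        host_groups[group] = "backup"    # move the workload off the primary hosts
        log.append((group, "moved to backup"))
        upgrade(group)                   # primary hosts are now free to be upgraded
        log.append((group, "upgraded"))
        host_groups[group] = "primary"   # move the workload back
        log.append((group, "moved to primary"))
    return log


# Illustrative run with two made-up host groups.
groups = {"hg_finance": "primary", "hg_web": "primary"}
upgraded = []
log = rolling_upgrade(groups, upgraded.append)
for step in log:
    print(step)
```

Handling one host group at a time means only that group's workload runs from the backup site while the rest of the data center is untouched.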

VMs can be tiered based on business requirements and started based on their specified priority level (high, medium or low). Flexible capacity DR management also allows clients to deploy resources that fit their needs on the backup site. Clients can achieve a DR solution (or a DR test) with fewer resources, or use a lower level of servers with extra resources for DR failover. If clients have different versions of AIX on different disks, a boot device management function can be used to specify which boot disk to use when booting the OS on the backup site.
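As a rough illustration of priority-based startup, the sketch below orders a set of VMs by the three priority levels named above. The `VM` class, the `PRIORITY_RANK` mapping and the VM names are assumptions made for the example; they don't represent how GDR stores or evaluates priorities.

```python
from dataclasses import dataclass

# Hypothetical mapping from priority label to start rank (lower starts first).
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass(frozen=True)
class VM:
    name: str
    priority: str  # "high", "medium" or "low"


def start_order(vms):
    """Return the VMs in the order they would be started on the backup site.

    sorted() is stable, so VMs that share a priority keep their input order.
    """
    return sorted(vms, key=lambda vm: PRIORITY_RANK[vm.priority])


# Made-up workload for illustration.
vms = [
    VM("batch-reports", "low"),
    VM("orders-db", "high"),
    VM("web-frontend", "medium"),
    VM("payments-db", "high"),
]
ordered_names = [vm.name for vm in start_order(vms)]
print(ordered_names)
```

With this input, the two high-priority databases come first, followed by the medium-priority front end and then the low-priority batch VM.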

Extending VM Restart

To provide a uniform interface and enable the same recovery technology to manage both HA and DR, the VM restart-based technology will be extended to support local HA, as shown in Figure 3. VM restart-based technology will support end-to-end HA management through health monitoring at the system, virtual server and application levels. If a failure is detected, the VM can be restarted on another server, based on policy, and business continues. This VM restart HA and DR solution will provide consistent setup, configuration and policy-based restart management for both HA and DR processing.
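The three monitoring levels and the policy-based restart decision can be modeled roughly as follows. Everything here is a hypothetical sketch: the `Host` and `VMStatus` classes, the CPU-only capacity check and the "most free CPUs" placement policy are assumptions chosen for illustration, not the product's actual policy engine.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Host:
    name: str
    healthy: bool      # system-level health
    free_cpus: int


@dataclass
class VMStatus:
    name: str
    host: str          # name of the host the VM currently runs on
    cpus: int
    vm_healthy: bool = True    # virtual server-level health
    app_healthy: bool = True   # application-level health


def needs_restart(vm: VMStatus, hosts: Dict[str, Host]) -> bool:
    """A VM needs a restart if any of the three monitored levels has failed."""
    host_ok = hosts[vm.host].healthy
    return not (host_ok and vm.vm_healthy and vm.app_healthy)


def pick_target(vm: VMStatus, hosts: Dict[str, Host]) -> Optional[str]:
    """Assumed policy: pick the healthy host, other than the current one,
    that has the most free CPUs and can still fit the VM."""
    candidates = [
        h for h in hosts.values()
        if h.healthy and h.name != vm.host and h.free_cpus >= vm.cpus
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda h: h.free_cpus).name


# Illustrative scenario: srv1 has failed, so its VM must move elsewhere.
hosts = {
    "srv1": Host("srv1", healthy=False, free_cpus=0),
    "srv2": Host("srv2", healthy=True, free_cpus=4),
    "srv3": Host("srv3", healthy=True, free_cpus=8),
}
vm = VMStatus("orders-db", host="srv1", cpus=4)
restart = needs_restart(vm, hosts)
target = pick_target(vm, hosts)
print(restart, target)
```

Keeping failure detection (`needs_restart`) separate from placement (`pick_target`) mirrors the idea that the same monitoring can drive different restart policies for HA and DR.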

