Save Money and Better Serve Customers With a Comprehensive DR Strategy

John Dominic of Maxava explains the importance of taking steps to minimize downtime.

It seems like every week we read about another system outage or computer failure that leads to major chaos, customer frustration and disruption. Serious outages at airlines, banks and stock exchanges always make great stories for the media because downtime disasters that affect the average person (e.g., travel and money) are the most newsworthy. 

But as IBM i IT professionals, we should always consider that for every high-profile example we read about, many hundreds of outages go completely unreported. The average organization scrambles to get systems back online with as little disruption as possible and, unfortunately, typically takes a massive hit on cost from the downtime event.

The issue is that the cost of downtime is rising fast, even for the smallest businesses. Companies tend to lose a large amount of data during a disaster (every keystroke made since the last backup is lost), and insuring against that loss is expensive. And according to a recent analysis by Forbes, organizations create an average of 2.5 quintillion bytes of data per day, further compounding the problem. The more data, the greater the loss, and the greater the ultimate cost of recovery.

A recent study conducted by the Ponemon Institute put the average cost of downtime at $9,000 per minute. Given that 50% of businesses reported suffering an unplanned outage of two or more hours over the past five years, that adds up quickly: a single two-hour outage at that rate costs more than $1 million.
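To make the arithmetic concrete, here is a minimal sketch in Python. It assumes cost scales linearly with outage length at the $9,000-per-minute average quoted above. Real costs vary widely by industry and time of day, and the function name `downtime_cost` is purely illustrative.

```python
def downtime_cost(minutes: float, cost_per_minute: float = 9_000.0) -> float:
    """Estimate the cost in dollars of an outage lasting `minutes`.

    Assumes a flat per-minute rate (a simplification); the default is the
    average figure cited from the Ponemon Institute study.
    """
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


# A two-hour outage at the average rate.
two_hour_outage = downtime_cost(2 * 60)
print(f"${two_hour_outage:,.0f}")  # $1,080,000
```

Substituting your own organization's per-minute figure for the default gives a rough estimate to weigh against the cost of a DR solution.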

Greater Disasters; Growing Costs

A bigger challenge is the fact that this pattern is unlikely to change any time soon. If anything, it is likely to escalate. Several studies have been performed on the impact of natural disasters using data collected from the Federal Emergency Management Agency, the National Oceanic and Atmospheric Administration and the Small Business Administration. Over the last 10 years, three notable trends emerged: 

  1. Disasters are occurring more frequently
  2. The damage per storm is rising 
  3. The cost for recovery is growing dramatically

In the IBM i world, we like to promote the superior availability statistics of our platform, but we aren’t immune to these events—whether they are unplanned or not. We need to accept that outages will happen. Even the largest organizations with the biggest budgets and the most critical infrastructure cannot avoid them. Every company needs to respond. 

Approaching the Challenge

Understanding the causes of downtime is a great first step to putting a robust disaster recovery (DR) strategy in place. According to the aforementioned Ponemon Institute study, the major causes of outages in 2016 were:

  • UPS system failure (25%)
  • Cyberattack (DDoS) (22%)
  • Accidents/human error (22%)
  • Water, heat or CRAC failure (11%)
  • Weather-related (10%)
  • Generator failure (6%)
  • IT equipment failure (4%)

Geographic separation of a synchronized primary and secondary database goes a long way toward avoiding your own downtime disaster. For example, during Hurricane Sandy in 2012, a state of emergency forced several companies in New Jersey to close. In some cases, power remained on in the building, but employees were denied access to the premises. Companies that included regional separation with synchronized replication as part of their disaster planning were spared: they simply role-swapped their production workload to live systems in the Midwest in a matter of minutes. Those that didn’t account for separation didn’t fare so well.

This is where an appreciation for the scope of a true disaster proves invaluable. I’ve found that companies that base their disaster-planning strategy on a limited factor (such as a hot storage deal) learn the ugly truth when an outage occurs. And an event doesn’t have to be in the form of a natural disaster, either—any scenario mentioned here can lead to the same problems. Something as simple as a minor traffic accident can cut off power to a building for hours. Again, regional separation works wonders. 

Another complication that companies learn from an outage is that lost data is difficult (if not impossible) to recreate. When the IBM i was an insulated system, this wasn’t an issue. But with EDI changes coming from multiple sources on a constant basis, recreating the last however many hours of transactions is a daunting task for even the biggest shops. If anything, many organizations now look to recovery point improvements as the primary driver for updating their disaster recovery plan.

Recognizing the Benefits

Organizations that adopt strong disaster-recovery plans to mitigate regional outages benefit in several ways. The most obvious is that they can continue operations elsewhere during a localized outage. Having remote systems and staff available in the event of a disaster is key to achieving that level of constant availability.

Clients also benefit by delivering better customer service. Maintaining active IT systems means that customers can be served 24/7/365. Systems can also be swapped during planned outages (e.g., maintenance windows) so productivity isn’t impacted at any time. Business transactions can occur on all shifts across a multinational client base, and because customers have full-time access to data systems, they have no incentive to look elsewhere for alternatives.

Organizations also indirectly benefit through compliance. Many companies must provide a proven recovery plan to comply with insurance coverage or to maintain the lowest premiums. The key word is ‘proven.’ The days of purchasing products purely for the sake of ticking a box for compliance auditors, as was popular during the initial years following the introduction of Sarbanes-Oxley regulations, have passed. Companies, especially those linked with financial and insurance institutions, must now actively deliver on this availability requirement to maintain a positive rating.

The Bottom Line

Outages are always painful, and the cost of damages is only going up. Unproductive employees and lost data typically cost more than companies estimate, but the financial impact stemming from brand damage and lost customers is almost immeasurable. Disaster recovery solutions that geographically separate data across synchronized IBM Power Systems servers are typically less expensive than most people think, especially when compared to the rapidly rising cost of downtime.
