New Features in IBM Simplified Remote Restart


The IBM POWER8 servers have a feature called Simplified Remote Restart (SRR). This feature allows an admin to restart the partitions from a failed server on another server. SRR has been available on POWER8 servers since December 2014, but it has been enhanced since then, and the latest HMC release, V8R8.5, includes features geared toward the most demanding customers.

Let’s start with a high-level overview of SRR and why you should use it, and then discuss the features available in the latest HMC release.

Overview

When a server crashes, the partitions on that server also crash, and you have to wait for the IBM service representative to come onsite and repair the server. Those partitions are down until the server is repaired. With SRR, those partitions can be “moved” to another server and restarted there. You can think of this as similar to Live Partition Mobility (LPM), except that LPM requires the source server to be up and running before you can move any partitions. With SRR, you can have an unplanned outage and still perform LPM-like operations to move the partitions, restart them, and resume running their workloads.
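On the HMC command line, a remote restart is typically driven with the rrstartlpar command. As a minimal sketch, the Python helper below only assembles the validate and restart command strings; it does not contact an HMC or run anything. The server and partition names are hypothetical, and the flags used here (-o, -m, -t, -p) are my understanding of the command, so check them against the rrstartlpar man page for your HMC level before relying on them.

```python
import shlex

def rrstartlpar_command(operation, source_system, target_system, partition):
    """Return an rrstartlpar command line as a string; nothing is executed."""
    if operation not in ("validate", "restart"):
        raise ValueError(f"unsupported operation: {operation!r}")
    args = [
        "rrstartlpar",
        "-o", operation,       # validate first, then restart
        "-m", source_system,   # the failed (source) server
        "-t", target_system,   # the server that will host the partition
        "-p", partition,       # partition name
    ]
    return " ".join(shlex.quote(arg) for arg in args)

# Hypothetical names, for illustration only.
for op in ("validate", "restart"):
    print(rrstartlpar_command(op, "failed-server", "target-server", "prod_lpar1"))
```

Running a validate before the restart is a cheap way to find out whether the target server can actually accept the partition.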

Making My Case

I have talked to many customers about SRR, and I get one of three reactions. The first is, “We need this feature and we’ll enable it.” I like this reaction. The second is, “We have our important partitions clustered and can just fail over to another partition, so we don’t need SRR.” This response is OK, but cluster failover sometimes isn’t tested frequently, and it ignores the fact that customers have many non-production partitions. If those partitions are down, developers and testers are idle until the server is fixed. The third reaction is, “We don’t have enough resources (other servers) to restart partitions from a failed server on another server.” I know that customers like to run lean and avoid overcapacity in their POWER8 server farms. But if a server crashes, they may only want to restart a few very important partitions. In that case, I argue that a customer could make resources available for the remote restart of those partitions by shutting down less important partitions to free up resources. Whatever your reaction, enabling SRR doesn’t cost you anything, so why not have it enabled in case of a server outage?
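To make the third argument concrete, here is a rough planning sketch in Python. It is not part of SRR or the HMC. It is a hypothetical greedy model in which every partition has a priority (lower number means more important), and a failed partition may displace only strictly less important partitions already running on the target server. All names, priorities, and sizes are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Partition:
    name: str
    priority: int    # lower number = more important (hypothetical scale)
    cores: int
    memory_gb: int

def plan_restart(failed, running, free_cores, free_memory_gb):
    """Greedy plan: restart failed partitions in priority order, shutting down
    strictly less important partitions on the target server when needed."""
    restart, shutdown = [], []
    # Candidates to shut down, least important first.
    victims = sorted(running, key=lambda p: p.priority, reverse=True)
    for part in sorted(failed, key=lambda p: p.priority):
        cores, mem, picked = free_cores, free_memory_gb, []
        for v in victims:
            if cores >= part.cores and mem >= part.memory_gb:
                break                      # enough room already
            if v.priority <= part.priority:
                break                      # never displace equal or more important work
            picked.append(v)
            cores += v.cores
            mem += v.memory_gb
        if cores >= part.cores and mem >= part.memory_gb:
            for v in picked:
                victims.remove(v)
            shutdown.extend(picked)
            restart.append(part)
            free_cores = cores - part.cores
            free_memory_gb = mem - part.memory_gb
    return restart, shutdown

failed = [Partition("prod_db", 1, 4, 64), Partition("test_app", 5, 2, 16)]
running = [Partition("dev_box", 8, 4, 32), Partition("prod_web", 1, 8, 128)]
restart, shutdown = plan_restart(failed, running, free_cores=2, free_memory_gb=40)
print([p.name for p in restart], [p.name for p in shutdown])
```

In this example the plan restarts prod_db after shutting down dev_box, and leaves test_app down because the only remaining candidate to displace is more important than it is.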

SRR Features

With the new HMC V8R8.5 release, SRR gains new features that make it easier to use in many IT environments. Since my earliest discussions with customers, those using NPIV technology have needed a way to make sure a remote restarted partition is placed on the correct VIOS pair and mapped to the correct FCS port. Before HMC V8R8.5, the user had no control over the NPIV mapping. Now this mapping can be specified.

A second important feature in HMC V8R8.5 is that SRR works even when the server has crashed completely. There’s no need for a connection from the HMC to the Service Processor (aka FSP). The HMC can restart the partitions even if the frame has lost all power. Prior to this HMC release, the service processor still had to have power and the HMC needed to be able to connect to it.

The third new feature I want to discuss is the ability to enable SRR during LPM operations. Prior to this HMC release, the partition had to be shut down, an admin had to issue a command from the HMC command line to enable SRR, and then the partition had to be activated again. In other words, you couldn’t dynamically enable SRR on a partition. Now you can LPM the partition and have SRR enabled as part of the LPM operation. So, as you move a partition from POWER7 to POWER8 with LPM, you can enable SRR during that migration.
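For comparison, the older three-step procedure (shut down, enable SRR, activate) can be sketched as a short Python helper that returns the HMC command lines in order. It only builds strings and runs nothing. The system, partition, and profile names are hypothetical, and the chsysstate and chsyscfg syntax shown, including the simplified_remote_restart_capable attribute, reflects my understanding of the HMC CLI, so verify it against the man pages for your HMC level.

```python
def enable_srr_commands(system, partition, profile):
    """Return the three HMC command lines, in order; nothing is executed."""
    return [
        # 1. Shut the partition down (it must be inactive to change the setting).
        f"chsysstate -r lpar -m {system} -o shutdown -n {partition}",
        # 2. Mark the partition as simplified-remote-restart capable.
        f'chsyscfg -r lpar -m {system} -i "name={partition},simplified_remote_restart_capable=1"',
        # 3. Activate the partition again with its profile.
        f"chsysstate -r lpar -m {system} -o on -n {partition} -f {profile}",
    ]

# Hypothetical names, for illustration only.
for command in enable_srr_commands("power8-server", "prod_lpar1", "default_profile"):
    print(command)
```

The outage in steps 1 and 3 is exactly what enabling SRR as part of an LPM operation avoids.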

Learn More

You can find more information on SRR below. I also have a tool you can use to do SRR operations, named the “PowerVM LPM/SRR Automation Tool.”

See the list below for videos detailing everything you ever wanted to know about SRR and our tools.

Short Videos

ibm.biz/LPM_overview – demo of LPM/SRR tool (10 minutes, May 2016)

ibm.biz/LPM_scheduler – demo of LPM/SRR tool scheduling a group of LPMs (4 minutes, May 2016)

ibm.biz/LPM_PEP – demo of LPM/SRR tool automating Power Enterprise Pool resource moves as part of LPM operations (5 minutes, May 2016)

ibm.biz/SRR_benefits – video presentation on what SRR is and why you need to use it (12 minutes, May 2016)

ibm.biz/SRR_tool – demo of LPM/SRR tool performing SRR operations and cleanup of a failed server (8 minutes, May 2016)

ibm.biz/SRR_enterprise_tool – video presentation on why you need the LPM/SRR tool to do enterprise-level SRR operations (12 minutes, Aug 2016)

ibm.biz/SRR_bikeride – enjoyable video/demo on how quickly SRR can recover from a failed server using the LPM/SRR Automation Tool (5 minutes, May 2016)

Longer, More Detailed Videos

https://www.youtube.com/watch?v=YdC7UuJr6s4 – detailed video presentation on LPM/SRR tool including new features in V8.5 of the tool (1:23 long, Aug 2016)

Questions?

I hope this helps you! And if you run into questions, don’t hesitate to reach out to me at bobf@us.ibm.com or contact IBM Systems Lab Services at ibmsls@us.ibm.com.


