A Step-By-Step Guide to Live Partition Mobility

In the past couple of months I have been using LPM (Live Partition Mobility) extensively. LPM has been available from IBM for many years and is used for a number of reasons. Examples include server consolidation, workload balancing, evacuating servers for planned maintenance, and migrating from older technology (POWER6 and above) to newer technology.
 
LPM is not a replacement for disaster recovery or high availability solutions. It’s designed to move a running LPAR, or a properly shut down LPAR, to a different server. It cannot move an LPAR with a crashed kernel or one on a failed machine. If all the prerequisites are met, LPM can be used to move an AIX, IBM i or Linux LPAR from one LPM-capable POWER server to another compatible server. By compatible I mean that the target has to meet the requirements for the Power Systems server, the management console, the VIOS (PowerVM) and the LPAR itself.
 
LPM can be used with servers managed by the HMC, IVM or an FSM. For the purposes of this article we will discuss HMC managed systems. LPM can also be used between servers that are managed by different HMCs as long as the prerequisites are met.
 
Terminology
LPM itself is enabled by installing PowerVM Enterprise Edition. It’s not available with PowerVM Standard Edition, but you can upgrade from Standard to Enterprise. There are some specific terms within LPM that are important to understand.
 
Active Partition Mobility
Active Partition Migration is the actual movement of a running LPAR from one physical machine to another without disrupting the operation of the OS and applications running in that LPAR.
 
Inactive Partition Mobility
Inactive Partition Migration transfers a partition that is logically ‘powered off’ (not running) from one system to another.
 
Suspended Partition Mobility
Suspended Partition Migration transfers a partition that is suspended from one system to another.
 
Partition Mobility (Live or Inactive) and Partition Migration (Active or Inactive) refer to the same feature, which is the movement of an LPAR (partition) to a different server.
 
VASI (virtual asynchronous services interface)
The VASI provides the communication path between the VIO server and the Hypervisor. The VASI is enabled on a VIO server when that VIO server is enabled as an MSP (mover service partition). The source and target MSPs use a VASI device to gain access to the mobile LPAR’s state during an active migration. You can check the state of your VASI on the VIO LPAR by using the following command as padmin:
 
lsdev -virtual | grep vasi

 
You can also use the vasistat command to monitor the VASI device.
 
RMC (resource monitoring and control)
One critical prerequisite for LPM is a working RMC connection, because LPM uses RMC for the migrations. RMC allows the management console to communicate with the LPARs to perform operations such as shutdown, DLPAR, service event reporting, and virtual device management. It operates over the network between the management console(s), the VIOS partitions, the MSP partitions, and the mobile LPAR. RMC connections can take 5-7 minutes to re-establish, so wait a while after a reboot before checking them.
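Because RMC can take several minutes to come back after a reboot, a pre-check is easier to script as a retry loop than as a single test. The following is a minimal sketch of such a loop in POSIX shell. The function name poll_until and the check command passed to it are illustrative assumptions, not part of RMC or of any IBM tool.

```shell
# poll_until TRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or TRIES attempts have been made.
# Returns 0 on the first success, 1 if every attempt failed.
poll_until() {
    tries=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        if "$@"; then
            return 0
        fi
        # Only sleep if another attempt is still to come.
        if [ "$attempt" -lt "$tries" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example (my_rmc_check is a placeholder for a check you supply yourself):
#   poll_until 10 60 my_rmc_check mylpar || echo "RMC still not active"
```

Ten attempts a minute apart comfortably covers the 5-7 minute window. What the check itself looks like depends on where you run it. On an AIX LPAR, for instance, one commonly used check is lsrsrc IBM.MCP, which lists the management consoles the LPAR currently has an RMC connection to.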
 
Designing for LPM
LPM has some significant prerequisites that must be met. These require planning and may involve some changes to your setup ahead of time. Different HMC and PowerVM levels provide access to different features, so these need to be researched in advance. Below is a list of the basic prerequisites. This is not an exhaustive list and IBM documentation should be checked.
 
o   The LPAR to be migrated cannot be a VIO server
o   A pair of supported servers (POWER6 or higher)
o   A supported AIX, Linux or IBM i version
o   PowerVM Enterprise Edition, with the VIOS preferably at a level higher than 2.2.4 (ideally the latest level)
o   Time of Day clocks for VIO servers should be synchronized
o   The OS and applications must be migration-aware or migration-enabled
o   The server must show as both Active and Inactive partition mobility capable when checked under server properties, capabilities.
o   The HMC must be at a supported level; I highly recommend at least 8.8.6 SP3
  • Remote migration requires at least v7.3.4 on the HMC but it should be higher
o   All I/O for the LPAR must be virtualized at the time of the move
  • This means using NPIV, or vSCSI for storage and virtualized ethernet for the network
  • External iSCSI can be used if shared as vSCSI
  • Dedicated IO adapters must be de-allocated before migration
  • The DVD in the VIO may not be attached to the mobile LPAR as a virtual optical device
  • If using FBO (file backed optical) no virtual optical device can be attached to the mobile LPAR
o   All LPARs must be on the same open network with RMC working with the HMC
o   The source and target VIO servers must have access to the same network (VLAN and subnet) with a SEA (shared ethernet adapter)
o   LPARs must be under the control of one or two VIO servers
o   Storage must be zoned to both the source and the target
  • For NPIV this means that both of the WWPNs for each virtual fibre adapter must be zoned
  • For vSCSI there can be no LVM based disks and no internal disks used
  • For NPIV, storage also needs to be mapped to all of the WWPNs
  • Hdisks must be external and the LPAR (or VIO if vSCSI) must have reserve_policy=no_reserve
  • SSPs (shared storage pools) can be used as long as the source and destination are part of the same VIOS storage group.
  • For NPIV physical adapter max_xfer_size should be the same or greater at the target
o   The LPARs must be using the SEA (shared ethernet adapter)
o   No LPAR at the target server can have the same name as the LPAR being moved
o   No LPAR at the target can have the same network virtual MAC address
o   The target must have enough cores, memory and virtual adapter slots free
o   The source should have enough virtual adapter slots free in case they conflict at the target and have to be changed
o   The LMB (logical memory block) or memory region size must be the same on the source and target servers—changing this requires that the server be powered off and on
o   If you are using AMS (active memory sharing), AME (active memory expansion), suspend/resume or trusted boot for the mobile LPAR then check that it is set at both ends
  • For shared memory LPARs (using AMS) the destination must have a paging device available
o   Virtual adapters cannot be marked as required and shouldn’t be marked for “any client”
  • Note: Best practice is to no longer mark adapters as required, as it affects the ability to use hot-plug technology. It will also cause LPM validation to fail.
o   For active migration: Mobile LPARs cannot use huge memory pages, BSR arrays or redundant error reporting
o   No consoles can be open (vterm) to the LPAR—this will give a warning but will not cause the LPM to fail
o   LPAR to be migrated cannot be the service partition
o   The processor mode must be compatible; for example, you can’t migrate a POWER8 mode LPAR to a POWER7 server
o   If using shared processors ensure the entitlements are compatible between servers
o   If the mobile partition is suspend-resume capable, make sure the target has a reserved storage pool greater than or equal to 110 percent of the LPAR size
o   Migrating an IBM i LPAR
  • Verify the destination server supports the migration of IBM i mobile partitions and the restricted I/O mode
  • Verify the IBM i mobile partition is in the restricted I/O mode
  • Changing to restricted I/O mode requires a reactivation of the IBM i LPAR and will also affect your ability to attach physical devices
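 
Some of the checks in the list above can be scripted against the HMC command line rather than clicked through in the GUI. As a minimal sketch, the function below flags managed systems that do not report both mobility capabilities. It assumes the input is the output of lssyscfg -r sys with the fields name, active_lpar_mobility_capable and inactive_lpar_mobility_capable, in that order. The function name, the HMC host name myhmc and the use of the hscroot account are illustrative assumptions.

```shell
# not_lpm_capable: reads "name,active,inactive" lines on stdin and prints
# the name of every managed system where either capability flag is not 1.
# Assumes system names do not themselves contain commas.
not_lpm_capable() {
    awk -F, 'NF > 0 && ($2 != 1 || $3 != 1) { print $1 }'
}

# Example, run from a workstation with SSH access to the HMC:
#   ssh hscroot@myhmc \
#     'lssyscfg -r sys -F name,active_lpar_mobility_capable,inactive_lpar_mobility_capable' \
#     | not_lpm_capable
```

No output means every managed system reports both capabilities. The same pattern can be applied to other attributes, for example comparing the mem_region_size value reported by lshwres -r mem --level sys on the source and target servers.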
 
Dual VIOS Requirements
Migrating an LPAR supported by a single VIOS to a server that is also single VIOS is very straightforward. When the LPAR is supported by a dual VIOS configuration there are some additional considerations. LPM will attempt to give the LPAR the exact same configuration on the destination server; however, there are options to migrate to single or dual VIOS configurations. If you are using MPIO across dual VIO servers, then the target system must also be dual VIOS. There are workarounds that are beyond the scope of this article.
 
 
Migration Phases
When you are performing a migration there are multiple phases that the process goes through. In all cases it starts with a validation process. For active migration the phases include the following:
            Validate configuration
            Create new LPAR on target server
            Create new virtual resources on target server
            Migrate the state of the LPAR in memory
            Remove the old LPAR configuration from the source server
            Free up the old resources on the source server
 
Memory state above includes the partition’s memory, the hardware page table (HPT), the processor state, NVRAM (nonvolatile RAM), time of day (ToD) and the partition configuration.
 
For an inactive migration the phases include:
            Validate configuration
            Create new LPAR on target server
            Create new virtual resources on target server
            Remove the old LPAR configuration from the source server
            Free up the old resources on the source server
 
An inactive migration can be used to migrate LPARs that use BSR or huge pages. However, an LPAR that is migrated inactively must have been activated at least once, even if only into SMS mode. This is because the migration uses the last activated or last running partition profile, which will not exist if the LPAR has never been activated.
 
Validation checks for capabilities and compatibility, that RMC is working, that the LPAR meets requirements, that the target has enough resources and that the virtual adapters can map successfully. It checks there are no required adapters and that no virtual serial slots are required above slot 2. There are also additional checks that occur, but these are the basic ones.
 
Setting Up for Remote Migration
Remote migration is the ability to use LPM between two servers managed by different HMCs. Below are some of the prerequisites for performing remote migrations.
 
·      A local HMC managing the source server
·      A remote HMC managing the target server
·      Functional RMC daemons
·      Version 7.3.4 or later of the HMC software (you really want to be at 8.8.6 SP3 or higher) on both HMCs
·      Network access to the remote HMC
·      SSH key authentication to the remote HMC and all involved LPARs (VIOS and actual LPAR)
·      Plus, all the other requirements for single HMC migration
 
How Do You Perform an LPM?
Once you have confirmed that all the prerequisites have been met you can start the LPM process. I always log in to every LPAR to check the error log and to confirm there are no full filesystems prior to migrating. I also double check that reserve_policy=no_reserve is set on all disks, and I check the LPAR profile to make sure there are no required adapters.
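 
The reserve_policy check is easy to script on an AIX LPAR. The following is a minimal sketch that lists every disk whose reserve_policy is not no_reserve. The function names are illustrative, and the sketch assumes every hdisk reported by lspv exposes a reserve_policy attribute; disks that do not are reported as "unknown".

```shell
# disks_with_reserve: reads "disk policy" pairs on stdin (one per line)
# and prints each disk whose reserve_policy is not no_reserve.
disks_with_reserve() {
    awk 'NF > 0 && $2 != "no_reserve" {
        print $1 ": " ($2 == "" ? "unknown" : $2)
    }'
}

# list_reserve_policies: run on the AIX LPAR itself. Prints one
# "disk policy" line for every disk that lspv reports.
list_reserve_policies() {
    for disk in $(lspv | awk '{ print $1 }'); do
        echo "$disk $(lsattr -El "$disk" -a reserve_policy -F value 2>/dev/null)"
    done
}

# On the LPAR:
#   list_reserve_policies | disks_with_reserve
```

No output means every disk is already set to no_reserve. A disk that is flagged can normally be changed with chdev -l hdiskN -a reserve_policy=no_reserve, although a disk that is in use may need the change deferred until the next reboot.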
 
From the GUI I select mobility and then validate for the LPAR. When doing a remote migration you will need to enter the IP or name of the remote HMC and the account used for the SSH key pair (I normally create one called lpmacct). Then you refresh the server options. I usually check the two boxes labelled “Override virtual network errors when possible” and “Override virtual storage errors when possible” and then rerun validate. If the option for MSP Pairing becomes available, make sure the correct VIO servers have been chosen as the source and target. I was working on two servers, each with multiple VIO servers, and not once did it select the correct pairing.
 
Once validate is run you should go down through the virtual storage assignments at the bottom and ensure that the virtual adapters are assigned to the correct VIO servers at the target. LPM may not choose these correctly if you have more than one VIOS at each end. I find that running an HMCScanner report ahead of time helps me to know how these should be done. Once they are all selected correctly you run validate again. I then click on “view vlan settings” and check that all the VLANs are there and then I click on migrate.
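 
The same validate and migrate operations are also available from the HMC command line through the migrlpar command. As a minimal sketch, the helper below only builds and prints the command line, so it can be reviewed before anything is run. The function name lpm_cmd and all of the system, LPAR and HMC names in the example are illustrative assumptions. It does not cover the override options, MSP pairing or the virtual adapter mappings described above.

```shell
# lpm_cmd OPERATION SOURCE TARGET LPAR [REMOTE_HMC REMOTE_USER]
# Prints the migrlpar command line for a validate (v) or migrate (m)
# operation. Nothing is executed. Names containing spaces are not handled.
lpm_cmd() {
    op=$1
    src=$2
    tgt=$3
    lpar=$4
    case "$op" in
        v|m) ;;
        *) echo "lpm_cmd: operation must be v or m" >&2; return 1 ;;
    esac
    cmd="migrlpar -o $op -m $src -t $tgt -p $lpar"
    # For a remote migration, add the remote HMC and the SSH key account.
    if [ -n "${5:-}" ] && [ -n "${6:-}" ]; then
        cmd="$cmd --ip $5 -u $6"
    fi
    echo "$cmd"
}

# Examples:
#   lpm_cmd v srcsys tgtsys mylpar
#   lpm_cmd m srcsys tgtsys mylpar remotehmc lpmacct
```

Running the validate form first and reading its messages mirrors the GUI flow, where validate is rerun until it comes back clean before migrate is selected.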
 
Migration time varies depending on the amount of memory and activity on the LPAR as well as network bandwidth. Inactive LPARs move in seconds; active ones take more time. A 200GB LPAR that is fairly active took me 35 minutes to move. During that time the LPAR was up the whole time. At the very end of the migration there is a minor network outage (usually less than a second) because the LPAR gets suspended as the last few memory pages get copied over. This is barely noticeable. Once the migration is complete the LPAR should be checked for functionality and for any error messages.
 
And what can go wrong?
The main issues I have seen around LPM have to do with disk, network and LPAR settings. If an AIX LPAR is migrated and the disks are not set to no_reserve, or if the zoning and mapping of the LPM WWPNs is not done all the way through to the disk, then the LPAR will migrate but it will set the disks to read-only and RMC will not work. If the LPAR then gets shut down (shutdown immediate, since you have no RMC) it is highly likely the boot image will be corrupted. I’ve seen this several times. As of VIOS 2.2.4, there is a new pseudo device called vioslpm0 with some new attributes to help with checking for disks. If you are at 2.2.4 or higher on both VIO servers then you can set the src_lun_val and dst_lun_val attributes so that validation does a complete end-to-end check to ensure it can see all the disks. Without that, LPM only checks out to the switches. However, if there are a lot of disks the validation will take a very long time.
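 
To see whether the end-to-end disk check is switched on, the vioslpm0 attributes can be listed on each VIO server as padmin with lsdev -dev vioslpm0 -attr. The sketch below pulls a single attribute value out of that listing. It assumes the usual listing layout with the attribute name in the first column and its value in the second, and the function name attr_value is illustrative.

```shell
# attr_value NAME: reads an attribute listing on stdin and prints the
# value (second column) of the attribute called NAME (first column).
attr_value() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# Examples, with the listing captured from each VIO server:
#   source VIOS:       lsdev -dev vioslpm0 -attr | attr_value src_lun_val
#   destination VIOS:  lsdev -dev vioslpm0 -attr | attr_value dst_lun_val
#
# To enable the check (as padmin):
#   source VIOS:       chdev -dev vioslpm0 -attr src_lun_val=on
#   destination VIOS:  chdev -dev vioslpm0 -attr dst_lun_val=on
```

If you run the function from a workstation instead, feed it the listing collected over SSH. Remember that turning the check on lengthens validation when the LPAR has many disks.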
 
For checking WWPNs the HMC Scanner is very useful. The “Virtual_SCSI” tab includes all the vSCSI mappings and the “Virtual_Fibre” tab includes all the NPIV mappings. The column labelled WWPN#1 is the WWPN normally used when the LPAR is running. The column labelled WWPN#2 is the one used by LPM – it is often not zoned or mapped because it does not show up until it is logged into (during an LPM). However, it must be zoned and mapped to avoid the issues discussed above.
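 
If you do not have an HMC Scanner report to hand, the WWPN pairs can also be pulled from the LPAR profile on the HMC. The following is a minimal sketch that extracts each pair from the virtual_fc_adapters field of lssyscfg -r prof, so both WWPNs can be handed to whoever does the zoning and mapping. It assumes each adapter entry carries its two WWPNs as 16-digit hex values joined by a comma, and it relies on grep -o, so it is meant for a workstation with GNU grep rather than AIX itself. The function name and the system, LPAR and HMC names are illustrative.

```shell
# wwpn_pairs: reads text on stdin and prints every "wwpn1 wwpn2" pair
# found, one pair per line. A pair is two 16-digit hex WWPNs joined by
# a comma, as they appear in the virtual_fc_adapters profile field.
wwpn_pairs() {
    grep -oE '[0-9a-fA-F]{16},[0-9a-fA-F]{16}' | tr ',' ' '
}

# Example, run from a workstation with SSH access to the HMC:
#   ssh hscroot@myhmc \
#     'lssyscfg -r prof -m srcsys --filter lpar_names=mylpar -F virtual_fc_adapters' \
#     | wwpn_pairs
```

The second WWPN on each line is the one that is easy to miss, because it does not log in to the fabric until a migration is under way.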
 
From the network perspective, the network setup needs to be checked. If RMC is unable to function then you will lose control of the LPARs from the HMC.
 

Summary

LPM has been around a long time and is used at many sites to move LPARs around and to migrate between servers or to servers on remote HMCs. This article attempts to define some of the things you need to think about before using LPM. It requires meticulous planning but is incredibly useful once it is set up. The lists and comments above should be a good start but may not include everything you need to know. The resources listed in the references below will always be the most up to date for LPM information. I highly recommend ensuring that new systems coming in are configured so they will work with LPM, even if you don’t plan to use it today. Some of the changes require powering off the server and it is much easier to do that at installation time.
                        
 
References
For more information on LPM, check out the following reading materials:
 
POWER7 LPM firmware support matrix
https://www.ibm.com/support/knowledgecenter/POWER7/p7hc3/p7hc3firmwaresupportmatrix.htm
 
vioslpm0 new pseudo device
https://www.ibm.com/support/knowledgecenter/TI0002C/p8hc3/p8hc3_vioslpmpseudo.htm
and http://www-01.ibm.com/support/docview.wss?uid=isg3T1022733
 
NPIV LUN Level Validation
https://www.ibm.com/support/knowledgecenter/POWER8/p8hc3/p8hc3_npivorlunval.htm
 
LPM Validation Options
https://www.ibm.com/developerworks/community/wikis/home?lang=en_us#!/wiki/Power%20Systems/page/NPIV%20storage%20validation%20options%20for%20Live%20Partition%20Mobility
 
Where to find LPM documentation, best practices, etc
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Power%20Systems/page/Where%20do%20I%20find%20LPM%20Documentation,%20best%20practices,%20and%20information%20on%20error%20codes
 
LPM Setup Checklist
http://www.redbooks.ibm.com/abstracts/tips1184.html?Open
 
Other LPM Redbooks
PowerVM Virtualization Introduction and Configuration
http://www.redbooks.ibm.com/abstracts/sg247940.html?Open
 
PowerVM Virtualization Managing and Monitoring
http://www.redbooks.ibm.com/abstracts/sg247590.html?Open
 
Nigel Griffiths AIXPert Blog
https://www.ibm.com/developerworks/community/blogs/aixpert?lang=en

IBM Virtual User Group
https://www.ibm.com/developerworks/community/wikis/home?lang=en-us#!/wiki/Power%20Systems/page/IBM%20Power%20Systems%20technical%20webinar%20series%20(including%20Power%20Systems%20Virtualization%20-%20PowerVM)
 
Fix Central
http://www-933.ibm.com/support/fixcentral/
 
FLRT (Fix Level reporting tool)
http://www14.software.ibm.com/webapp/set2/flrt/home#reports
 
HMCScanner (latest is 0.11.35)
https://www.ibm.com/developerworks/community/wikis/home?lang=en-us#!/wiki/Power+Systems/page/HMC+Scanner
 
 

Jaqui Lynch is an independent consultant, focusing on enterprise architecture, performance and delivery on Power Systems with AIX and Linux.


