
How to Diagnose RMC Connection Issues From the HMC


 

Based on the output on the previous page, there are no issues between the HMC and the target partition. This is good.

The diagrmc tool calls out general network communication issues between the HMC and a partition. In the example below, the HMC is unable to communicate with the partition aixlpar1 (IP address 10.1.1.12). The issue was traced to a firewall that was blocking all communication between the LPAR and the HMC.

hscroot@hmc1:~> diagrmc -m Server-8286-42A-SN214F55V -p aixlpar1 --ip 10.1.1.12 --autocorrect
** Checking management console network setup, NIST mode, firewall, RMCkey, etc.:

** Checking RMC subsystem:

** Checking duplicate IPs:
        -- No duplicate IP address found.

** Checking duplicate NodeIDs:
        -- No duplicate NodeID found.

** Checking ID, network, etc. for partition 2* Server-8286-42A-SN214F55V 10.1.1.12:
        >> WARN: -- The partition has no RMC connection. If this is a Linux partition, verify if it has rsct rpm installed.

** Testing network connectivity.
        >> WARN: The management console might not be able to establish an RMC connection to the partition due to a network issue.
        Use ssh to verify the network and authenticate through the firewall
         from the management console to partition 10.1.1.12 and from partition 10.1.1.12 to the management console.
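
On a busy managed system the diagrmc output can get long, so it can help to boil it down to one line per check. Here is a minimal sketch of such a filter. It is not part of the HMC: summarize_diagrmc is a hypothetical function name, and it assumes that each "** " heading and each ">> WARN" message starts on its own line, as in the listing above. I'm also assuming it runs on a workstation with a POSIX shell and awk, not inside the HMC restricted shell.

```shell
#!/bin/sh
# Hypothetical helper: reduce diagrmc output (read from stdin) to one line
# per check, flagged WARN if any ">> WARN" line appeared under that heading.
# Assumption: headings start with "** " at the beginning of a line.
summarize_diagrmc() {
    awk '
        /^\*\* / {                       # a new check heading begins
            if (section != "") print status ": " section
            section = substr($0, 4)      # drop the leading "** "
            status = "no-warn"
            next
        }
        />> WARN/ { status = "WARN" }    # remember a warning in this check
        END { if (section != "") print status ": " section }
    '
}
```

One way to use it, assuming ssh access to the HMC as hscroot, is to run the command remotely and filter the result locally: ssh hscroot@hmc1 'diagrmc -m Server-8286-42A-SN214F55V -p aixlpar1 --ip 10.1.1.12' | summarize_diagrmc. A "no-warn" line only means that no warning text was seen for that check; it is not a guarantee that the check passed.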

In the next example, the HMC has found an IP address that it believes is in use more than once (a duplicate) on the managed system. It reports that the IP address 10.2.55.225 is in use by partitions 101 and 102.

hscroot@hmc1:~> diagrmc -m Server-8286-42A-SN214F55V --id 101 --ip 10.2.55.225
...
** Checking duplicate IPs:

>> WARN: The IP address 10.2.55.225 is being used by both 101*Server-8286-42A-SN214F55V and 102*Server-8286-42A-SN214F55V and this would cause no connection if both are activated. To correct this, reconfigure an active node with a different IP address, then run diagrmc with the --autocorrect option.
lssyscfg -r lpar -m CEC_name -F rmc_ipaddr,lpar_id,name,state,rmc_state | sort

When you scan the list, you can identify the duplicate addresses as consecutive entries with the same first parameter (RMC IP address).
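
If the managed system hosts a lot of partitions, scanning the sorted list by eye gets tedious. The following is a rough sketch of a filter that does the scanning for you. The function name find_dup_rmc_ips is my own invention, and I'm assuming the input is the comma-separated output of the lssyscfg command above, with the RMC IP address in the first field and an empty first field for partitions that have no RMC IP address. I'm also assuming it runs on a workstation with a POSIX shell and awk, not in the HMC restricted shell.

```shell
#!/bin/sh
# Hypothetical helper: print every LPAR record whose RMC IP address
# (field 1) appears more than once in the input.
# Expected input fields: rmc_ipaddr,lpar_id,name,state,rmc_state
find_dup_rmc_ips() {
    awk -F, '
        $1 != "" {                        # assumption: empty means no RMC IP
            count[$1]++
            rows[$1] = rows[$1] $0 "\n"   # keep the full records per address
        }
        END {
            for (ip in count)
                if (count[ip] > 1)
                    printf "%s", rows[ip]
        }
    ' | sort                              # group duplicates together
}
```

For example, assuming ssh access to the HMC: ssh hscroot@hmc1 'lssyscfg -r lpar -m Server-8286-42A-SN214F55V -F rmc_ipaddr,lpar_id,name,state,rmc_state' | find_dup_rmc_ips. No output means no address was listed twice.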

We resolved this problem by changing the IP address of the LPAR and then re-running diagrmc with the --autocorrect option. This corrected the error and all was well again.

...
** Checking duplicate IPs:
	-- No duplicate IP address found.
...
** Checking ID, network, etc. for partition 101*8286-42A*214F55V 10.2.55.225:
...
** Testing network connectivity.
	-- Lpar 101*8286-42A*214F55V/10.2.55.225 has RMC connection.

You can find failed diagrmc commands with the lssvcevents command on the HMC. Looking through the list of HMC console events, you’ll find the diagrmc command that was executed.

hscroot@hmc1:~> lssvcevents -t console -d 10 | grep diagrmc
time=11/09/2016 01:07:13,text=HSCE2124 User name hscroot: diagrmc --autocorrect -v command failed.
time=11/09/2016 00:57:22,text=HSCE2124 User name hscroot: diagrmc -m Server-8286-42A-SN214F55V --id 101 --ip 10.2.55.225 command failed.
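
If you want just the time, user, and command from those events, a small sed filter will do it. This is only a sketch: failed_diagrmc is a hypothetical function name, and the pattern assumes each event sits on a single line in exactly the "time=...,text=... User name ...: diagrmc ... command failed." layout shown above. As before, I'm assuming it runs on a workstation against output captured from the HMC.

```shell
#!/bin/sh
# Hypothetical helper: from lssvcevents console events on stdin, print
# "time|user|command" for each diagrmc command that failed.
# Assumption: one event per line, in the layout shown in the listing above.
failed_diagrmc() {
    sed -n 's/^time=\([^,]*\),text=[^ ]* User name \([^:]*\): \(diagrmc.*\) command failed\.$/\1|\2|\3/p'
}
```

Lines that don't match the pattern, such as events for other commands, are silently dropped, so an empty result may mean either no failures or a layout the pattern doesn't recognize.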

So next time you encounter a problem with DLPAR in your environment, please remember this tool. It may just save you a whole lot of time and effort when diagnosing an RMC problem.

Before I finish up, I'd like to remind everyone that the official IBM guide to resolving RMC errors can be found at the following link:

Fixing the No RMC Connection Error

http://www-01.ibm.com/support/docview.wss?uid=isg3T1020611

And if you do need to run the commands listed in this official tech note, please be careful, particularly if you have any clusters (PowerHA, for example) in your environment. The tech note provides the following words of warning:

"CAUTION: Running the recfgct command on a node in a RSCT peer domain or in a Cluster Aware AIX (CAA) environment should NOT be done before taking other precautions first. This note is not designed to cover all CAA or other RSCT cluster considerations so if you have an application that is RSCT aware such as PowerHA, VIOS Storage Pools and several others do not proceed until you have contacted support. If you need to determine if your system is a member of a CAA cluster then please refer to the Reliable Scalable Cluster Technology document titled, "Troubleshooting the resource monitoring and control (RMC) subsystem.”

For your future reference, here's a list of my "go to" resources for DLPAR and RMC issues. The tech note "Checking the status of RMC connections on the HMC" is a particular favorite of mine.  

Checking the Status of RMC Connections on the HMC

Fixing the No RMC Connection Error

Troubleshooting the resource monitoring and control (RMC) subsystem

RMC usage on HMC

Chris Gibson is an AIX and PowerVM specialist located in Melbourne, Australia. He is an IBM Champion for Power Systems, IBM CATE (Power Systems and AIX), and a co-author of several IBM Redbooks publications.


