November 15, 2016
As I've mentioned, the AIX mailing list is a great place to go to pose questions and receive good answers from other AIX pros. While traffic is typically pretty light, I recently came across an interesting thread about the need to take care when editing critical files:
A coworker was editing the /etc/passwd file on an LPAR on our P720 server. When he tried to save the file, emacs hiccupped and he ended up putting an empty /etc/passwd file in its place.
Now, with no open sessions to the LPAR, no one can access the LPAR. This LPAR is one of our primary NFS servers. So far, only a few items have stopped working, SAMBA being one. But in general, the hundreds of AIX/Linux/Unix clients in our R&D group are still able to reach the NFS mounts (at least the ones they had automounted when this all happened).
I went to the HMC and got a terminal/console there, but still need a password to get in.
Any ideas as to what I might do to crack this nut and get into the box?
Put on your thinking cap for a moment. How would you get out of this pickle?
The first two replies offer great suggestions:
What is your backup product? You may be able to restore it using the agent already running on the system.
New logins will be impossible. You'll have to leverage something that already has access.
The second simply consists of a link to this IBM Knowledge Center doc, plus the following:
echo 'root:!:0:0::/:/usr/bin/ksh' > /etc/passwd
chmod 644 /etc/passwd
The next day, the solution was posted:
Thank you for all your suggestions. It brought back to heart why I so loved AIX and the support I can get (and occasionally give).
Here is the solution:
We had a backup of the system, but the tape was offsite at our DR site (Some cave under Lake Erie, or the like).
Then it hit me, I do not need "the" /etc/passwd file from this LPAR (last backed up in a Full backup in August!). I just need "a" /etc/passwd file. ANY /etc/passwd file. Or just a one line passwd file I could make myself.
I just needed the back-door of NetBackup to place the file there.
By now I had the NetBackup guys on the line and in Priority 1 mode, so I asked them to pull the /etc/passwd file off of the twin LPAR on the other P720 we have. Then restore it to this LPAR. Less than 2 minutes, we were back in business!! Then I had a copy of the real passwd file, which is nearly identical to the one from the other LPAR, and I put that in place.
I'll also cite the reply to that, because it's an awesome punch line:
Having a close call is a good time to review your backups and your bootable media.
If you don't have NIM then it's critical to keep media at close level to what you are running available near the machine.
The rest of the discussion covers things like making sure your NIM server is ready to go, along with some more details around what went wrong with editing /etc/passwd in the first place. It turns out they did have a backup of /etc/passwd, but since they couldn't log into the machine at all, they were unable to copy that saved file.
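One way to reduce the odds of this kind of accident is to edit a copy of the file and only move it into place after a sanity check. The following is a minimal sketch of that idea, not something taken from the thread. The function name install_checked and the .bak and .new suffixes are my own assumptions, and the checks (non-empty, contains a root entry) are just the bare minimum suggested by this story.

```sh
#!/bin/sh
# Hypothetical helper (not from the thread): install_checked NEW TARGET
# Replaces TARGET with NEW only if NEW is non-empty and has a root entry.
install_checked() {
    new=$1
    target=$2

    # Refuse an empty or missing candidate file.
    if [ ! -s "$new" ]; then
        echo "refusing: $new is empty or missing" >&2
        return 1
    fi

    # Refuse a candidate that has no root entry.
    if ! grep -q '^root:' "$new"; then
        echo "refusing: $new has no root entry" >&2
        return 1
    fi

    # Keep a copy of the current file before touching it.
    if [ -f "$target" ]; then
        cp -p "$target" "$target.bak" || return 1
    fi

    # Copy next to the target, then rename into place, so the
    # target is never left empty or half-written.
    cp "$new" "$target.new" && mv "$target.new" "$target"
}
```

You would edit something like /tmp/passwd.edit (an example path) and then run install_checked /tmp/passwd.edit /etc/passwd as root. The sketch does not set ownership or permissions, so you would still want to confirm the mode afterward, as with the chmod 644 in the reply above.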
Indeed, the best time to ensure your machine is backed up is before a disaster strikes. Run through this checklist:
- Do you have a current viosbr?
- Have you run backupios?
- Do you have a current accessible mksysb of your VIO server? Do you have current mksysbs of your LPARs? Do you have a local Alt Disk Copy of rootvg?
- Is your HMC backup current? Do you have a mksysb of your NIM server?
- If you take backups, that's great. But have you tested them? Are they accessible if your computer room burns down? I wrote about this more than 10 years ago, yet here we are, still needing to back up our machines and still needing to know how to restore them.
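For the "is it current?" questions in the checklist, even a small scripted check can help. Here is a rough sketch that tests whether a backup image exists, is non-empty, and was modified recently. The function name backup_is_current and the example path are hypothetical, and a fresh timestamp says nothing about whether the image will actually restore, so this complements a test restore rather than replacing one.

```sh
#!/bin/sh
# Hypothetical helper: backup_is_current FILE MAX_DAYS
# Succeeds only if FILE exists, is non-empty, and is not reported by
# find as modified more than MAX_DAYS days ago.
backup_is_current() {
    file=$1
    max_days=$2

    # A missing or empty image is never current.
    [ -s "$file" ] || return 1

    # find prints the path only when the file is older than the limit.
    stale=$(find "$file" -mtime +"$max_days" -print)
    [ -z "$stale" ]
}
```

For example, backup_is_current /backup/lpar1.mksysb 7 || echo "mksysb is stale" could run from cron, where /backup/lpar1.mksysb is a made-up location for one of your LPAR images.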
We're all busy, but it's essential to take time now to figure out how you can recover your systems.