Friday, November 30, 2018

VMware ESXi Gripes: How not to put an ESXi host in maintenance mode

So I want to patch an ESXi host (it is standalone, not part of a cluster). The first thing I do is tell it to go into maintenance mode:

[root@vmhost2:~] esxcli system maintenanceMode set --enable on


[crickets]

and then I wait, and wait, and wait some more. Thirty-five minutes have now passed since I started that and nothing seems to be happening, which is why I decided to use the wait to start typing this article. It feels like it is trying to pause all the running guests in some convoluted way (more on that later). I checked the Place a Host in Maintenance Mode chapter in the official ESXi and vCenter documentation, and this is what it says:

The host is in a state of Entering Maintenance Mode until all running virtual machines are powered down or migrated to different hosts.

Lovely. But this is taking way too long. I mean, the script I wrote to save/suspend and resume ESXi guests is much faster than whatever maintenance mode is doing. Maybe I should have run that first.
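While twiddling my thumbs I can at least ask the host what it thinks it is doing. A couple of stock commands for that (just a sketch; the exact output varies a bit between builds):

# Is the host in maintenance mode yet?
esxcli system maintenanceMode get

# Which guests does the host still consider running?
esxcli vm process list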

It has now been almost an hour and it still has not finished doing whatever it is trying to do. So, I got annoyed:

[root@vmhost2:~] /etc/init.d/hostd restart
watchdog-hostd: Terminating watchdog process with PID 66942
hostd stopped.
hostd started.
[root@vmhost2:~] /etc/init.d/vpxa  restart
watchdog-vpxa: Terminating watchdog process with PID 67479
vpxa stopped.
[root@vmhost2:~] /etc/init.d/vpxa  status
vpxa is running
[root@vmhost2:~] vim-cmd /hostsvc/maintenance_mode_exit
The operation is not allowed in the current state.
[root@vmhost2:~]

See how it is giving me a hard time? Well, I can play that game too. Time to unleash my script:

[root@vmhost2:/vmfs/volumes/5b7a0d7d-7f2b6678-68c8-00224d98ad4f/var/tmp] ./save_runningvms.sh save
Suspending VM...
Suspending VM...
Suspending VM...
Suspending VM...
Suspending VM...
Suspending VM...
[root@vmhost2:/vmfs/volumes/5b7a0d7d-7f2b6678-68c8-00224d98ad4f/var/tmp]
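For the record, there is nothing magic in that script. The suspend half of it boils down to something like the sketch below (not the script verbatim, just the same idea using stock vim-cmd calls; the VM IDs are whatever your host happens to have):

# Walk every registered VM; the first column of getallvms is the numeric Vmid
for id in $(vim-cmd vmsvc/getallvms | awk '$1 ~ /^[0-9]+$/ {print $1}'); do
    state=$(vim-cmd vmsvc/power.getstate "$id" | tail -1)
    # Only suspend guests that are actually running
    if [ "$state" = "Powered on" ]; then
        echo "Suspending VM..."
        vim-cmd vmsvc/power.suspend "$id"
    fi
done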

And now I know I can safely do a Windows on this VM host (that is, reboot it):

[root@vmhost2:~] reboot
[root@vmhost2:~]

Now that the VMs are saved, I have no problem whatsoever putting this host into maintenance mode. In fact, running

[root@vmhost2:~] esxcli system maintenanceMode set --enable on
[root@vmhost2:~] 

took longer to type than to run.
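And a quick check confirms the host really is where I want it (this should print Enabled):

esxcli system maintenanceMode get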

Moral of the Story

Save/shut down/whatever your guests before putting the host in maintenance mode. At least, do so if you do not have a way to move the guests to other nodes (assuming you have any), such as vMotion.
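For completeness, the way back after patching and rebooting is the mirror image. A sketch of the exit-and-resume side (again stock esxcli/vim-cmd, not my script verbatim):

# Leave maintenance mode
esxcli system maintenanceMode set --enable off

# Resume every suspended guest (power.on resumes a suspended VM)
for id in $(vim-cmd vmsvc/getallvms | awk '$1 ~ /^[0-9]+$/ {print $1}'); do
    state=$(vim-cmd vmsvc/power.getstate "$id" | tail -1)
    if [ "$state" = "Suspended" ]; then
        echo "Resuming VM..."
        vim-cmd vmsvc/power.on "$id"
    fi
done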