Sunday, February 23, 2014

VMware ESXi: the unexpected effects of swapping boot keys

It all happened because I boot ESXi from a USB flash drive.
Sometimes entropy wins in my lab and a host gets booted from the wrong stick. In that case ESXi interprets this as a new installation (you notice it because it asks for the license key).

The standard behavior of ESXi is to create a persistent scratch space whenever the host is booted for the first time.
If the host uses previously formatted volumes (as is the case for me), it will lock the first disk device it finds, because it uses that device for the scratch partition.

The result is that the entire device is locked (no more VMFS partitions can be created) and the existing datastore cannot be deleted. Trying to delete it returns the following error:

"Cannot remove datastore ‘Datastore Name: * VMFS uuid: *’ because file system is busy. Correct the problem and retry the operation."

The fix is to move the scratch space somewhere else. Fortunately that is quite easy, although it requires rebooting the ESXi host.

Where is the scratch space?
On the ESXi console (the console and the SSH service have to be activated first) check the contents of:

/etc/vmware/locker.conf

The file should contain something like:

/vmfs/volumes/510eb9e2-3772226f-db9b-001aa032346a/.locker 0
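
If you would rather pull the volume ID out of that line in a script, plain POSIX parameter expansion is enough. This is only a sketch: scratch_volume_id is a hypothetical helper name of mine, and it assumes the file holds a single line of the form shown above.

```shell
#!/bin/sh
# scratch_volume_id: hypothetical helper, not an ESXi command.
# Input : one locker.conf-style line, e.g. "/vmfs/volumes/<id>/.locker 0"
# Output: the <id> component (a UUID or a datastore label).
scratch_volume_id() {
    path=${1%% *}                      # drop the trailing " 0" field
    case $path in
        /vmfs/volumes/?*) ;;           # expected prefix, carry on
        *) return 1 ;;                 # anything else: give up
    esac
    rest=${path#/vmfs/volumes/}        # strip the fixed prefix
    printf '%s\n' "${rest%%/*}"        # keep the first path component
}

# Only touch the real file when it exists (i.e. on an ESXi host).
if [ -r /etc/vmware/locker.conf ]; then
    scratch_volume_id "$(cat /etc/vmware/locker.conf)" ||
        echo "unexpected locker.conf format" >&2
fi
```

Called with the sample line above, the helper prints 510eb9e2-3772226f-db9b-001aa032346a.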

Your UUID will obviously be different. Check whether it corresponds to the datastore you want to unmount:

esxcli storage filesystem list

or use:

df

Filesystem        Bytes       Used    Available Use% Mounted on
VMFS-5     343328948224 1026555904 342302392320   0% /vmfs/volumes/DS_1
vfat          261853184  136474624    125378560  52% /vmfs/volumes/059b62ef-4b7ea870-d01b-603d7ae96396
vfat          261853184       8192    261844992   0% /vmfs/volumes/6e4e0fef-ed0fc28a-15d1-d8dcc0b12819
vfat          299712512  211755008     87957504  71% /vmfs/volumes/51179227-ea8b1764-c1c1-001aa0322571

Either command lists all mounted volumes and their UUIDs.
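
To avoid comparing long UUIDs by eye, the listing can be filtered for the ID found in locker.conf. A minimal sketch: rows_for_volume is a hypothetical helper, and all it does is a fixed-string match on whatever listing is piped into it.

```shell
#!/bin/sh
# rows_for_volume: hypothetical helper, not an ESXi command.
# Reads a volume listing on stdin (df or esxcli output) and prints only
# the rows that mention the given UUID or label.
rows_for_volume() {
    grep -F -- "$1"
}

# Example with the UUID from the locker.conf line shown earlier.
# Runs only where esxcli is available, i.e. on the ESXi host itself.
ID=510eb9e2-3772226f-db9b-001aa032346a
if command -v esxcli >/dev/null 2>&1; then
    esxcli storage filesystem list | rows_for_volume "$ID" ||
        echo "no mounted volume matches $ID"
fi
```

I use the esxcli listing here because it has a UUID column, whereas df shows a labelled datastore such as DS_1 under its label.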

Then create a directory for an alternative persistent scratch space (I create it on the USB key itself; with a 4GB key there is enough space for most situations). In our example it goes on the partition with the UUID 6e4e0fef-ed0… Different systems, different UUIDs.

mkdir /vmfs/volumes/6e4e0fef-ed0fc28a-15d1-d8dcc0b12819/.locker-Hostname

N.B.: Scratch partitions can reside on FAT, VMFS and NFS file systems. The new scratch directory can also be created in the vSphere client, using the storage browser it provides.
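
A slightly more defensive variant of the mkdir step, as a sketch: it names the directory after the host, so that hosts sharing a volume do not collide, and it creates nothing if the volume is not mounted. scratch_dir_path and the VOLUME variable are illustrative names of mine, not anything ESXi provides.

```shell
#!/bin/sh
# scratch_dir_path: hypothetical helper that builds the per-host path.
scratch_dir_path() {
    printf '/vmfs/volumes/%s/.locker-%s\n' "$1" "$2"
}

VOLUME=6e4e0fef-ed0fc28a-15d1-d8dcc0b12819    # example UUID from the mkdir command above
host=$(hostname 2>/dev/null) || host=unknown  # fall back if hostname is unavailable
dir=$(scratch_dir_path "$VOLUME" "$host")

if [ -d "/vmfs/volumes/$VOLUME" ]; then
    # -p makes the step safe to repeat: no error if the directory exists.
    mkdir -p "$dir" && echo "scratch directory ready: $dir"
else
    echo "volume $VOLUME is not mounted here, nothing created" >&2
fi
```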

Now, either use the vSphere client:
Choose the ESXi host
Configuration tab -> Software -> Advanced Settings
ScratchConfig -> set ScratchConfig.ConfiguredScratchLocation to the full path of the directory you just created
Reboot the ESXi host

Or use the ESXi CLI this way:

vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string /vmfs/volumes/6e4e0fef-ed0fc28a-15d1-d8dcc0b12819/.locker-Hostname

and reboot the ESXi host.

Done! The old datastore can now be deleted and the disk is unlocked.
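
Before deleting anything, it may be worth confirming that the host really switched over. As far as I know, vim-cmd hostsvc/advopt/view prints an option's key and value, and ScratchConfig.CurrentScratchLocation reports the location actually in use. The sketch below assumes the output contains a line of the form value = "..."; option_value is a hypothetical helper, and the parsing may need adjusting for your ESXi build.

```shell
#!/bin/sh
# option_value: hypothetical helper. Reads "vim-cmd hostsvc/advopt/view"
# output on stdin and prints what sits between the quotes of a
# value = "..." line (assumed output format).
option_value() {
    sed -n 's/.*value = "\(.*\)".*/\1/p'
}

# Runs only where vim-cmd is available, i.e. on the ESXi host itself.
if command -v vim-cmd >/dev/null 2>&1; then
    configured=$(vim-cmd hostsvc/advopt/view ScratchConfig.ConfiguredScratchLocation | option_value)
    current=$(vim-cmd hostsvc/advopt/view ScratchConfig.CurrentScratchLocation | option_value)
    echo "configured: $configured"
    echo "current:    $current"
    if [ -n "$current" ] && [ "$configured" = "$current" ]; then
        echo "scratch has moved"
    else
        echo "scratch not moved yet (was the host rebooted?)"
    fi
fi
```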

Side note: what is the scratch space?
It is a persistent location for storing temporary data, including logs, diagnostic information, and system swap. It is not a required feature: ESXi can keep this data on a ramdisk while it is running. It is, however, a best practice not to consume memory for such tasks, and to be able to recover logs across reboots.
