ESXi6.7u3 (iSCSI, RDM & remove datastore)

admin
Site Admin
Posts: 473
Joined: 06 Feb 2007, 13:36

ESXi6.7u3 (iSCSI, RDM & remove datastore)

Post by admin »

Three days before the ESXi 7.0 license expired, I built a new bootable USB stick (32 GB) with ESXi 6.7u3.

Image

I got lost.
I forgot to back up the old configuration, so I had to re-create all five virtual networks.
After rebuilding the bootable ESXi USB device I am not able to reconnect to the iSCSI datastore.
Last Saturday ESXi did connect and I saw the iSCSI LUN.
By Sunday even the iSCSI LUN would not work anymore.
  • A rescan on ESXi does nothing.
  • Changing the CHAP settings for iSCSI reports: Successfully changed, but nothing is changed!
Errors on changing the CHAP settings:

Code: Select all

Update Internet Scsi Authentication Properties Key
haTask-ha-host-vim.host.StorageSystem.updateInternetScsiAuthenticationProperties-2128410307

Description
Updates the authentication properties for an Internet SCSI host bus adapter

State
Failed - A general system error occurred: Errors
Not much more information.

Updated ESXi 6.7u3 to the latest patch level.

After several tries, I reset the iSCSI configuration from the command line:
https://www.reddit.com/r/sysadmin/comme ... _settings/

Make sure to use only one of the two modes (!): uni-directional or mutual CHAP.
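
To check which CHAP mode the adapter actually uses before changing anything, a minimal sketch (assuming the software iSCSI adapter is vmhba64, as later in this thread):

Code: Select all

esxcli iscsi adapter auth chap get -A vmhba64    # show the adapter-level CHAP direction and name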

Remount the detected datastore:
https://serverfault.com/questions/59400 ... k-faillure

Code: Select all

esxcfg-volume -l
esxcfg-volume -m <datastorename>
DG.
admin
Site Admin
Posts: 473
Joined: 06 Feb 2007, 13:36

ESXi6.7u3 & iSCSI

Post by admin »

This did not survive the next reboot.
I had to mount it manually again.
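
A likely explanation, assuming standard esxcfg-volume behaviour: -m mounts the detected volume non-persistently, while the capital -M mounts it persistently so it should survive a reboot. A minimal sketch:

Code: Select all

esxcfg-volume -l                    # list volumes detected as snapshots/replicas
esxcfg-volume -M <datastorename>    # persistent mount (capital M) instead of -m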

I also see that the iSCSI info is different from the SSD info for VMFS:

iSCSI, type Unknown?:
Image

SSD:
Image

There is no hardware change for either datastore.
As we are running out of storage, we are planning to make a new 2 TB iSCSI LUN.
Move everything from the 1 TB iSCSI datastore to the 2 TB one, then delete the 1 TB iSCSI LUN.

DG.
admin
Site Admin
Posts: 473
Joined: 06 Feb 2007, 13:36

Re: ESXi6.7u3 & iSCSI

Post by admin »

Just before the last disk failure, I was thinking of increasing the iSCSI LUN from 1 TB to 2 TB.
Not sure if the VMFS datastore will / can understand that, so we have to do a test first.

1 - Make a 400 GB iSCSI LUN & mount it to ESXi as a VMFS datastore.
2 - Check if there are differences from the current one, which shows 'Unknown' as type?:
Image

If there is no difference, it should be OK.
The difference could be due to the 8 MB block size instead of the 1 MB used for datastores on SSDs.
Image

3 - Put data (or better, a VM) on the datastore.
4 - Now expand the iSCSI LUN (most probably not possible while connected, so all VMs on that datastore should be OFF).
5 - Mount the expanded datastore to ESXi, and test whether the VMs are still there and working.
6 - Must the datastore be expanded before the new capacity can be used, due to the VMFS format?
7 - Check if the VMs still work after the expansion.

If all works fine, we can delete this test iSCSI datastore and perform the expansion on the current iSCSI datastore.

DG.
admin
Site Admin
Posts: 473
Joined: 06 Feb 2007, 13:36

Re: ESXi6.7u3 & iSCSI

Post by admin »

(1) The newly created datastore has VMFS as type, not Unknown like the older iSCSI device.

Image

(2) Found a difference: the old iSCSI datastore is MBR-formatted, this new one GPT ;)
The block-size difference is most probably also related to these settings.
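
A quick way to confirm the partition table type from the CLI; the device name is a placeholder:

Code: Select all

partedUtil getptbl /vmfs/devices/disks/<device>    # prints 'gpt' or 'msdos' (MBR) plus the partition layout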

So we continue our test, to see whether this new DS can be used & expanded without losing the VMs.

(3) Moved a VM to the new DS and tested the Windows 8 x64 machine (with vSphere Client & VM Converter).
The graph shows the VM being moved (from iSCSI to iSCSI, 20 MB/s). Not as fast as I hoped, but that could be due to working remotely.
And afterwards it started up -after several years- so updates were waiting.

Image

As we see Windows 8 working well with the VM Converter, we have to make a copy of this VM (first).
-update- It can be removed; it has no license and shuts down every hour.

(4) After increasing the iSCSI LUN we were able to expand the re-mounted DS (5), by using 'Increase capacity' on the DS.

Image

Select Expand an existing VMFS datastore extent

Image

(6) Next select the DS; a graphical overview of the DS is then shown.
When you click the left (blue) DS you get a slider to expand the DS.

Image
(You can get a warning message that existing VMs may be lost!)

When we retried this the next day we faced the first real (browser) error:

Image
(Could be because the DS tested here is also used as ESXi cache.)

So the iSCSI DS is not automatically extended when the iSCSI LUN grows. (6)
(5) The DS must first be re-mounted (otherwise a new DS is created over the existing one!!).
Next the DS has to be extended.
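
For reference, a VMFS extent can also be grown from the CLI; a sketch, assuming the partition was already enlarged (e.g. with partedUtil resize) after growing the LUN, and with placeholder device names:

Code: Select all

vmkfstools --growfs /vmfs/devices/disks/<device>:1 /vmfs/devices/disks/<device>:1    # grow the VMFS on partition 1 into the new space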

(7) Yes, the VM's were still there and usable !!

We made a 2 TB iSCSI LUN and copied all data from the old 1 TB DS to the new 2 TB DS.
Renamed the DS afterwards and checked whether all VMs are usable.

Afterwards the copied (or moved) VMs need to be re-registered, so we have to unregister them first.
But always select MOVED when asked whether the VM was copied or moved; if it really was copied, also rename its computer name (new MAC)!
Answering 'moved' keeps the current MAC addresses.
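
Re-registering can also be done from the CLI; a minimal sketch, where the VM ID and path are hypothetical:

Code: Select all

vim-cmd vmsvc/getallvms                                             # list registered VMs and their IDs
vim-cmd vmsvc/unregister 12                                         # unregister the stale entry
vim-cmd solo/registervm /vmfs/volumes/datastore01/MyVM/MyVM.vmx     # register it from its new location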

We also found VMFS5 v5.61 (datastore02) & VMFS6 v6.82 (datastore01); only the just newly created iSCSI datastore is VMFS6 v6.82 (datastore01).

Image Image

So all other DSs need transformation to VMFS6 (v6.82)?
This can only be done by re-creating the complete VMFS!
Make the right preparations before doing this, like having enough storage on other DSs to temporarily hold your data (VMs).
And make sure all registered VMs point to the right datastore (re-register after move / convert)!!

The old 1 TB iSCSI LUN could do the job after being removed and re-created, as this one is old (v5.61) and MBR-formatted where it should be GPT.
After re-creating the new VMFS, make a new DS for moving the SSD data, and perform the same steps on those DSs.

DG.
admin
Site Admin
Posts: 473
Joined: 06 Feb 2007, 13:36

Re: ESXi6.7u3 & iSCSI - VM & RDM disk

Post by admin »

We have a VM with an extra (S)ATA disk, directly attached (RDM).
When you move this machine it will copy all data of the directly attached disk too, which we don't want!
https://kb.vmware.com/s/article/1005241

Started a move which would take 11 hours?
How do you stop / kill such a job, a running process within ESXi?

Found a solution on:
https://serverfault.com/questions/88925 ... -interface
Enable SSH, log in, and execute:

Code: Select all

/etc/init.d/hostd restart
Your web GUI will also be reset.
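
A possibly gentler alternative than restarting hostd is to cancel the task itself; a sketch, assuming the task ID shown by task_list is accepted (not every task type can be cancelled):

Code: Select all

vim-cmd vimsvc/task_list               # list running tasks and their haTask-... IDs
vim-cmd vimsvc/task_cancel <task-id>   # attempt to cancel the copy/move task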

We will connect this extra disk back to the VM later on, so that data should not be touched.

Steps to take for such VM:

1 - Power down the VM.
2 - Remove the direct attached disk.

Hard disk 2 in the picture below should be removed, but keep its settings so you can put it back later!
This disk also uses the LSI Logic SAS controller; leave that one in the VM.

Image

3 - Move the VM (without the attached vmdk file).
4 - Re-attach the disk to the VM.
5 - Test the VM.

What has been done:

(1) - Power down the VM. Done.
(2) - Remove the directly attached disk. Done.
It even asks whether to delete the data from the datastore (don't click that, or the attached disk will be emptied).

Image

But we still see the 1 TB .vmdk file within the datastore of this VM, weird.

Image

Should we remove that one? Will we keep the data? I don't know.
Within the VM settings the disk is gone.

As we want to use the datastore browser's move function to move this VM, we'll check on the CLI first.

Image

Datastore04 itself is 480 GB, with one file of 931.5 GB inside? :)
Leave the file SATA_RDM_1.vmdk, as this is only the pointer to the extra disk. (It can be removed too.)
Delete the 931.5 GB vmdk file, SATA_RDM_1-rdmp.vmdk.
And we'll see if the 'real' data is gone.
Now MOVE the VM folder to the other datastore.

Image

The real data is NOT gone, but the RDM (Raw Device Mapping) disk has to be re-created in the new location.
A CLI copy command (to another datastore) would presumably also have copied the data of the additional disk,
just as the move from the web GUI was doing.

SATA_RDM_1.vmdk & SATA_RDM_1-rdmp.vmdk are the mappings that have to be re-created.

How did we create this RDM in the first place?

Found it again:
https://gist.github.com/Hengjie/1520114 ... 7af4b3a064

You have to create a raw device mapping from the physical disk to the VM location.
Use SSH and the command: vmkfstools -z (--createrdmpassthru /vmfs/devices/disks/...)

In my case:

Code: Select all

vmkfstools -z /vmfs/devices/disks/t10.ATA_____WDC_WD10EADS2D00M2B0__________________________WD2DWMAV50262135 "/vmfs/volumes/datastore01/LDZM_203/SATA_RDM_1.vmdk"
This new RDM descriptor (SATA_RDM_1.vmdk) needs to be added by editing the VM settings (add existing hard disk) before powering on the VM.
This VM was running on SSD (VMFS5) and now runs on iSCSI (VMFS6) with its ATA disk directly attached.
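
To look up the device name that vmkfstools -z expects, a minimal sketch:

Code: Select all

ls /vmfs/devices/disks/             # raw device names (t10.ATA_..., naa....)
esxcli storage core device list     # per-device details (model, size, SSD or not)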

Next we can re-create VMFS6 on the SSD (and copy everything back again, LOL).

DG.
admin
Site Admin
Posts: 473
Joined: 06 Feb 2007, 13:36

Re: ESXi6.7u3 - Remove & rebuild datastore to VMFS6

Post by admin »

Removing the old datastore04 does not work (yet); it stays busy or in use?

Image

We also had the swap location on datastore04, so removing it will be a problem.
Searched the internet for how to solve this without rebooting ESXi, but none of the suggestions work.

We deleted all files from it (except .dds.sf, which cannot be deleted; it is used by VMFS).

Image

swap:
Stop swap, change the swap location, start swap.
All settings revert to their original values, although the web GUI reports the settings were changed successfully (... NOT).
If you refresh, the old values are still there, not the new ones.
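
The CLI might behave differently from the web GUI here; a sketch using the esxcli sched swap namespace, with parameter names to be verified via --help:

Code: Select all

esxcli sched swap system get                                                          # show the current system swap configuration
esxcli sched swap system set --datastore-enabled=true --datastore-name=datastore01    # try moving swap to another datastore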

Syslog server:
ESXi keeps telling me that datastore04 is in use.
Restarting (or changing the policy) has no effect; all old settings (also advanced settings) revert to their original values.

It gets even stranger. Searching back for some errors I found this:

Image
hahaha, "The configured log directory /scratch/log cannot be used. The default directory /scratch/log will be used instead". Now I'm sure I have to reboot the ESXi host ... Changed too many things now.

A full restart of the ESXi server will be the only solution, I think, but that is also the server on which we are working right now.
It is running on this ESXi machine ... we have to plan this ...


An overview of the sites checked, and the results:

-1- title: VMware – Can’t unmount/remove datastore – in use or file system is busy
https://tomaskalabis.com/wordpress/vmwa ... m-is-busy/
Result:
Nothing from the last command: lsof | grep <UUID-of-datastore>
But datastore04 is still in use (or busy, whatever ...)

-2- title: How to change Syslog.global.LogDir on ESXi 6.5 via CLI
https://tomaskalabis.com/wordpress/how- ... 5-via-cli/
Result:
It looks like it did the job according to the command: esxcli system syslog config get, but ls -al /scratch still points .locker at the old datastore?
datastore04 is still in use ...
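
For reference, the CLI route from that article; a sketch with a hypothetical target path:

Code: Select all

esxcli system syslog config set --logdir=/vmfs/volumes/datastore01/syslog   # point the log directory elsewhere
esxcli system syslog reload                                                 # apply the new configuration
esxcli system syslog config get                                             # verify the active settings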

-3- title: Steps to fix unable to unmount/delete VMFS Datastore: the resource is in use
https://bobcares.com/blog/steps-to-fix- ... is-in-use/
Result:
Failed to change Syslog.global.logDir - dismiss.
The required adjustment cannot be made.
datastore04 is still in use ...

-4-
https://communities.vmware.com/t5/VMwar ... -p/2791968
Result:
Could be handy, but also too old.
To be checked first after the ESXi reboot.

We plan an ESXi reboot asap ...

DG.
admin
Site Admin
Posts: 473
Joined: 06 Feb 2007, 13:36

Re: ESXi6.7u3 - Remove & rebuild datastore to VMFS6 (iSCSI again?)

Post by admin »

After an ESXi reboot ...

- Still not able to delete datastore04, grrrr. It even has a new .locker directory, why??
- Missing 1 of the 2 iSCSI datastores (from the same source???), and I can't get it back online, why???

It's getting annoying :(

I'm even considering starting over and re-installing ESXi 6.7.
Or even going back to 5.5 --coming from 7.0, which was even worse-- then we could use the vSphere Client again ...
But with all patches? That would take at least one day, re-creating all networks etc. No, we stay on 6.7.

First try: fix why 1 iSCSI datastore (02a) is now missing from the inventory.
It is from the same host as datastore01, with the same CHAP info.
I can add the static name in the web GUI iSCSI settings, but the logs only say: login failed.
Looks the same as at the start of this whole history, only now the iSCSI datastore is GPT-formatted with VMFS6, and it still won't connect at ESXi boot. Why IS the new (bigger) datastore (also iSCSI, same host, same CHAP info) connected at ESXi boot?

Restarting iSCSI from the host side does nothing; afterwards the same situation persists: 1 datastore online and active, 1 datastore offline and inactive, from the same host. Almost no logging is seen on it, but I can see that one is connected and the other is not.
Both iSCSI datastores were fine before the ESXi reboot.
So weird, just like the fact that I still can NOT remove an empty datastore (SSD). (Should I reboot without the SATA drive attached?)
Within ESXi 5.5 (with the vSphere Client) there were far fewer of these issues.

Just because we can, we do it again: reboot the ESXi host.

DG.
The older we get, the more issues within the software (or OS).
admin
Site Admin
Posts: 473
Joined: 06 Feb 2007, 13:36

Re: ESXi6.7u3 - iSCSI issues

Post by admin »

Just before the second reboot, I performed a reset to mutual CHAP for iSCSI on the CLI, then checked the iSCSI settings on ESXi.
Opening the mutual CHAP settings I saw an empty name field; filled in the right name and saved the config.
And now it just adds datastore02a?

I already saw this earlier and filled in the name, but nothing happened then.
No idea why.

Just for fun, since we had planned another reboot anyway, we DO another reboot.
Also to test whether (both) iSCSI datastores stay alive.

As I feared, now I don't have any iSCSI anymore ... It can get worse ... What the F...
Can we trust iSCSI? I'm starting to wonder.
Why is this happening? For over 4 years, no problems with 1 (old, VMFS5, Unknown type, MBR format) iSCSI datastore.
Now, with the newest versions of everything, all hell breaks loose.

Most of my VMs are offline, the datastore is gone :(

And now we have to restore that again, and again, and again, after every reboot?
I'm getting really lost now.

...

OK, checked everything from within the ESXi CLI, and a little bit from the GUI.

After checking the target iSCSI (we do have the connection) from the ESXi CLI to the iSCSI host (ix4):

Code: Select all

esxcli iscsi adapter target portal auth chap get  -A vmhba64 -a <IP_address>:<port> -n <iSCSI-name>
We get the result:

Code: Select all

<iSCSI-name>
   Address: <IP_address>:<port>
   TargetName: <iSCSI-name>
   Method: chap
   Direction: uni
   Name: <iSCSI_login_name>
   Level: required
   Inheritance: true
   Parent: <IP_address>:<port>
We found that although our host is set to mutual CHAP, it is actually requesting uni-directional CHAP, for both iSCSI LUNs.
That could be our culprit.

Changed the host -default- value (back) to uni instead of mutual (it was earlier set to mutual, as the host was configured that way?):

Code: Select all

esxcli iscsi adapter auth chap set -A vmhba64 --direction=uni --default
Next make the connection:

Code: Select all

esxcli iscsi adapter auth chap set -A vmhba64 -N <iSCSI_login_name> --direction=uni --level=required --secret
[Enter the right password.]
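
If the datastores do not come back by themselves, a rescan may help; a sketch, assuming vmhba64 as above:

Code: Select all

esxcli storage core adapter rescan -A vmhba64   # rescan the iSCSI adapter
esxcli storage filesystem list                  # check whether the VMFS volumes are mounted again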

Within the GUI the datastores returned within minutes.

Next, again, test a reboot.
These iSCSI datastores should ALWAYS be available.

DG.
admin
Site Admin
Posts: 473
Joined: 06 Feb 2007, 13:36

Re: ESXi6.7u3 - iSCSI issues

Post by admin »

That's more like it.
iSCSI 'back on track' for both datastores.

Image

Looks like the change to mutual CHAP within ESXi was killing my iSCSI.
So far so good.
Next: remove datastore04 (continuing from before).
The CLI will bring more perspective than the web GUI ;)

DG.
admin
Site Admin
Posts: 473
Joined: 06 Feb 2007, 13:36

Re: ESXi6.7u3 - Remove & rebuild datastore to VMFS6

Post by admin »

Fixed the iSCSI; next, remove a busy (in use) datastore (04).

All settings should go back to datastore04 after it is transformed to VMFS6:
- /scratch
- ESXi swap

Changed the swap location for ESXi; it was on datastore04.
Found some fixes on the internet for changing the format of the disk, but as it is in use now, I doubt they will work (or might they crash ESXi?):

https://bogdanburuiana.com/index.php/20 ... iguration/
https://www.hull1.com/fixit/2020/08/13/ ... e-fix.html

Better to use the following:

To remove datastore04, we have to change the /scratch settings of ESXi:
https://youtu.be/OF3fIJKaKqs

The following steps have to be done:

1 - Set the temporary scratch location to /tmp/scratch, as in the example?

2 - This MUST be done in maintenance mode, so all VMs have to be shut down (again).
His last command restarts ALL processes from /etc/init.d/
(kind of the same as a reboot of the host).

3 - Now try to remove datastore04.

Two options: Y or N.
Did it work?

Y? -> step 4
N? -> step 7

4 - Re-create a new VMFS6 on the whole SSD.
5 - If all is OK, do the same trick again with the old/new values ;)
6 - You can test a reboot back into maintenance mode, or exit maintenance mode & reboot (to test the VM autostart).
You should be finished here.

7 - Search further ... DuckDuckGo, Google, tech sites.
--

Commands used:

Code: Select all

vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string /tmp/scratch
ln -sfnv /tmp/scratch /scratch
( for s in /etc/init.d/*; do echo $s; $s status && $s restart; done )
# remove datastore04
# re-create datastore04 as new VMFS6 over the whole drive
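
These last two steps can be done in the web GUI; a rough CLI equivalent, assuming a GPT partition 1 already exists on the SSD (the device name is a placeholder, and everything on it is destroyed):

Code: Select all

partedUtil getptbl /vmfs/devices/disks/<device>                      # double-check you have the right device
vmkfstools -C vmfs6 -S datastore04 /vmfs/devices/disks/<device>:1    # create a fresh VMFS6 labelled datastore04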

Check and get its new UUID: 60949bd8-ed1392ea-b423-cc52af415d5f

Old one was:
/vmfs/volumes/5e908261-0d789052-fdc1-cc52af415d5f/.locker
New value will be:
/vmfs/volumes/60949bd8-ed1392ea-b423-cc52af415d5f/.locker

Set /scratch back the same way, to the new VMFS6 datastore04:

Code: Select all

mkdir /vmfs/volumes/60949bd8-ed1392ea-b423-cc52af415d5f/.locker
vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string /vmfs/volumes/60949bd8-ed1392ea-b423-cc52af415d5f/.locker
ln -sfnv /vmfs/volumes/60949bd8-ed1392ea-b423-cc52af415d5f/.locker /scratch
Looks like we don't need a restart of all services? The GUI already shows both values as new.
But we restart all services anyway:

Code: Select all

( for s in /etc/init.d/*; do echo $s; $s status && $s restart; done )
Exit maintenance mode

- Moved DGWEB to datastore04 (done); LDZM needs special attention, as before (mounted extra RDM drive).
- Moved swap back to datastore04. (Done)

Check later whether the directory datastore01/.locker should point to /scratch/log? The syslog is there now?
It cannot be changed within the web GUI:

Code: Select all

Update Options
Key		haTask-ha-host-vim.option.OptionManager.updateValues-377561991
Description	Updates one or more properties
State		Failed - A specified parameter was not correct:
Errors
That doesn't bring much ...
This we have also mixed up over the last few days.

DG.