ESXi 6.7 client creating thick eager-zeroed vmdk files using ceph fsal
by Robert Toole
Hi,
I have a 3-node Ceph Octopus 15.2.7 cluster running on fully up-to-date
CentOS 7 with nfs-ganesha 3.5.
After following the Ceph install guide
https://docs.ceph.com/en/octopus/cephadm/install/#deploying-nfs-ganesha
I am able to create an NFS 4.1 datastore in VMware using the IP
addresses of all three nodes. Everything appears to work OK.
The issue, however, is that for some reason ESXi is creating
thick-provisioned, eager-zeroed disks instead of thin-provisioned disks
on this datastore, whether I am migrating, cloning, or creating new VMs.
Even running vmkfstools -i disk.vmdk -d thin thin_disk.vmdk still
results in a thick eager-zeroed vmdk file.
This should not be possible on an NFS datastore: VMware requires a VAAI
NAS plugin before ESXi can thick provision disks over NFS.
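For reference, one way to check whether any NAS VAAI plugin is even
installed on the host is from an ESXi shell (standard esxcli commands;
output column names may differ slightly between builds):
# list installed VIBs; a vendor NAS VAAI plugin would show up here
esxcli software vib list | grep -i vaai
# per-datastore view; without a plugin, Hardware Acceleration should
# read "Not Supported" for the NFS datastore
esxcli storage nfs list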
Linux clients mounting the same export can create thin qcow2 images,
yet looking from a Linux host at the images created by ESXi shows that
the vmdks are indeed thick:
ls -lsh
total 81G
512 -rw-r--r--. 1 root root 230 Mar 25 15:17 test_vm-2221e939.hlog
40G -rw-------. 1 root root 40G Mar 25 15:17 test_vm-flat.vmdk
40G -rw-------. 1 root root 40G Mar 25 15:56 test_vm_thin-flat.vmdk
512 -rw-------. 1 root root 501 Mar 25 15:57 test_vm_thin.vmdk
512 -rw-------. 1 root root 473 Mar 25 15:17 test_vm.vmdk
0 -rw-r--r--. 1 root root 0 Jan 6 1970 test_vm.vmsd
2.0K -rwxr-xr-x. 1 root root 2.0K Mar 25 15:17 test_vm.vmx
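As a sanity check that the 40G flat vmdk is genuinely allocated rather
than just reported at its full size, comparing apparent size against
allocated blocks should work (plain coreutils, run on the Linux host; I
am not certain how precisely the CephFS client accounts blocks for
sparse files):
# apparent size vs. space actually allocated
du -h --apparent-size test_vm-flat.vmdk
du -h test_vm-flat.vmdk
# file size vs. 512-byte blocks reported by the filesystem
stat -c '%n: size=%s blocks=%b' test_vm-flat.vmdk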
The qcow2 files created from the Linux hosts, on the other hand, are
thin, as one would expect:
qemu-img create -f qcow2 big_disk_2.img 500G
ls -lsh
total 401K
200K -rw-r--r--. 1 root root 200K Mar 25 15:47 big_disk_2.img
200K -rw-r--r--. 1 root root 200K Mar 25 15:44 big_disk.img
512 drwxr-xr-x. 2 root root 81G Mar 25 15:57 test_vm
These ls -lsh results are the same from ESXi, from Linux NFS clients,
and from the CephFS kernel client.
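For the qcow2 images, qemu-img itself can report provisioned versus
consumed space, which is another way to confirm they are thin (same
qemu-img binary used to create the disk above):
# "virtual size" is the provisioned 500G, "disk size" is what is
# actually consumed on the datastore
qemu-img info big_disk_2.img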
What is happening here? Are there undocumented VAAI features in
nfs-ganesha with the CephFS FSAL? If so, how do I turn them off? I want
thin-provisioned disks.
ceph nfs export ls dev-nfs-cluster --detailed
[
  {
    "export_id": 1,
    "path": "/Development-Datastore",
    "cluster_id": "dev-nfs-cluster",
    "pseudo": "/Development-Datastore",
    "access_type": "RW",
    "squash": "no_root_squash",
    "security_label": true,
    "protocols": [
      4
    ],
    "transports": [
      "TCP"
    ],
    "fsal": {
      "name": "CEPH",
      "user_id": "dev-nfs-cluster1",
      "fs_name": "dev_cephfs_vol",
      "sec_label_xattr": ""
    },
    "clients": []
  }
]
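In case it helps, the configuration the ganesha daemon is actually
running with can also be inspected inside the cephadm-managed container
(the container runtime and container id below are placeholders for my
setup, and the usual /etc/ganesha/ganesha.conf path is assumed; adjust
to whatever "podman ps" or "docker ps" shows):
# find the nfs-ganesha container started by cephadm
podman ps | grep -i ganesha
# dump its config; the EXPORT / FSAL blocks should line up with the
# JSON above
podman exec <container-id> cat /etc/ganesha/ganesha.conf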
rpm -qa | grep ganesha
nfs-ganesha-ceph-3.5-1.el7.x86_64
nfs-ganesha-rados-grace-3.5-1.el7.x86_64
nfs-ganesha-rados-urls-3.5-1.el7.x86_64
nfs-ganesha-3.5-1.el7.x86_64
centos-release-nfs-ganesha30-1.0-2.el7.centos.noarch
rpm -qa | grep ceph
python3-cephfs-15.2.7-0.el7.x86_64
nfs-ganesha-ceph-3.5-1.el7.x86_64
python3-ceph-argparse-15.2.7-0.el7.x86_64
python3-ceph-common-15.2.7-0.el7.x86_64
cephadm-15.2.7-0.el7.x86_64
libcephfs2-15.2.7-0.el7.x86_64
ceph-common-15.2.7-0.el7.x86_64
ceph -v
ceph version 15.2.7 (<ceph_uuid>) octopus (stable)
The Ceph cluster is healthy, using BlueStore on raw 3.84 TB SATA
7200 RPM disks.
--
Robert Toole
rtoole(a)tooleweb.ca
403 368 5680
Announce Push of V5.4 - upgrade to this!
by Frank Filz
Important: I believe this finally solves the issues with the entry
cache growing unbounded. Please upgrade to this release and start
intensive testing if possible.
Branch next
Tag: V5.4
Merge Highlights
* Improve directory chunk LRU with a low water mark and fix interval bug
* Remove all entries from mdcache LRU with elevated refcount
* NFSv3 Create opens an fd, need to count those
* FSAL_VFS: Don't attempt to get ACLs from symbolic links
Signed-off-by: Frank S. Filz <ffilzlnx(a)mindspring.com>
Contents:
4f8c484b4 Frank S. Filz V5.4
3a395315f Martin Schwenke FSAL_VFS: Don't attempt to get ACLs from symbolic
links
d59ff1476 Frank S. Filz FSALs when we do NFSv3 create and open global fd, we
must LRU track it
1877b3346 Frank S. Filz MDCACHE: remove all active entries from LRU L1 and
L2
c25e00b59 Frank S. Filz MDCACHE: We need to reap chunks and implement a low
water mark
29fed5c1d Frank S. Filz MDCACHE: Fix run interval for chunk lru
Announce Push of V5.3.3
by Frank Filz
Branch next
Tag: V5.3.3
Merge Highlights
* Disable GPFS fsal async block lock support
* Add options to cmake to use spectrum's libwbclient.so library.
* Change RPC_Max_Connections max value to 1,000,000
* fix to use credential info from rpc if group resolution fails.
* Avoids parallel allocation with multi-threaded access
* PSEUDO_FSAL: Fix access to non-initialized structs in handle
* nfs4_op_exchange_id: pnfs server flags should not be empty
* mdcache bloating due to unreleased refcount during readdir
Signed-off-by: Frank S. Filz <ffilzlnx(a)mindspring.com>
Contents:
566310011 Frank S. Filz V5.3.3
432785ec4 Deepak Arumugam Sankara Subramanian mdcache bloating due to
unreleased refcount during readdir
a48cfbd5f Assaf Yaari nfs4_op_exchange_id: pnfs server flags should not be
empty
7306defe9 Shahar Hochma PSEUDO_FSAL: Fix access to non-initialized structs
in handle
c7b873ddf Rojin George Avoids parallel allocation with multi-threaded access
ce7875d69 Yogendra Charya fix to use credential info from rpc if group
resolution fails.
23900c389 Prabhu Murumesan Change RPC_Max_Connections max value to 1,000,000
da4245a97 Malahal Naineni Add options to cmake to use spectrum's
libwbclient.so library.
ac2ec85d5 Malahal Naineni Disable GPFS fsal async block lock support