ESXi 6.7 client creating thick eager-zeroed vmdk files using ceph fsal
by Robert Toole
Hi,
I have a 3-node Ceph Octopus 15.2.7 cluster running on fully up-to-date
CentOS 7 with nfs-ganesha 3.5.
After following the Ceph install guide
https://docs.ceph.com/en/octopus/cephadm/install/#deploying-nfs-ganesha
I am able to create an NFS 4.1 datastore in VMware using the IP
addresses of all three nodes. Everything appears to work OK.
The issue, however, is that for some reason ESXi is creating
thick-provisioned, eager-zeroed disks instead of thin-provisioned disks
on this datastore, whether I am migrating, cloning, or creating new VMs.
Even running vmkfstools -i disk.vmdk -d thin thin_disk.vmdk still
results in a thick eager-zeroed vmdk file.
This should not be possible on an NFS datastore, because VMware requires
a VAAI NAS plugin before it can thick provision disks over NFS.
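For anyone wanting to double-check the ESXi side, something along these
lines should show whether any VAAI NAS plugin is installed and whether
the datastore claims hardware acceleration (a sketch; output trimmed):

# list installed VIBs and look for anything VAAI/NAS related
esxcli software vib list | grep -i vaai
# NFS 4.1 datastores and their hardware acceleration status
esxcli storage nfs41 list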
Linux clients on the same datastore can create thin qcow2 images, and
when looking from the Linux hosts at the images created by ESXi you can
see that the vmdks are indeed thick:
ls -lsh
total 81G
512 -rw-r--r--. 1 root root 230 Mar 25 15:17 test_vm-2221e939.hlog
40G -rw-------. 1 root root 40G Mar 25 15:17 test_vm-flat.vmdk
40G -rw-------. 1 root root 40G Mar 25 15:56 test_vm_thin-flat.vmdk
512 -rw-------. 1 root root 501 Mar 25 15:57 test_vm_thin.vmdk
512 -rw-------. 1 root root 473 Mar 25 15:17 test_vm.vmdk
0 -rw-r--r--. 1 root root 0 Jan 6 1970 test_vm.vmsd
2.0K -rwxr-xr-x. 1 root root 2.0K Mar 25 15:17 test_vm.vmx
but the qcow2 files created from the Linux hosts are thin, as one would expect:
qemu-img create -f qcow2 big_disk_2.img 500G
ls -lsh
total 401K
200K -rw-r--r--. 1 root root 200K Mar 25 15:47 big_disk_2.img
200K -rw-r--r--. 1 root root 200K Mar 25 15:44 big_disk.img
512 drwxr-xr-x. 2 root root 81G Mar 25 15:57 test_vm
These ls -lsh results are the same from ESXi, from Linux NFS clients,
and from the CephFS kernel client.
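In case it helps with debugging, comparing the per-file block count
against the pool usage on the Ceph side should show whether the data is
really being written out (a sketch, using the flat vmdk from the listing
above):

# apparent size vs. blocks actually accounted to the file
stat -c 'size=%s blocks=%b' test_vm-flat.vmdk
# cluster-side view of how much data the pools actually hold
ceph df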
What is happening here? Are there undocumented VAAI features in
nfs-ganesha with the CephFS FSAL? If so, how do I turn them off? I want
thin-provisioned disks.
ceph nfs export ls dev-nfs-cluster --detailed
[
  {
    "export_id": 1,
    "path": "/Development-Datastore",
    "cluster_id": "dev-nfs-cluster",
    "pseudo": "/Development-Datastore",
    "access_type": "RW",
    "squash": "no_root_squash",
    "security_label": true,
    "protocols": [
      4
    ],
    "transports": [
      "TCP"
    ],
    "fsal": {
      "name": "CEPH",
      "user_id": "dev-nfs-cluster1",
      "fs_name": "dev_cephfs_vol",
      "sec_label_xattr": ""
    },
    "clients": []
  }
]
rpm -qa | grep ganesha
nfs-ganesha-ceph-3.5-1.el7.x86_64
nfs-ganesha-rados-grace-3.5-1.el7.x86_64
nfs-ganesha-rados-urls-3.5-1.el7.x86_64
nfs-ganesha-3.5-1.el7.x86_64
centos-release-nfs-ganesha30-1.0-2.el7.centos.noarch
rpm -qa | grep ceph
python3-cephfs-15.2.7-0.el7.x86_64
nfs-ganesha-ceph-3.5-1.el7.x86_64
python3-ceph-argparse-15.2.7-0.el7.x86_64
python3-ceph-common-15.2.7-0.el7.x86_64
cephadm-15.2.7-0.el7.x86_64
libcephfs2-15.2.7-0.el7.x86_64
ceph-common-15.2.7-0.el7.x86_64
ceph -v
ceph version 15.2.7 (<ceph_uuid>) octopus (stable)
The ceph cluster is healthy using bluestore on raw 3.84TB sata 7200 rpm
disks.
--
Robert Toole
rtoole(a)tooleweb.ca
403 368 5680
Podman Rootless NFS / VFS
by mrow1109@gmail.com
Hey folks, looking at this thread I see that rootless VFS is, in general, pretty much not possible. I went ahead and tried to launch a container with the root namespace capabilities, and then passed from podman all the necessary capabilities to the container so that ganesha could run with privilege. Of course I'm still having issues with the file handle. That being said, in the thread here: https://lists.nfs-ganesha.org/archives/list/support@lists.nfs-ganesha.org... there is a comment that I'm particularly interested in, as it is in line with my specific use case, and I was wondering if there is any more information on it?
"
We DO have a new capability to
build without that need (it doesn't work on MacOS) and if you only needed to export
files owned by a single user, it would be easy to setup Ganesha to do so, and if you
needed to be able to export to non-owner users, we could probably make that work,
especially if non-owners would be read-only. There are some complexities about writing to
files, but I think they can be ignored in this use case (there may be some quota
implications, but if an owner is allowing non-owners to expand files, the quota definitely
should be assigned to the owner anyway, so doing the writes as the owner would be fine).
But the only way to work around the CAP for open_by_handle is to add some kind of handle
mapping.
"
In my case I would, theoretically, like to run a rootless container and share out a list of directories that are owned by the same rootless user running the container. I'm wondering whether this is possible at all?
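What I have in mind is something along these lines (image name, paths
and the host port are placeholders, so treat it as a sketch):

# host port kept above 1024 because a rootless container cannot publish
# privileged ports without raising net.ipv4.ip_unprivileged_port_start
podman run --rm -d \
    --name ganesha \
    --cap-add=DAC_READ_SEARCH \
    --cap-add=DAC_OVERRIDE \
    --cap-add=CHOWN \
    --cap-add=FOWNER \
    --cap-add=SETUID \
    --cap-add=SETGID \
    -v /home/myuser/exports:/exports \
    -p 12049:2049 \
    my-ganesha-image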
Also, just for the sake of including my debug information: the config is a tad messy, as I'm just trying to see whether I can even duct-tape things together or not.
EXPORT_DEFAULTS {
## Access type for clients. Default is None, so some access must be
## given either here or in the export itself.
Access_Type = RW;
Attr_Expiration_Time = 0;
}
NFS_CORE_PARAM {
fsid_device = true;
}
EXPORT
{
## Export Id (mandatory, each EXPORT must have a unique Export_Id)
Export_Id = 12345;
## Exported path (mandatory)
Path = /exports/;
## Pseudo Path (required for NFSv4 or if mount_path_pseudo = true)
Pseudo = /exports/;
## Restrict the protocols that may use this export. This cannot allow
## access that is denied in NFS_CORE_PARAM.
# I didn't have any issues including NFSv3 - I was actually failing on showmount -e to show /exports if I had only 4?
Protocols = 3,4;
## Access type for clients. Default is None, so some access must be
## given. It can be here, in the EXPORT_DEFAULTS, or in a CLIENT block
Access_Type = RW;
## Allowed security types for this export
Sectype = none;
Squash = All;
Anonymous_Uid = 0;
Anonymous_Gid = 0;
FSAL {
Name = VFS;
}
CLIENT {
Access_Type = RW;
}
}
## Configure logging. Default is to log to Syslog. Basic logging can also be
## configured from the command line
LOG {
## Default log level for all components
Default_Log_Level = INFO;
## Configure per-component log levels.
Components {
NFS4 = FULL_DEBUG;
CACHE_INODE = FULL_DEBUG;
EXPORT = FULL_DEBUG;
FSAL = FULL_DEBUG;
CONFIG = FULL_DEBUG;
}
## Where to log
Facility {
name = FILE;
destination = "/var/log/ganesha.log";
enable = active;
}
}
[root@4e8602ae7a6b /]# ganesha.nfsd -v
NFS-Ganesha Release = V4.0
Run statement:
exec /usr/bin/ganesha.nfsd -F -L /dev/stderr -f /run/ganesha/ganesha.conf -p /run/ganesha/ganesha.pid
[root@4e8602ae7a6b /]# showmount -e
Export list for 4e8602ae7a6b:
/exports (everyone)
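(The mount I'm attempting from the client side is just a plain NFSv4
mount, roughly - host and mount point are placeholders, and -o port=...
would be needed if ganesha is published on a non-default port:)

mount -t nfs4 <ganesha_host>:/exports /mnt/test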
on mount:
27/09/2022 16:42:56 : epoch 633327a7 : 4e8602ae7a6b : ganesha.nfsd-12[svc_10] vfs_getattr2 :FSAL :F_DBG :Calling find_fd, state = NULL
27/09/2022 16:42:56 : epoch 633327a7 : 4e8602ae7a6b : ganesha.nfsd-12[svc_10] vfs_open_by_handle :FSAL :F_DBG :vfs_fs = /exports root_fd = 4
27/09/2022 16:42:56 : epoch 633327a7 : 4e8602ae7a6b : ganesha.nfsd-12[svc_10] vfs_open_by_handle :FSAL :M_DBG :Handle len 22 0x45: fsid=0x00000000000000fd.0x0000000000000000, type 0x81, opaque: (12:0xb590b00c0000000033232ebf)
27/09/2022 16:42:56 : epoch 633327a7 : 4e8602ae7a6b : ganesha.nfsd-12[svc_10] vfs_open_by_handle :FSAL :DEBUG :Failed with Operation not permitted openflags 0x00000000
27/09/2022 16:42:56 : epoch 633327a7 : 4e8602ae7a6b : ganesha.nfsd-12[svc_10] find_fd :FSAL :DEBUG :Failed with Operation not permitted openflags 0x00000020
27/09/2022 16:42:56 : epoch 633327a7 : 4e8602ae7a6b : ganesha.nfsd-12[svc_10] vfs_getattr2 :FSAL :F_DBG :Got fd -1 closefd = false
27/09/2022 16:42:56 : epoch 633327a7 : 4e8602ae7a6b : ganesha.nfsd-12[svc_10] fsal_common_is_referral :FSAL :EVENT :Failed to get attrs for referral, handle: 0x564e2c093d70, valid_mask: 0, request_mask: 82, supported: 0, error: Forbidden action
Note on the run path:
From root, a capsh script launches the podman ganesha container, and
podman then runs with all capabilities (as a specific user).
Process / capability tree:
root - capsh user - podman run -> full capability set
user - podman run -> reduced set based on the binary's capabilities
(modified from none to ensure root namespace capabilities are not lost):
=cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_setgid,cap_setuid,cap_net_bind_service,cap_net_admin
^^ I think I only need dac_read_search and maybe setgid/setuid, but I'm
just going overkill to make sure I have what I need.
According to getpcaps inside the container, the nfs-ganesha process is
then running with all of the above capabilities.
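(Roughly what I'm running inside the container to verify that - the pid
lookup and the decode value are just a sketch:)

# effective/permitted/bounding sets of the running ganesha process
getpcaps $(pidof ganesha.nfsd)
# or decode the raw effective mask by hand
grep Cap /proc/$(pidof ganesha.nfsd)/status
capsh --decode=<CapEff value from above>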
Restricting root's FS access
by Matthew Richardson
Hi,
I'm currently working on setting up ganesha with Kerberos. Everything seems to work as expected, except that I can't find a way to limit the access that root on the client has to the mounted filesystem.
At the moment I'm squashing root to 'nobody' - however, that obviously still allows access to world-readable files/dirs. Is there a way to block all FS access from root/nobody, or to always require a valid Kerberos ticket?
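For reference, the relevant bits of the export I'm testing with look
roughly like this (paths and FSAL are placeholders, so just a sketch):

EXPORT {
    Export_Id = 1;
    Path = /srv/data;       # placeholder path
    Pseudo = /data;
    Access_Type = RW;
    # only Kerberos flavours allowed, so AUTH_SYS mounts are refused
    SecType = krb5, krb5i, krb5p;
    # root on the client is mapped to the anonymous user
    Squash = Root_Squash;
    FSAL {
        Name = VFS;         # placeholder backend
    }
}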
Thanks,
Matthew