Hello. I'm starting a new thread for this problem.
I have a 3x3 Gluster Volume and I'm trying to use Ganesha for NFS
services.
Of the 9 server nodes, I have enabled the Ganesha NFS server on one.
The volume is being used to host clients with NFS roots.
A long separate thread shows how I got to this point but what works on
the client side is:
RHEL 7.6 aarch64 4.14.0-115.el7a.aarch64
RHEL 7.6 x86_64 3.10.0-957.el7.x86_64
OverlayFS - NFS v4 lowerdir with TMPFS overlay.
The Ganesha server has
Allow_Numeric_Owners = True;
Only_Numeric_Owners = True;
Disable_ACL = TRUE;
Disable_ACL is required for the aarch64 overlay to properly read
non-root files. (Strangely, however, Disable_ACL must be false for
aarch64 if you are using NFS v3.)
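For readers who want to see where these three settings would sit in a config file, here is a minimal sketch. It assumes the usual ganesha.conf layout, with the two numeric-owner options in the NFSv4 block and Disable_ACL inside an EXPORT block; that placement is my assumption and is not copied from the attached config. The function name ganesha_fragment is made up for illustration, and the rest of the export definition is deliberately left out.

```shell
#!/bin/bash
# Sketch only: prints a ganesha.conf fragment holding the three settings
# quoted in this post. ASSUMPTION: the block placement (NFSv4 vs. EXPORT)
# follows the usual ganesha.conf layout; check it against the real config.
ganesha_fragment() {
    cat <<'EOF'
NFSv4 {
    Allow_Numeric_Owners = True;
    Only_Numeric_Owners = True;
}

EXPORT {
    # The rest of the export definition (export id, path, FSAL, ...)
    # is intentionally omitted from this sketch.
    Disable_ACL = TRUE;
}
EOF
}

ganesha_fragment
```

The output is meant to be read and compared against the real config, not dropped in as a complete export.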
The x86_64 node fully boots through full init/systemd startup to the
login prompt.
When I start up the aarch64 node, it gets through varying amounts of
the boot, then both NFS clients freeze up 100%.
Restarting nfs-ganesha gets them going for a moment, then they freeze
again. It turned out in some cases the nfs-ganesha daemon was present
during the freeze but no longer serving the nodes. However, a more
common case (and the captured one) is nfs-ganesha is gone.
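To tell the two failure modes apart (daemon present but not serving, versus daemon gone), something like the following could be run on the Ganesha server while the clients are frozen. It is only a sketch: it assumes the daemon's process name is ganesha.nfsd, that pgrep, rpcinfo and timeout are installed, and that Ganesha is registered with rpcbind. The function names classify_ganesha and probe_ganesha are made up for illustration.

```shell
#!/bin/bash
# Sketch only: classify the state of the NFS server during a freeze.

# classify_ganesha PROC_RC RPC_RC
#   PROC_RC: exit status of the process check (0 = process found)
#   RPC_RC:  exit status of the NFS NULL-call check (0 = server answered)
classify_ganesha() {
    local proc_rc=$1 rpc_rc=$2
    if [ "$proc_rc" -ne 0 ]; then
        echo "gone"
    elif [ "$rpc_rc" -ne 0 ]; then
        echo "present-but-not-serving"
    else
        echo "serving"
    fi
}

# probe_ganesha SERVER
#   Must run on the Ganesha server itself, since pgrep looks at local processes.
probe_ganesha() {
    local server=$1 proc_rc=0 rpc_rc=0
    # ASSUMPTION: the daemon shows up under the name ganesha.nfsd.
    pgrep -x ganesha.nfsd >/dev/null 2>&1 || proc_rc=$?
    # NULL procedure call to the NFS program, version 4, over TCP.
    # timeout keeps the check from hanging if the server is wedged.
    timeout 5 rpcinfo -t "$server" nfs 4 >/dev/null 2>&1 || rpc_rc=$?
    classify_ganesha "$proc_rc" "$rpc_rc"
}

# Probe only when a server address is given on the command line.
if [ -n "$1" ]; then
    probe_ganesha "$1"
fi
```

Saved under any name and run with the server address as its only argument (172.23.255.249 in the mount command from this post), it prints one of the three states.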
I will attach a tarball with a bunch of information on the problem
including the config file I used, debugging logs, and some traces.
Ganesha 2.8.2
- Ganesha, Gluster servers x86_64
Since the aarch64 node causes Ganesha to crash early, and the debug
log can get to 2GB quickly, I set up a test case as follows:
Tracing starts...
- x86_64 fully nfs-root-booted, it comes up fine.
* Actively using nfs for root during tests below
- aarch64 node - boot to the miniroot env (a "fat" initrd that has
more tools and from which we do the NFS mount)
- It stops before handing control to init, so that I can run the tests
shown below.
- Truncated the Ganesha log here (cp /dev/null over it)
- Started tcpdump against the problem node
- Ran the following. Ganesha died at 'wc -l'; also note the
Input/output error on the first chroot attempt:
bash-4.2# bash reset4.sh
+ umount /a
umount: /a: not mounted
+ umount /root_ro_nfs
umount: /root_ro_nfs: not mounted
+ umount /rootfs.rw
+ mount -o ro,nolock 172.23.255.249:/cm_shared/image/images_ro_nfs/rhel76-aarch64-newkernel /root_ro_nfs
+ mount -t tmpfs -o mpol=interleave tmpfs /rootfs.rw
+ mkdir /rootfs.rw/upperdir
+ mkdir /rootfs.rw/work
+ mount -t overlay overlay -o lowerdir=/root_ro_nfs,upperdir=/rootfs.rw/upperdir,workdir=/rootfs.rw/work /a
bash-4.2# chroot /a
chroot: failed to run command '/bin/sh': Input/output error
bash-4.2# chroot /a
sh: no job control in this shell
sh-4.2# ls /usr/bin|wc -l
- When the above froze and ganesha died, I stopped tcpdump and collected
the pieces into a tarball.
See attached.
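For anyone who wants to repeat the mount sequence without the tarball, the commands in the trace above can be put back together roughly as follows. This is a reconstruction from the traced commands, not the actual reset4.sh. The variable names, the run helper and the DRY_RUN switch are additions for illustration; by default the sketch only prints what it would do.

```shell
#!/bin/bash
# Sketch only: reconstruction of the mount sequence seen in the trace.
# DRY_RUN=1 (the default here) prints the commands without running them;
# DRY_RUN=0 runs them for real and needs root.
DRY_RUN=${DRY_RUN:-1}

NFS_EXPORT=172.23.255.249:/cm_shared/image/images_ro_nfs/rhel76-aarch64-newkernel
LOWER=/root_ro_nfs   # read-only NFS mount, used as the overlay lowerdir
RW=/rootfs.rw        # tmpfs that holds upperdir and workdir
MERGED=/a            # merged overlay view that gets chroot'ed into

# Print a command in the same "+ cmd" style as the trace, then run it
# unless DRY_RUN is 1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != 1 ]; then
        "$@"
    fi
}

reset_overlay() {
    # Tear down any earlier attempt; "not mounted" errors are expected
    # on a fresh boot, so failures here are ignored.
    run umount "$MERGED" || true
    run umount "$LOWER" || true
    run umount "$RW" || true

    run mount -o ro,nolock "$NFS_EXPORT" "$LOWER"
    run mount -t tmpfs -o mpol=interleave tmpfs "$RW"
    run mkdir "$RW/upperdir"
    run mkdir "$RW/work"
    run mount -t overlay overlay -o \
        "lowerdir=$LOWER,upperdir=$RW/upperdir,workdir=$RW/work" "$MERGED"
}

reset_overlay
```

With DRY_RUN=0 it performs the same eight steps as the trace, after which "chroot /a" and "ls /usr/bin|wc -l" are the commands that triggered the failure here.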
Erik