New subject: nfs-ganesha clients freeze, nfs ganesha daemon dies

Saturday, 17 August 2019

Hello. I'm statring a new thread for this problem.

I have a 3x3 Gluster Volume and I'm trying to use Ganesha for NFS
services.

One of the 9 server nodes, I have enabled Ganesha NFS server on one.

The volume is being used being host clients with NFS roots.

A long separate thread shows how I got to this point but what works on
the client side is:

RHEL 7.6 aarch64 4.14.0-115.el7a.aarch64
RHEL 7.6 x86_64 3.10.0-957.el7.x86_64

OverlayFS - NFS v4 underdir with TMPFS overlay.

The Ganesha server has
	Allow_Numeric_Owners = True;
	Only_Numeric_Owners = True;
        Disable_ACL = TRUE;

Disable_ACL is required for the aarch64 overlay to properly read
non-root files. (However, Disable_ACL must be false for aarch64
if you are using NFS v3 strangely).

The x86_64 node fully boots through full init/systemd startup to the
login prompt.

When I startup the aarch64 node, it gets various degrees of done... then
both NFS clients freeze up 100%.

Restarting nfs-ganesha gets them going for a moment, then they freeze
again. It turned out in some cases the nfs-ganesha daemon was present
during the freeze but no longer serving the nodes. However, a more
common case (and the captured one) is nfs-ganesha is gone.

I will attach a tarball with a bunch of information on the problem
including the config file I used, debugging logs, and some traces.

Ganesha 2.8.2
 - Ganesha, Gluster servers x86_64

Since the aarch64 node causes Ganesha to crash early, and the debug
log can get to 2GB quickly, I set up a test case as follows:

Tracing starts...
 - x86_64 fully nfs-root-booted, it comes up fine.
    * Actively using nfs for root during tests below
 - aarch64 node - boot to the miniroot env (a "fat" initrd that has
   more tools and from which we do the NFS mount)
 - It stops before switching control to the init start to run the tests
   like below.
 - cp'd /dev/null to ganesha log here
 - started the tcpdump to the problem node
 - Ran the following. Ganesha died at 'wc -l', also notice the
   Input/Output error on the first attempt:

bash-4.2# bash reset4.sh
+ umount /a
umount: /a: not mounted
+ umount /root_ro_nfs
umount: /root_ro_nfs: not mounted
+ umount /rootfs.rw
+ mount -o ro,nolock
172.23.255.249:/cm_shared/image/images_ro_nfs/rhel76-aarch64-newkernel /root_ro_nfs
+ mount -t tmpfs -o mpol=interleave tmpfs /rootfs.rw
+ mkdir /rootfs.rw/upperdir
+ mkdir /rootfs.rw/work
+ mount -t overlay overlay -o
lowerdir=/root_ro_nfs,upperdir=/rootfs.rw/upperdir,workdir=/rootfs.rw/work /a
bash-4.2# chroot /a
chroot: failed to run command '/bin/sh': Input/output error
bash-4.2# chroot /a
sh: no job control in this shell
sh-4.2# ls /usr/bin|wc -l

- When the above froze and ganesha died, I stopped tcpdump and collected
  the pieces in to a tarball.

See attached.

Erik

2025

2024

2023

2022

2021

2020

2019

2018

nfs-ganesha clients freeze, nfs ganesha daemon dies