OK. Here is what I did.
- Note: I have the test system set up for this test case, so I can
easily re-run it.
- I added a shellout in our 'miniroot' environment to give me a shell
right after it made the overlay mount to /a (/a is what becomes root).
- I did this *before* the "--move" lines we do to make switch_root happy
just in case (same result).
- I had it pause and wait for me to hit ENTER before doing the mount.
That way, I could start the captures and clear the logs right before
hitting ENTER, then stop the captures and ganesha right after the
test.
- I captured all traffic between leader1 (the ganesha server) and
n2521 (the aarch64 compute node).
  - compute node IP: 172.23.0.16
  - leader main IP: 172.23.0.3
  - CTDB-managed IP alias (compute node mounts from there): 172.23.255.249
- I captured all traffic between leader1 and the other 8 leaders
but excluded nfs and ctdb. I was concerned that if I restricted the
capture to just nfs and gluster, I'd miss a port. I am happy to re-run
the test differently.
- I attached an xz-compressed tarball.
- Content:
ganesha.conf (With the suggested way to disable ACL)
ganesha-gfapi.log
ganesha.log
gluster-brick-log-readme.txt (a sample and a remark that all were
about the same)
leaders-no-ctdb-no-nfs.pcap (capture among the leaders, excluding ctdb and nfs)
n2521.pcap (capture between leader/nfs server and compute node)
node-output.txt (output from the failing "su" command on the problem node)
tcpdump-cmd-lines-readme.txt (how I did the captures and how to read them)
- I am game to try anything that can help resolve this. I really appreciate
your time so far. I wish I could be more helpful on my end.
Erik