On Sat, 2021-01-23 at 09:23 +0000, rainer.stumbaum(a)gmail.com wrote:
I changed the NFSv4 recovery backend from rados_ng to rados_cluster,
as all five nfs-ganesha processes will be up all the time.
And I found my mistake: I thought the rados_kv backend would also use the default namespace
"ganesha-namespace", but it uses NULL instead. So I was maintaining and dumping the
grace DB at the wrong location.
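For reference, roughly what I ended up with in ganesha.conf - just a sketch, with example
pool/namespace/nodeid values from my setup (the differing defaults between the backends are
exactly what tripped me up):

    NFSv4 {
        # use the clustered grace DB backend, since all five daemons stay up
        RecoveryBackend = rados_cluster;
    }

    RADOS_KV {
        # pool and namespace holding the recovery/grace objects;
        # setting them explicitly avoids guessing which default applies
        pool = "nfs-ganesha";
        namespace = "ganesha-namespace";
        # unique per ganesha node (example value)
        nodeid = "nfs1";
    }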
I played around with the nodeid parameter for some time, as I expected my configuration
error to be there. It would be extremely helpful if rados_ng and rados_cluster logged
something at startup like
nfs4_recovery_init :CLIENT ID :EVENT :rados_cluster init using rados://nfs-ganesha/grace
with nodeid <nodeid>
That would have been very helpful and time-saving!
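In the meantime, something like this lets you check where the grace DB actually lives and
which nodeids are in it (pool/namespace here are the example values from the config sketch
above):

    # dump the grace epochs and node list, with an explicit namespace
    ganesha-rados-grace --pool nfs-ganesha --ns ganesha-namespace dump

    # with no namespace (NULL) - where the DB actually ended up in my case
    ganesha-rados-grace --pool nfs-ganesha dump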
So NFSv4 recovery works fine over the 5 nodes.
NFSv3 "recovery" also works - EXCEPT (!!!):
I am mounting a cephfs .snap directory on NFS-root booted systems like this:
10.20.56.2:/vol/diskless/.snap/00225/debian10-amd64-srv on /run/initramfs/rofs type nfs
(ro,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,acregmin=600,acregmax=600,acdirmin=600,acdirmax=600,hard,nocto,nolock,noacl,proto=tcp,port=2049,timeo=100,retrans=360,sec=sys,local_lock=all,addr=10.20.56.2)
As soon as I fail over the 10.20.56.2 IP to another nfs-ganesha instance, I get a stale file
handle.
If I mount the RW image directly like this
10.20.56.2:/vol/diskless/debian10-amd64-srv on /run/initramfs/rofs type nfs
(ro,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,acregmin=600,acregmax=600,acdirmin=600,acdirmax=600,hard,nocto,nolock,noacl,proto=tcp,port=2049,timeo=100,retrans=360,sec=sys,local_lock=all,addr=10.20.56.2)
the IPv4 takeover just works.
Is it possible to mount a Ceph .snap snapshot directory and survive an NFSv3 IP failover?
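The failing sequence, roughly (paths and IP as in the mounts above; the mountpoint and the
failover step are just examples - in my case the failover is whatever moves 10.20.56.2 to
another ganesha node):

    # 1. mount the snapshot directory over NFSv3 (works fine)
    mount -t nfs -o ro,vers=3,nolock,proto=tcp,port=2049 \
        10.20.56.2:/vol/diskless/.snap/00225/debian10-amd64-srv /mnt/rofs

    # 2. read something so the client holds file handles into the .snap tree
    ls -l /mnt/rofs

    # 3. move 10.20.56.2 to another nfs-ganesha node (failover)

    # 4. any further access now fails with a stale file handle (ESTALE)
    ls -l /mnt/rofs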
Hi Rainer,
I started working on some patches for this. We have to extend libcephfs
first, and then use the new functionality from ganesha. I have some
draft patches in my trees, but what I don't currently have is a reliable
reproducer for this.
Do you have a sequence of steps that always results in an -ESTALE in your
current setup? I'd like to make sure the patches I have fix it.
Thanks,
--
Jeff Layton <jlayton(a)poochiereds.net>