On Mon, 2021-01-25 at 09:07 -0500, Daniel Gryniewicz wrote:
Several issues.
We've had problems with snapshot directories in the past; they don't
seem to follow normal semantics, so it may be that they don't work with
recovery.
Second, I'm not sure NFSv3 recovery was ever certified with RADOS
recovery and Ceph. My memory is that it was only NFSv4 recovery. I may
be wrong about this, though.
CCing Jeff, who wrote this code, as he would likely know.
Daniel
On 1/23/21 4:23 AM, rainer.stumbaum(a)gmail.com wrote:
> I changed the NFSv4 recovery backend from rados_ng to rados_cluster, as all five
> nfs-ganesha processes will be up all the time.
>
> And I found my mistake: I thought the rados_kv would also use the default namespace
> "ganesha-namespace" but instead uses NULL. So I was maintaining and dumping the
> grace DB at an incorrect location.
>
> I played around with the nodeid parameter for some time as I expected my
> configuration error to be there. It would be extremely helpful to have rados_ng and
> rados_cluster say something in the startup log like
> nfs4_recovery_init :CLIENT ID :EVENT :rados_cluster init using rados://nfs-ganesha/grace with nodeid <nodeid>
> That would have been very helpful and time saving!
>
That's a good idea -- happy to review a patch for this, or if you open a
bug I can take a look at that when I get a chance.
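In the meantime, a note for anyone else who trips over this: the recovery
objects live in whatever pool/namespace the RADOS_KV block points at, so
it's worth pinning all of that explicitly instead of relying on defaults.
A minimal sketch (block and option names from memory, and the
pool/namespace/nodeid values are just examples -- adjust for your setup):

    NFSv4 {
        RecoveryBackend = rados_cluster;
    }

    RADOS_KV {
        # same pool and namespace on every node
        pool = "nfs-ganesha";
        namespace = "ganesha-namespace";
        # unique per ganesha instance
        nodeid = "node-a";
    }

You can then sanity-check the grace DB from the same location with
something like (exact option names may differ by version):

    ganesha-rados-grace --pool nfs-ganesha --ns ganesha-namespace dump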
> So NFSv4 recovery works fine over the 5 nodes.
>
> NFSv3 "recovery" also works - EXCEPT (!!!):
> I am mounting a cephfs .snap directory on NFS-root booted systems like this:
> 10.20.56.2:/vol/diskless/.snap/00225/debian10-amd64-srv on /run/initramfs/rofs type nfs (ro,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,acregmin=600,acregmax=600,acdirmin=600,acdirmax=600,hard,nocto,nolock,noacl,proto=tcp,port=2049,timeo=100,retrans=360,sec=sys,local_lock=all,addr=10.20.56.2)
>
> As soon as I fail over the 10.20.56.2 IP to another nfs-ganesha I get a stale
> file handle.
>
Yeah, not too surprising since you're poking around in snapshots.
libcephfs is a bit hobbled as an API for lookups of snapped inodes. See:
https://github.com/nfs-ganesha/nfs-ganesha/blob/4e0b839f74608ce7005e533ed...
If it's not already in cache, then we don't really have a mechanism to
look up a snapshotted inode at the moment. This could probably be fixed,
but it would need to be patched into libcephfs first and then you'd need
to fix ganesha to use the new API.
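To make that concrete, here's a rough sketch of the problem. This is not
the actual ganesha code -- the helper below is hypothetical and the
libcephfs types/signature are from memory, so treat it purely as
illustration:

    #include <stdint.h>
    #include <cephfs/libcephfs.h>   /* ceph_ll_lookup_inode() */

    /* The wire handle decodes into an inode number plus a snapshot id,
     * but the only lookup-by-number call available takes just the inode
     * number, so a snapped inode that isn't already in the client cache
     * can't be re-found after a failover. */
    int lookup_snapped(struct ceph_mount_info *cmount, uint64_t ino,
                       uint64_t snapid, struct Inode **out)
    {
            struct inodeno_t inum = { .val = ino };

            /* fine for the head inode... */
            int rc = ceph_ll_lookup_inode(cmount, inum, out);

            /* ...but there's nowhere to pass 'snapid', so for a .snap
             * inode the new server has never seen, the NFS client ends
             * up with ESTALE once the IP moves. */
            (void)snapid;
            return rc;
    }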
> If I mount the RW image directly like this
> 10.20.56.2:/vol/diskless/debian10-amd64-srv on /run/initramfs/rofs type nfs (ro,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,acregmin=600,acregmax=600,acdirmin=600,acdirmax=600,hard,nocto,nolock,noacl,proto=tcp,port=2049,timeo=100,retrans=360,sec=sys,local_lock=all,addr=10.20.56.2)
> the IPv4 takeover just works.
>
> Is it possible to mount a Ceph .snap snapshot directory and survive an NFSv3 IP
> failover?
>
--
Jeff Layton <jlayton(a)redhat.com>