On Fri, Jul 16, 2021 at 9:25 PM <lars(a)redhat.com> wrote:
> I've been experimenting with an HA NFS configuration using pacemaker and
> nfs-ganesha. I've noticed that after a failover event, it takes about five
> minutes for clients to recover, and that seems to be independent of the
> settings of Lease_Lifetime and Grace_Period. Client recovery also doesn't
> seem to correspond to the ":NFS Server Now NOT IN GRACE" message in the
> ganesha.log. Is this normal behavior?

It's not normal.
IIRC, servers notify clients when they are in NFS_GRACE. On top of that, any
NFSv4 client that attempts I/O while the server is in grace will receive
NFS4ERR_GRACE, so even if a client somehow missed the initial notification it
would discover the grace period when attempting I/O.
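
For correlating that with what your clients see, one cheap check is to follow
the grace transitions live on the node that takes over while a client is
retrying; a trivial sketch (the log path is an assumption, adjust it for your
install):

    # follow grace-period transitions (e.g. "NFS Server Now NOT IN GRACE")
    # on the newly active node; the log path may differ on your distribution
    tail -F /var/log/ganesha/ganesha.log | grep -i "IN GRACE"
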
I see you're using pacemaker with CephFS (FSAL_CEPH). It's my understanding
that Ceph's HA solution for ganesha is built on top of kubernetes rather than
pacemaker.

I don't have any experience with ganesha in this situation (or with k8s).
Asking Jeff Layton is probably your best option.

> My pacemaker configuration looks like:
>
> Full List of Resources:
>   * Resource Group: nfs:
>     * nfsd     (systemd:nfs-ganesha):      Started nfs2.storage
>     * nfs_vip  (ocf::heartbeat:IPaddr2):   Started nfs2.storage

I guess this is meant to be an active/passive setup?
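
For reference, an active/passive group like that is usually built with
something along these lines (the IP address and netmask below are
placeholders; the resource and group names just mirror your status output):

    # ganesha service plus a floating IP, grouped so they run on the same
    # node and fail over together (example values only)
    pcs resource create nfsd systemd:nfs-ganesha op monitor interval=30s
    pcs resource create nfs_vip ocf:heartbeat:IPaddr2 ip=192.0.2.10 cidr_netmask=24 op monitor interval=10s
    pcs resource group add nfs nfsd nfs_vip

The group keeps the two resources colocated and ordered, which is what you
want for an active/passive floating-IP setup.
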
> And the ganesha configuration looks like:
>
> NFS_CORE_PARAM
> {
>     Enable_NLM = false;
>     Enable_RQUOTA = false;
>     Protocols = 4;
> }
>
> NFSv4
> {
>     RecoveryBackend = rados_ng;
>     Minor_Versions = 1,2;
>     # From https://www.suse.com/support/kb/doc/?id=000019374
>     Lease_Lifetime = 10;
>     Grace_Period = 20;
> }
>
> MDCACHE {
>     # Size the dirent cache down as small as possible.
>     Dir_Chunk = 0;
> }
>
> EXPORT
> {
>     Export_ID = 100;
>     Protocols = 4;
>     Transports = TCP;
>     Path = /;
>     Pseudo = /data;
>     Access_Type = RW;
>     Attr_Expiration_Time = 0;
>     Squash = none;
>
>     FSAL {
>         Name = CEPH;
>         Filesystem = "tank";
>         User_Id = "nfs";
>     }
> }
>
> RADOS_KV
> {
>     UserId = "nfsmeta";
>     pool = "cephfs.tank.meta";
>     namespace = "ganesha";
> }
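
One thing that may be worth checking: with RecoveryBackend = rados_ng, the
client recovery records are kept as omap entries on an object in the pool and
namespace from your RADOS_KV block, so you can confirm they are actually being
written and read back around a failover. A rough sketch (the pool and
namespace come from your config; the exact object name rados_ng uses is not
something I'd guarantee, so substitute whatever the listing shows):

    # list the recovery objects ganesha keeps in the configured pool/namespace
    rados -p cephfs.tank.meta -N ganesha ls

    # dump the per-client recovery records stored as omap keys on one of them
    rados -p cephfs.tank.meta -N ganesha listomapkeys <recovery-object>
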