On Fri, Jul 16, 2021 at 9:25 PM <lars@redhat.com> wrote:
I've been experimenting with an HA NFS configuration using pacemaker and nfs-ganesha. I've noticed that after a failover event, it takes about five minutes for clients to recover, and that seems to be independent of the settings of Lease_Lifetime and Grace_Period. Client recovery also doesn't seem to correspond to the "NFS Server Now NOT IN GRACE" message in ganesha.log. Is this normal behavior?

It's not normal.

IIRC, servers notify clients that they are in NFS_GRACE. On top of that, any NFSv4 client that attempts I/O while the server is in grace will receive NFS4ERR_GRACE, so even if a client somehow missed the initial notification, it would discover the grace period when attempting I/O.
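
If it helps to pin down the timing, you can correlate client recovery with the server's grace transitions by watching the ganesha log during a failover (the log path below is just a common default; adjust for your distro):

    # Follow grace-related events (IN GRACE / NOT IN GRACE) as they happen
    tail -f /var/log/ganesha/ganesha.log | grep -i grace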

I see you're using pacemaker with CephFS (FSAL_CEPH). It's my understanding that Ceph's HA solution for ganesha is built on top of Kubernetes, not pacemaker.

I don't have any experience with ganesha in this situation (or with k8s). Asking Jeff Layton is probably your best option.


My pacemaker configuration looks like:

    Full List of Resources:
      * Resource Group: nfs:
        * nfsd      (systemd:nfs-ganesha):   Started nfs2.storage
        * nfs_vip   (ocf::heartbeat:IPaddr2):        Started nfs2.storage 

I guess this is meant to be an active/passive setup?
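
Assuming it is, I'd expect the group to have been built with something roughly like the following; the IP address, netmask and monitor intervals here are placeholders, not taken from your setup:

    # Ganesha itself, managed as a systemd service, grouped so it moves with the VIP
    pcs resource create nfsd systemd:nfs-ganesha op monitor interval=30s --group nfs
    # Floating IP that clients mount against; fails over together with nfsd
    pcs resource create nfs_vip ocf:heartbeat:IPaddr2 ip=192.168.1.100 cidr_netmask=24 \
        op monitor interval=10s --group nfs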

And the ganesha configuration looks like:

    NFS_CORE_PARAM
    {
            Enable_NLM = false;
            Enable_RQUOTA = false;
            Protocols = 4;
    }

    NFSv4
    {
            RecoveryBackend = rados_ng;
            Minor_Versions =  1,2;

            # From https://www.suse.com/support/kb/doc/?id=000019374
            Lease_Lifetime = 10;
            Grace_Period = 20;
    }

    MDCACHE
    {
            # Size the dirent cache down as small as possible.
            Dir_Chunk = 0;
    }

    EXPORT
    {
            Export_ID=100;
            Protocols = 4;
            Transports = TCP;
            Path = /;
            Pseudo = /data;
            Access_Type = RW;
            Attr_Expiration_Time = 0;
            Squash = none;

            FSAL {
                    Name = CEPH;
                    Filesystem = "tank";
                    User_Id = "nfs";
            }
    }

    RADOS_KV
    {
            UserId = "nfsmeta";
            pool = "cephfs.tank.meta";
            namespace = "ganesha";
    }
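
One more thing worth checking from the client side: with Protocols = 4, Minor_Versions = 1,2 and Pseudo = /data, the mount should go through the pacemaker-managed VIP and pin a supported minor version. The address and mountpoint below are placeholders:

    # Mount via the floating IP so the same server address survives failover;
    # vers=4.1 matches the enabled minor versions (1,2)
    mount -t nfs4 -o vers=4.1 192.168.1.100:/data /mnt/data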