On Fri, Jul 16, 2021 at 9:25 PM <lars(a)redhat.com> wrote:
> I've been experimenting with an HA NFS configuration using pacemaker and
> nfs-ganesha. I've noticed that after a failover event, it takes about five
> minutes for clients to recover, and that seems to be independent of the
> settings of Lease_Lifetime and Grace_Period. Client recovery also doesn't
> seem to correspond to the ":NFS Server Now NOT IN GRACE" message in the
> ganesha.log. Is this normal behavior?

It's not normal.
IIRC, servers notify clients when they are in NFS_GRACE. On top of that, any
NFSv4 client that attempts I/O while the server is in grace will receive
NFS4ERR_GRACE, so even if a client somehow missed the initial notification it
would discover the grace period when attempting I/O.
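
For correlating that with what your clients see, one cheap check is to follow
the grace transitions live on the node that takes over while a client is
retrying; a trivial sketch (the log path is an assumption, adjust it for your
install):

    # follow grace-period transitions (e.g. "NFS Server Now NOT IN GRACE")
    # on the newly active node; the log path may differ on your distribution
    tail -F /var/log/ganesha/ganesha.log | grep -i "IN GRACE"
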
I see you're using pacemaker with CephFS (FSAL_CEPH). It's my understanding
that Ceph's HA solution for ganesha is built on top of kubernetes rather than
pacemaker.

I don't have any experience with ganesha in this situation (or with k8s).
Asking Jeff Layton is probably your best option.

> My pacemaker configuration looks like:
>
> Full List of Resources:
>   * Resource Group: nfs:
>     * nfsd     (systemd:nfs-ganesha):      Started nfs2.storage
>     * nfs_vip  (ocf::heartbeat:IPaddr2):   Started nfs2.storage

I guess this is meant to be an active/passive setup?
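
For reference, an active/passive group like that is usually built with
something along these lines (the IP address and netmask below are
placeholders; the resource and group names just mirror your status output):

    # ganesha service plus a floating IP, grouped so they run on the same
    # node and fail over together (example values only)
    pcs resource create nfsd systemd:nfs-ganesha op monitor interval=30s
    pcs resource create nfs_vip ocf:heartbeat:IPaddr2 ip=192.0.2.10 cidr_netmask=24 op monitor interval=10s
    pcs resource group add nfs nfsd nfs_vip

The group keeps the two resources colocated and ordered, which is what you
want for an active/passive floating-IP setup.
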
> And the ganesha configuration looks like:
>
> NFS_CORE_PARAM
> {
>     Enable_NLM = false;
>     Enable_RQUOTA = false;
>     Protocols = 4;
> }
>
> NFSv4
> {
>     RecoveryBackend = rados_ng;
>     Minor_Versions = 1,2;
>     # From https://www.suse.com/support/kb/doc/?id=000019374
>     Lease_Lifetime = 10;
>     Grace_Period = 20;
> }
>
> MDCACHE {
>     # Size the dirent cache down as small as possible.
>     Dir_Chunk = 0;
> }
>
> EXPORT
> {
>     Export_ID = 100;
>     Protocols = 4;
>     Transports = TCP;
>     Path = /;
>     Pseudo = /data;
>     Access_Type = RW;
>     Attr_Expiration_Time = 0;
>     Squash = none;
>
>     FSAL {
>         Name = CEPH;
>         Filesystem = "tank";
>         User_Id = "nfs";
>     }
> }
>
> RADOS_KV
> {
>     UserId = "nfsmeta";
>     pool = "cephfs.tank.meta";
>     namespace = "ganesha";
> }
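
One thing that may be worth checking: with RecoveryBackend = rados_ng, the
client recovery records are kept as omap entries on an object in the pool and
namespace from your RADOS_KV block, so you can confirm they are actually being
written and read back around a failover. A rough sketch (the pool and
namespace come from your config; the exact object name rados_ng uses is not
something I'd guarantee, so substitute whatever the listing shows):

    # list the recovery objects ganesha keeps in the configured pool/namespace
    rados -p cephfs.tank.meta -N ganesha ls

    # dump the per-client recovery records stored as omap keys on one of them
    rados -p cephfs.tank.meta -N ganesha listomapkeys <recovery-object>
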