On Thu, 2021-04-22 at 07:46 +0000, lokendrarathour(a)gmail.com wrote:
Hi,
We have set up NFS Ganesha as active/active and have mounted CephFS from the
VM using the VIP. On top of that we are testing IP failover and Ganesha
switch-over.
To keep the failover time (i.e. the time for I/O to resume) to a minimum, we
have found three relevant settings in Ceph:
1. session_timeout (minimum allowed value is 30) - default 60
2. session_autoclose (minimum allowed value is 30) - default 300
3. mds_cap_revoke_eviction_timeout - disabled by default, and takes effect
   with priority once enabled (i.e. set to a value above 0)
We have tested with mds_cap_revoke_eviction_timeout set to 1 and achieved an
I/O resume time of 7-9 seconds, but we are unsure about using 1 second in
production, since a delay of one or a few seconds can also come from ordinary
network latency, or from something other than an actual NFS server failure.
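For reference, we are adjusting these settings along the lines of the
commands below; the filesystem name "cephfs" and the exact values are only
illustrative:

    # session_timeout and session_autoclose are per-filesystem settings
    ceph fs set cephfs session_timeout 30
    ceph fs set cephfs session_autoclose 30
    # the cap-revoke eviction timeout is an MDS config option (0 = disabled)
    ceph config set mds mds_cap_revoke_eviction_timeout 1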
Query:
We would like to know recommended values for the variables listed above.
Note: we are not using any container-based setup to bring the failed NFS
server back up immediately; we simply switch from one node to another when
we detect a failure on one of the nodes.
This sort of setup (using a VIP in front of two or more active/active
nodes) may not work the way you expect. In a rados_cluster ganesha
cluster, the ganesha servers each have their own client recovery
databases, and they aren't shared in any way.
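For illustration, a rados_cluster node's ganesha.conf typically carries
something like the sketch below (the pool, namespace and nodeid values here
are just examples); each server's recovery database is keyed by its own
nodeid, which is why nothing is shared between them:

    NFSv4 {
            RecoveryBackend = rados_cluster;
    }

    RADOS_KV {
            # ceph_conf = "/etc/ceph/ceph.conf";
            pool = "nfs-ganesha";      # example pool
            namespace = "grace";       # example namespace
            nodeid = "ganesha-a";      # unique per ganesha server
    }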
You may be able to fail over all of the connections from one host to
another in that way, and I/O may continue in some fashion, but the new
server won't have any knowledge of the defunct server's clients. You
aren't entering a grace period when the VIP moves (I assume), so file
locking and lock reclaim won't function correctly.
The rados_cluster code was really designed to be a scale-out solution
where you distribute clients between the servers using something like
round-robin DNS or some sort of load balancer that uses stable hashing
to distribute the clients. The assumption here was always that if a
ganesha server head goes down, we'll recreate it elsewhere (with the
same IP address) using containers or something similar.
Allowing the NFS clients to migrate to other ganesha servers, either via
a VIP or some other means, is not incorporated into the design. It could
be added in some way, but that is an entire project in and of itself.
It sounds more like you really want an active/passive solution of some
sort. That means you need to start up ganesha on failover, but you
likely want to enter a grace period anyway so that the clients can
reclaim their state.
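If you go that route with one of the rados_* recovery backends, a grace
period can be requested by hand with the ganesha-rados-grace tool, roughly
along these lines (pool, namespace and nodeid are placeholders):

    # start a grace period on behalf of the node taking over
    ganesha-rados-grace --pool nfs-ganesha --ns grace start ganesha-b
    # show the current/recovery epochs and per-node grace flags
    ganesha-rados-grace --pool nfs-ganesha --ns grace dump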
--
Jeff Layton <jlayton(a)poochiereds.net>