On Fri, 2020-10-30 at 07:44 -0700, Frank Filz wrote:
> Where does nfs-ganesha save lock/delegation/reservation etc.
> information in order to allow clients to reclaim upon restart?
> What is FSAL's role in the reclaim process?

Ganesha doesn't persist anything other than the fact that a
particular client has state. When the client detects a server reboot
(because its clientid is suddenly invalid), it starts to reclaim
state. The server checks that the client was registered as having
state in the previous instance; this allows detection of a client
that missed reclaiming state because the server rebooted two or more
times after the client last held it.
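
To make that check concrete, here is a minimal sketch in C (with
hypothetical names, not Ganesha's actual recovery-backend API) of the
"were you a client of the previous instance" test:

    #include <stdbool.h>
    #include <string.h>

    /* One record per client that held state in the *previous* server
     * instance, as loaded from the recovery backend at startup. */
    struct client_rec {
        char id[128];              /* client identifier */
        struct client_rec *next;
    };

    static struct client_rec *prev_epoch_clients;

    /* Allow a reclaim only if the client was recorded last epoch. */
    static bool can_reclaim(const char *clientid)
    {
        struct client_rec *c;

        for (c = prev_epoch_clients; c; c = c->next)
            if (strcmp(c->id, clientid) == 0)
                return true;

        /* Not recorded: the client either never held state, or it
         * slept through a grace period while the server rebooted
         * again, so its reclaim must be refused. */
        return false;
    }

In real Ganesha the list lives in whatever recovery backend is
configured, but the eligibility test is essentially this membership
check.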

In general the FSAL has no role in the reclaim process. However,
CephFS and FSAL_CEPH work together to preserve the knowledge that the
rebooting server node had state, and CephFS prevents other processes
from claiming conflicting state before the clients of the failed
server have had a chance to reclaim it (Jeff Layton can describe that
in more detail).
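
Very roughly, that cooperation looks like the sketch below; the
function names are hypothetical placeholders, not the real libcephfs
reclaim interface:

    struct ceph_mount;  /* opaque CephFS client handle */

    /* Hypothetical placeholders for the reclaim fencing calls. */
    void cephfs_begin_reclaim(struct ceph_mount *m, const char *uuid);
    void cephfs_end_reclaim(struct ceph_mount *m);

    void fsal_ceph_startup(struct ceph_mount *m, const char *node_uuid)
    {
        /* Identify ourselves as the successor of the instance that
         * previously registered this uuid; CephFS fences that
         * instance's state instead of letting others take it. */
        cephfs_begin_reclaim(m, node_uuid);

        /*
         * ... Ganesha runs its grace period here: NFS clients reclaim
         * their locks, and FSAL_CEPH re-acquires the corresponding
         * CephFS state ...
         */

        /* Lift the fence: anything not reclaimed is now up for grabs
         * by other CephFS clients. */
        cephfs_end_reclaim(m);
    }
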
The "theory" is laid out in this slide deck (for the most part):
http://nfsv4bat.org/Documents/BakeAThon/2018/Active_Active%20NFS%20Server...

Otherwise, when the Ganesha process fails (whether the process itself
crashes or the cluster node it runs on fails), all state held by
clients is dropped, and other processes are free to acquire
conflicting state without Ganesha necessarily being aware that this
happened.

Clustered Ganesha does at least attempt to enter the grace period on
all nodes; however, this is not synchronous with the failure, so it
can leave a small window, and of course there is no integration with
any other processes that might be sharing the files (which is why
some vendors do not support multi-protocol sharing of the same file
sets, e.g. NFS and CIFS).

To be clear, that window does not exist with the rados_cluster
recovery backend and CephFS. The other solutions seem to have such a
window, however.
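
For reference, enabling that backend in ganesha.conf looks roughly
like this (a sketch; the pool, namespace, and nodeid values are
illustrative and should be adjusted per deployment):

    NFSv4 {
        # Shared RADOS recovery database; all cluster nodes can see
        # which clients held state and coordinate the grace period.
        RecoveryBackend = rados_cluster;
        Grace_Period = 90;
    }

    RADOS_KV {
        pool = "nfs-ganesha";
        namespace = "grace";
        nodeid = "ganesha-a";
    }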

--
Jeff Layton <jlayton(a)redhat.com>