Where does nfs-ganesha save lock/delegation/reservation (etc.) information in
order to allow clients to reclaim state upon restart? What is the FSAL's role in
the reclaim process?
Ganesha doesn't persist anything other than the fact that a particular client has
state. When a client detects a server reboot (because its clientid is suddenly
invalid), it starts to reclaim its state. The server checks that the client was
registered as having state in the previous instance (this allows detecting a
client that missed reclaiming state, as long as the server has not rebooted two
or more times since the client last held state).
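The bookkeeping described above can be sketched roughly as follows. This is a hypothetical illustration, not Ganesha's actual code; the class and method names are invented. The key point is that only *membership* (which clients held state) is tracked, never the locks themselves:

```python
class ReclaimDB:
    """Hypothetical sketch of the server-side reclaim check: the server
    records only which clientids held state, and after a restart honors
    reclaims only from clients on the previous instance's list."""

    def __init__(self):
        self.previous_clients = set()  # clientids with state before the restart
        self.current_clients = set()   # clientids acquiring state now

    def record_state(self, clientid):
        # Called when a client first acquires state; this membership is
        # the only thing that would be persisted.
        self.current_clients.add(clientid)

    def restart(self):
        # On server restart, the previous instance's client list becomes
        # the allow-list for reclaim during the grace period.
        self.previous_clients = self.current_clients
        self.current_clients = set()

    def may_reclaim(self, clientid):
        # Only clients that had state in the previous instance may reclaim.
        return clientid in self.previous_clients
```

For example, a client that held state before the restart passes the check, while an unknown client does not:

```python
db = ReclaimDB()
db.record_state("client-A")
db.restart()
db.may_reclaim("client-A")  # → True
db.may_reclaim("client-B")  # → False
```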
In general the FSAL has no role in the reclaim process. However, CephFS and
FSAL_CEPH work together to preserve the knowledge that the rebooting server node
had state, and CephFS prevents other processes from claiming conflicting state
before the clients of the failed server have had a chance to reclaim it (Jeff
Layton can describe that in more detail).

Otherwise, when the Ganesha process fails (whether the process itself crashes or
the cluster node it runs on fails), all state held by clients is dropped, and
other processes are free to acquire conflicting state without Ganesha
necessarily being aware that it happened. Clustered Ganesha does at least
attempt to enter the grace period on all nodes, but this is not synchronous with
the failure, so it can leave a small window. There is also no integration with
any other processes that might be sharing the files (which is why some vendors
do not support multi-protocol sharing of the same file sets, NFS and CIFS for
example).
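The grace-period gating mentioned above can be summed up in a small sketch. This is not Ganesha code, just an illustration of the standard NFSv4 rule: while the grace period is active only reclaim requests succeed (ordinary lock requests get NFS4ERR_GRACE), and once it ends reclaims are refused with NFS4ERR_NO_GRACE:

```python
def grace_check(in_grace, is_reclaim):
    """Return the NFSv4 status a lock/open request would receive,
    given whether the server is in its grace period and whether the
    request is a reclaim. Illustrative only."""
    if in_grace:
        # During grace, only reclaims of pre-restart state are honored.
        return "NFS4_OK" if is_reclaim else "NFS4ERR_GRACE"
    # After grace, reclaims are too late; normal requests proceed.
    return "NFS4ERR_NO_GRACE" if is_reclaim else "NFS4_OK"
```

The small window mentioned above is exactly the gap before `in_grace` becomes true on every node: a non-reclaim request arriving in that gap is granted rather than refused.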
Frank