On Mon, 2020-09-14 at 19:32 +0800, liuwei wrote:
Hi,
I found many error messages in ganesha.log when stopping nfs-ganesha.service. For example, from ganesha.log:
:ganesha.nfsd-1550083[Admin] mdcache_lru_clean INODE:F_DBG:Trusting op_ctx export id 2
:ganesha.nfsd-1550083[Admin] posix2fsal_error:FSAL:CRIT:Default case mapping Transport endpoint is not connected (107) to ERR_FSAL_SERVERFAULT
:ganesha.nfsd-1550083[Admin] mdcache_lru_clean:INODE LRU:CRIT:Error closing file in cleanup:Undefined server error
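The CRIT lines come from translating the errno of the failed call into an FSAL status. Below is a simplified, hypothetical sketch of such a translation; `demo_fsal_errors_t` and `demo_posix2fsal_error()` are illustrative names, not the nfs-ganesha ones, and the real function handles many more errnos. What it shows is the fallback path: an errno without an explicit case, such as ENOTCONN in the log above, ends up in the default branch and becomes a generic server fault.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified stand-in for FSAL error codes
 * (not the real fsal_errors_t enum). */
typedef enum {
	DEMO_ERR_FSAL_NO_ERROR = 0,
	DEMO_ERR_FSAL_NOENT,
	DEMO_ERR_FSAL_PERM,
	DEMO_ERR_FSAL_IO,
	DEMO_ERR_FSAL_SERVERFAULT
} demo_fsal_errors_t;

/* Translate a POSIX errno into a demo FSAL error. Errnos with no
 * explicit case (ENOTCONN here) fall through to the default branch. */
demo_fsal_errors_t demo_posix2fsal_error(int posix_errorcode)
{
	switch (posix_errorcode) {
	case 0:
		return DEMO_ERR_FSAL_NO_ERROR;
	case ENOENT:
		return DEMO_ERR_FSAL_NOENT;
	case EPERM:
	case EACCES:
		return DEMO_ERR_FSAL_PERM;
	case EIO:
		return DEMO_ERR_FSAL_IO;
	default:
		/* No specific mapping: report a generic server fault. */
		return DEMO_ERR_FSAL_SERVERFAULT;
	}
}
```

In the log, that generic fault appears to be what the close in cleanup gets back, which mdcache_lru_clean then reports as "Undefined server error".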
Version info: Ganesha 3.3 + FSAL_CEPH (Ceph version 14.2.10).
I went through the relevant code but still don't understand its intention:
static void release_export(struct gsh_export *export)
{
	/* ... (declaration and setup of obj elided) */

	/* Dispatches to mdcache_prepare_unexport() */
	export->fsal_export->exp_ops.prepare_unexport(export->fsal_export);

	/* Release state belonging to this export */
	state_release_export(export);

	/* Flush FSAL-specific state; dispatches to mdcache_unexport() */
	export->fsal_export->exp_ops.unexport(export->fsal_export, obj);
}
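The ordering question is easier to see in isolation. This is a hypothetical miniature (none of these types or functions exist in nfs-ganesha) that dispatches through an ops table the way exp_ops does and records the call order. Whatever prepare_unexport does, including aborting the Ceph connection when USE_FSAL_CEPH_ABORT_CONN is set, has already happened by the time the later steps run.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical miniature of the teardown order in release_export().
 * All names are illustrative, not nfs-ganesha types. */
#define DEMO_MAX_CALLS 8

struct demo_trace {
	const char *calls[DEMO_MAX_CALLS];
	int n;
};

struct demo_export;

/* Per-export operations, dispatched through a table like exp_ops. */
struct demo_export_ops {
	void (*prepare_unexport)(struct demo_export *exp);
	void (*unexport)(struct demo_export *exp);
};

struct demo_export {
	const struct demo_export_ops *ops;
	struct demo_trace *trace;
};

static void record(struct demo_trace *t, const char *name)
{
	assert(t->n < DEMO_MAX_CALLS);
	t->calls[t->n++] = name;
}

/* In FSAL_CEPH with USE_FSAL_CEPH_ABORT_CONN, this is the step where
 * the connection would be aborted. */
static void demo_prepare_unexport(struct demo_export *exp)
{
	record(exp->trace, "prepare_unexport");
}

static void demo_state_release_export(struct demo_export *exp)
{
	record(exp->trace, "state_release_export");
}

static void demo_unexport(struct demo_export *exp)
{
	record(exp->trace, "unexport");
}

static const struct demo_export_ops demo_ops = {
	.prepare_unexport = demo_prepare_unexport,
	.unexport = demo_unexport,
};

/* Same order as release_export(): prepare, release state, unexport. */
void demo_release_export(struct demo_export *exp)
{
	exp->ops->prepare_unexport(exp);
	demo_state_release_export(exp);
	exp->ops->unexport(exp);
}

/* Position of a call in the trace, or -1 if it was never made. */
int demo_trace_index(const struct demo_trace *t, const char *name)
{
	for (int i = 0; i < t->n; i++)
		if (strcmp(t->calls[i], name) == 0)
			return i;
	return -1;
}
```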
First, note that USE_FSAL_CEPH_ABORT_CONN is enabled, so ceph_prepare_unexport()
calls ceph_abort_conn(), which aborts the connection and unmounts.
No, it just aborts the connection. The "mount" is still there; it's just
that attempting to send anything to the MDS after that point will result
in an error (like you're seeing on close). That error should be harmless
as the server is going down anyway and has stopped talking to clients at
that point.
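A minimal sketch of that behaviour, assuming a toy model: the hypothetical `struct demo_mount`, `demo_abort_conn()` and `demo_close()` stand in for the real mount and for libcephfs calls such as ceph_abort_conn(), and no libcephfs code is called. After the abort the mount object still exists, but a close that has to reach the MDS fails with ENOTCONN, the same errno (107 on Linux) that shows up in the log.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a client mount after an aborted connection.
 * It mirrors the described behaviour only; it does not use libcephfs. */
struct demo_mount {
	bool mounted;    /* the "mount" (cmount) still exists */
	bool conn_alive; /* false once the connection is aborted */
};

/* Drop the connection without a clean unmount; the mount stays. */
void demo_abort_conn(struct demo_mount *m)
{
	m->conn_alive = false;
}

/* A close that must talk to the MDS: returns 0 or a negative errno. */
int demo_close(struct demo_mount *m)
{
	if (!m->mounted)
		return -EINVAL;
	if (!m->conn_alive)
		return -ENOTCONN; /* nothing can reach the MDS any more */
	return 0;
}
```

Since the server is shutting down anyway, the failed close has no client-visible effect; it only produces the log noise.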
However, later operations, such as fsal_close() in mdcache_lru_clean(),
still need export->cmount and the connection.
I think that is why the errors are printed. To sum up, I have two questions:
1. Is it reasonable to disconnect in prepare_unexport, before
mdcache_lru_clean runs?
2. What is the intention of doing so? What will happen if I turn off
USE_FSAL_CEPH_ABORT_CONN?
Turning off USE_FSAL_CEPH_ABORT_CONN will open up a race window that
would allow other ceph clients to acquire state that was still being
held by the server that is restarting.
FSAL_CEPH depends on the MDS preserving the state of ganesha's ceph
client if a server head has to be restarted. If we just tear down the
connection in this situation (as in a normal umount), the MDS would
release all of the state held by that ganesha, and other active/active
server heads could sneak in and steal the locks that it had previously
held.
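To make the race concrete, here is a hypothetical single-lock model (not MDS code; every name is illustrative). Client ids stand for server heads: with a clean unmount the MDS frees the restarting head's lock at once and another head can take it; with an aborted connection the session just goes stale and the lock stays held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-lock model of the race described above. This is
 * not MDS code; it only contrasts a clean unmount with an abort. */
enum demo_session_state {
	DEMO_SESSION_OPEN,
	DEMO_SESSION_STALE, /* client vanished; MDS keeps its state */
	DEMO_SESSION_CLOSED /* clean unmount; state released */
};

struct demo_mds {
	enum demo_session_state session; /* session of the restarting head */
	int lock_owner;                  /* 0 = free, otherwise a client id */
};

/* Normal umount: the session closes and its locks are freed at once. */
void demo_mds_clean_umount(struct demo_mds *mds, int client)
{
	mds->session = DEMO_SESSION_CLOSED;
	if (mds->lock_owner == client)
		mds->lock_owner = 0;
}

/* Aborted connection: the session just goes quiet; locks stay held. */
void demo_mds_abort_conn(struct demo_mds *mds)
{
	mds->session = DEMO_SESSION_STALE;
}

/* Another server head asks for the lock. */
bool demo_mds_try_lock(struct demo_mds *mds, int client)
{
	assert(client != 0); /* 0 is reserved for "free" */
	if (mds->lock_owner != 0 && mds->lock_owner != client)
		return false;
	mds->lock_owner = client;
	return true;
}
```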
Aborting the connection ensures that ganesha will have no further
communication with the MDS until the server comes back. To the MDS, it
looks like the ganesha client just dropped off the net. The MDS will
then keep the state held by that client until it comes back or times
out. Once the client does come back, it then ensures that the other
ganesha servers are enforcing the grace period, and then it will ask the
MDS to kill off the old session.
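The recovery order can be sketched the same way. This is a hypothetical model with illustrative names, and MDS-side session timeouts are left out. The restarting ganesha refuses to kill its old session until grace is enforced, and during grace only reclaims are granted. So when the MDS drops the old state, the NFS client that held the lock can reclaim it, but a new request from another client cannot take it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the recovery sequence described above. Session
 * timeouts on the MDS side are not modelled. */
struct demo_cluster {
	bool old_session_alive; /* MDS still holds the old session's state */
	bool grace_enforced;    /* every other ganesha head is in grace */
	int lock_owner;         /* 0 = free, otherwise the NFS client id */
};

/* Restarting ganesha: kill the old session only once grace is enforced
 * cluster-wide. Returns false if it is not yet safe to do so. */
bool demo_kill_old_session(struct demo_cluster *c)
{
	if (!c->grace_enforced)
		return false;           /* keep the old session for now */
	c->old_session_alive = false;   /* MDS drops the old state */
	c->lock_owner = 0;
	return true;
}

/* During grace, only reclaims are allowed; new lock requests are refused. */
bool demo_lock(struct demo_cluster *c, int client, bool reclaim)
{
	assert(client != 0);
	if (c->lock_owner != 0)
		return false;
	if (c->grace_enforced && !reclaim)
		return false;
	c->lock_owner = client;
	return true;
}
```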
--
Jeff Layton <jlayton(a)redhat.com>