Hi,
With full debug logs for RW LOCK, I could see that the lock gets destroyed as part of
free_client_id.
2021-04-22T13:09:39Z : epoch 6080e5a2 : w1hs3r0302.vsanstfsad.local :
ganesha.nfsd-180[::ffff:172.30.100.232] [svc_440] 349 :free_client_id :RW LOCK :Destroy
mutex 0x7fa390179148 (&clientid->cid_owner.so_mutex) at
/build/mts/release/bora-17422501/cayman_nfs-ganesha/nfs-ganesha/src/src/SAL/nfs4_clientid.c:349
……
……
2021-04-22T13:19:44Z : epoch 6080e5a2 : w1hs3r0302.vsanstfsad.local :
ganesha.nfsd-180[none] [dbus_heartbeat] 397 :_state_del_locked :RW LOCK :Error 22,
acquiring mutex 0x7fa390179148 (&owner->so_mutex) at
/build/mts/release/bora-17422501/cayman_nfs-ganesha/nfs-ganesha/src/src/SAL/nfs4_state.c:397
I think the culprit is “free_client_id” function. In this function the
“cid_owner->so_mutex” is destroyed without worrying about the refcount
PTHREAD_MUTEX_destroy(&clientid->cid_mutex);
PTHREAD_MUTEX_destroy(&clientid->cid_owner.so_mutex);
if (clientid->cid_minorversion == 0)
PTHREAD_MUTEX_destroy(&clientid->cid_cb.v40.cb_chan.mtx);
put_gsh_client(clientid->gsh_client);
I see that there have been many changes in export CRUD workflows so it would be difficult
to simply backport some of these changes to 2.8.4.
Thanks,
Sriram
From: Sriram Patil <sriramp(a)vmware.com>
Date: Tuesday, April 20, 2021 at 12:53 PM
To: Matthew DeVore <matvore(a)google.com>
Cc: Solomon Boulos <boulos(a)google.com>, Frank Filz <ffilzlnx(a)mindspring.com>,
dang(a)redhat.com <dang(a)redhat.com>, devel(a)lists.nfs-ganesha.org
<devel(a)lists.nfs-ganesha.org>
Subject: [NFS-Ganesha-Devel] Re: Failed to grab state owner mutex in _state_del_locked
Thanks Matthew. I checked the fixes. Unfortunately, are not related to state owner mutex.
I am re-running the tests on our side with full debug logging to see if I can debug the
issue further.
Thanks,
Sriram
From: Matthew DeVore <matvore(a)google.com>
Date: Tuesday, April 20, 2021 at 12:39 PM
To: Sriram Patil <sriramp(a)vmware.com>
Cc: Solomon Boulos <boulos(a)google.com>, Frank Filz <ffilzlnx(a)mindspring.com>,
dang(a)redhat.com <dang(a)redhat.com>, devel(a)lists.nfs-ganesha.org
<devel(a)lists.nfs-ganesha.org>
Subject: Re: [NFS-Ganesha-Devel] Re: Failed to grab state owner mutex in
_state_del_locked
2021年4月20日(火) 11:42 Sriram Patil
<sriramp@vmware.com<mailto:sriramp@vmware.com>>:
Hi Solomon,
Thanks for CC’ing Matthew. I saw a couple of review requests from him but they were
abandoned. So, not sure which ones are valid patches.
You can see the ones that were accepted with `git log --author=matvore@`
My last pthread-related fix was 28e457 - I guess you need Ganesha v4 for that - `git tag
--contains 28e457` shows for me:
V4-dev.41
V4-dev.42
V4-dev.43
V4-dev.44
Thanks,
Sriram
From: Solomon Boulos <boulos@google.com<mailto:boulos@google.com>>
Date: Tuesday, April 20, 2021 at 11:39 AM
To: Frank Filz <ffilzlnx@mindspring.com<mailto:ffilzlnx@mindspring.com>>
Cc: Matthew DeVore <matvore@google.com<mailto:matvore@google.com>>, Sriram
Patil <sriramp@vmware.com<mailto:sriramp@vmware.com>>,
dang@redhat.com<mailto:dang@redhat.com>
<dang@redhat.com<mailto:dang@redhat.com>>,
devel@lists.nfs-ganesha.org<mailto:devel@lists.nfs-ganesha.org>
<devel@lists.nfs-ganesha.org<mailto:devel@lists.nfs-ganesha.org>>
Subject: Re: [NFS-Ganesha-Devel] Re: Failed to grab state owner mutex in
_state_del_locked
Weren’t there a bunch of fixes from devore (cc’ed) where the union of mutex and something
else (condition variable?) weren’t used consistently? That was the first thing that popped
in my mind for EINVAL.
On Tue, Apr 20, 2021 at 11:33 Frank Filz
<ffilzlnx@mindspring.com<mailto:ffilzlnx@mindspring.com>> wrote:
There are some possibly related fixes in V3.4 that aren’t in V2.8.4 but they don’t look
like they might be relevant.
I do know that delegations have not been well tested lately, so if you’re using
delegations, there might well be a lock problem.
Frank
From: Sriram Patil [mailto:sriramp@vmware.com<mailto:sriramp@vmware.com>]
Sent: Tuesday, April 20, 2021 10:33 AM
To: devel@lists.nfs-ganesha.org<mailto:devel@lists.nfs-ganesha.org>
Cc: Frank Filz <ffilzlnx@mindspring.com<mailto:ffilzlnx@mindspring.com>>;
dang@redhat.com<mailto:dang@redhat.com>
Subject: [NFS-Ganesha-Devel] Failed to grab state owner mutex in _state_del_locked
Hi,
Recently we have been observing a ganesha abort because it receives EINVAL when trying to
lock the state owner lock (owner->so_mutex).
2021-03-28T10:09:12Z : epoch 605fa7ae : w1hs3i1902.vsanstfsad.local :
ganesha.nfsd-90[none] [dbus_heartbeat] 397 :_state_del_locked :RW LOCK :Error 22,
acquiring mutex 0x7f2fc4007958 (&owner->so_mutex) at
/build/mts/release/bora-17422501/cayman_nfs-ganesha/nfs-ganesha/src/src/SAL/nfs4_state.c:397
I modified some macros and printed RW LOCK activities whenever mutex name is
“&owner->so_mutex”. In this, I observed that the lock is never destroyed. So, this
EINVAL error is confusing. The EINVAL is observed when removing the export. The previous
log for the lock is in DELEG RETURN.
Could the mutex have been destroyed in the free_client_id function (nfs4_clientid.c)? In
that case, it won't use that string to destroy it, but instead use
`&clientid->cid_owner.so_mutex`
FWIW I found it effective to log all the destroys/creates (-N NIV_FULL_DEBUG) and then
grep the log for the hex address (0x7f2fc4007958 in this case) to know all activity on
it.
2021-04-20T04:07:12Z : epoch 607e2d18 : w1hs3r0313.vsanstfsad.local :
ganesha.nfsd-90[::ffff:172.30.72.54] [svc_165] 775 :process_one_op :NFS4 :Request 3:
opcode 8 is OP_DELEGRETURN
2021-04-20T04:07:12Z : epoch 607e2d18 : w1hs3r0313.vsanstfsad.local :
ganesha.nfsd-90[::ffff:172.30.72.54] [svc_165] 76 :nfs4_op_delegreturn :NFS4 LOCK
:Entering NFS v4 DELEGRETURN handler
-----------------------------------------------------
2021-04-20T04:07:12Z : epoch 607e2d18 : w1hs3r0313.vsanstfsad.local :
ganesha.nfsd-90[none] [svc_181] 1377 :free_nfs_request :DISP :SVC_DECODE on 0x7f18d800be70
fd 90
(::ffff:172.30.72.54:720<https://nam04.safelinks.protection.outlook.com/?url=http%3A%2F%2F172.30.72.54%3A720%2F&data=04%7C01%7Csriramp%40vmware.com%7Cc6ce57edba2441facea808d90435e4c5%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C1%7C637545451812656098%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=MdYiZS4xANXHbytzNK8Xjjn2pRgqb0WZ0b6V%2FqTWJZs%3D&reserved=0>)
xid=2141597608 returned XPRT_IDLE
2021-04-20T04:07:12Z : epoch 607e2d18 : w1hs3r0313.vsanstfsad.local :
ganesha.nfsd-90[::ffff:172.30.72.54] [svc_165] 129 :nfs4_op_delegreturn :NFS4 LOCK
:Successful exit
2021-04-20T04:07:12Z : epoch 607e2d18 : w1hs3r0313.vsanstfsad.local :
ganesha.nfsd-90[::ffff:172.30.72.54] [svc_165] 397 :_state_del_locked :RW LOCK :Acquired
mutex 0x7f18b4017058 (&owner->so_mutex) at
/build/mts/release/sb-45847366/cayman_nfs-ganesha/nfs-ganesha/src/src/SAL/nfs4_state.c:397
……
…..
2021-04-20T04:57:00Z : epoch 607e2d18 : w1hs3r0313.vsanstfsad.local :
ganesha.nfsd-90[none] [dbus_heartbeat] 397 :_state_del_locked :RW LOCK :Error 22,
acquiring mutex 0x7f18b4017058 (&owner->so_mutex) at
/build/mts/release/sb-45847366/cayman_nfs-ganesha/nfs-ganesha/src/src/SAL/nfs4_state.c:397
I am not very familiar with the NFSv4 state owner code. But does this look like some known
issue?
Note: We are using ganesha 2.8.4
Thanks,
Sriram
_______________________________________________
Devel mailing list --
devel@lists.nfs-ganesha.org<mailto:devel@lists.nfs-ganesha.org>
To unsubscribe send an email to
devel-leave@lists.nfs-ganesha.org<mailto:devel-leave@lists.nfs-ganesha.org>