Hi Frank,

 

Let me try running our tests with delegations disabled to confirm this. Maybe I can also enable RW LOCK debugging to pinpoint exactly which op/thread is destroying the lock. This could also be a reference counting issue.
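
For the RW LOCK debugging idea, here is a minimal sketch of a debug wrapper. It records the file, line and thread of the last init/lock/destroy on each mutex, and it refuses to lock a mutex it has already seen destroyed, reporting who destroyed it. dbg_mutex and the DBG_MUTEX_* macros are hypothetical names used only for illustration. They are not ganesha's own lock macros. The sketch also assumes Linux/glibc, where pthread_t can be printed as an unsigned long.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical debug mutex: remembers who last touched it (Linux/glibc assumed). */
struct dbg_mutex {
	pthread_mutex_t mtx;
	int destroyed;          /* set once dbg_mutex_destroy_at() has run */
	const char *last_op;    /* "init", "lock" or "destroy" */
	const char *last_file;  /* call site of that operation */
	int last_line;
	pthread_t last_tid;     /* thread that performed it */
};

static void dbg_record(struct dbg_mutex *m, const char *op,
		       const char *file, int line)
{
	m->last_op = op;
	m->last_file = file;
	m->last_line = line;
	m->last_tid = pthread_self();
	fprintf(stderr, "%s mutex %p at %s:%d (thread %lu)\n", op, (void *)m,
		file, line, (unsigned long)m->last_tid);
}

static int dbg_mutex_init_at(struct dbg_mutex *m, const char *file, int line)
{
	int rc = pthread_mutex_init(&m->mtx, NULL);

	m->destroyed = 0;
	if (rc == 0)
		dbg_record(m, "init", file, line);
	return rc;
}

static int dbg_mutex_lock_at(struct dbg_mutex *m, const char *file, int line)
{
	int rc;

	if (m->destroyed) {
		/* Report the destroyer instead of calling pthread_mutex_lock(). */
		fprintf(stderr, "lock of destroyed mutex %p at %s:%d; "
			"destroyed at %s:%d (thread %lu)\n", (void *)m, file, line,
			m->last_file, m->last_line, (unsigned long)m->last_tid);
		return EINVAL;
	}
	rc = pthread_mutex_lock(&m->mtx);
	if (rc != 0) {
		fprintf(stderr, "Error %d (%s) locking %p at %s:%d; last %s at %s:%d\n",
			rc, strerror(rc), (void *)m, file, line,
			m->last_op, m->last_file, m->last_line);
		return rc;
	}
	dbg_record(m, "lock", file, line);
	return 0;
}

static int dbg_mutex_destroy_at(struct dbg_mutex *m, const char *file, int line)
{
	int rc = pthread_mutex_destroy(&m->mtx);

	if (rc == 0) {
		m->destroyed = 1;
		dbg_record(m, "destroy", file, line);
	}
	return rc;
}

#define DBG_MUTEX_INIT(m)    dbg_mutex_init_at((m), __FILE__, __LINE__)
#define DBG_MUTEX_LOCK(m)    dbg_mutex_lock_at((m), __FILE__, __LINE__)
#define DBG_MUTEX_UNLOCK(m)  pthread_mutex_unlock(&(m)->mtx)
#define DBG_MUTEX_DESTROY(m) dbg_mutex_destroy_at((m), __FILE__, __LINE__)
```

The destroyed flag is read without holding the mutex. That is acceptable for a debugging aid whose job is to catch a destroy racing with a lock, but it is not a synchronization mechanism.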

 

Thanks,

Sriram

 

From: Frank Filz <ffilzlnx@mindspring.com>
Date: Tuesday, April 20, 2021 at 11:32 AM
To: Sriram Patil <sriramp@vmware.com>, devel@lists.nfs-ganesha.org <devel@lists.nfs-ganesha.org>
Cc: dang@redhat.com <dang@redhat.com>
Subject: RE: [NFS-Ganesha-Devel] Failed to grab state owner mutex in _state_del_locked

There are a few fixes in V3.4 that aren’t in V2.8.4 and might seem related, but they don’t look relevant to this problem.

 

I do know that delegations have not been well tested lately, so if you’re using delegations, there might well be a lock problem.

 

Frank

 

From: Sriram Patil [mailto:sriramp@vmware.com]
Sent: Tuesday, April 20, 2021 10:33 AM
To: devel@lists.nfs-ganesha.org
Cc: Frank Filz <ffilzlnx@mindspring.com>; dang@redhat.com
Subject: [NFS-Ganesha-Devel] Failed to grab state owner mutex in _state_del_locked

 

Hi,

 

Recently we have been seeing ganesha abort because it gets EINVAL when trying to lock the state owner mutex (owner->so_mutex).

 

2021-03-28T10:09:12Z : epoch 605fa7ae : w1hs3i1902.vsanstfsad.local : ganesha.nfsd-90[none] [dbus_heartbeat] 397 :_state_del_locked :RW LOCK :Error 22, acquiring mutex 0x7f2fc4007958 (&owner->so_mutex) at /build/mts/release/bora-17422501/cayman_nfs-ganesha/nfs-ganesha/src/src/SAL/nfs4_state.c:397

 

I modified some macros to print RW LOCK activity whenever the mutex name is “&owner->so_mutex”. With that logging, I observed that this lock is never destroyed, so the EINVAL error is confusing. The EINVAL occurs while the export is being removed, and the previous log entry for this mutex is from a DELEGRETURN.
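
A mutex that was never destroyed but still fails with EINVAL is consistent with a stale owner pointer, though it does not prove one. If the owner's reference count drops to zero too early, its memory can be freed and reused, and the bytes where so_mutex used to live no longer look like a valid mutex. The sketch below shows one hypothetical debug pattern for catching this. struct owner, owner_get/owner_put/owner_lock and the magic values are made-up names, not ganesha's state owner API. On the final put, it destroys the mutex, poisons a magic field and deliberately keeps the memory. A later lock or an extra put through a stale pointer is then reported along with the release site, instead of silently touching reused memory.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define OWNER_MAGIC  0x4f574e52u /* live object (hypothetical value) */
#define OWNER_POISON 0xdeadbeefu /* released object (hypothetical value) */
#define STR2(x) #x
#define STR(x) STR2(x)
#define SITE __FILE__ ":" STR(__LINE__)

/* Hypothetical refcounted owner; not ganesha's state owner structure. */
struct owner {
	uint32_t magic;
	atomic_int refcount;
	pthread_mutex_t so_mutex;
	const char *released_at; /* call site of the final put */
};

static const char *owner_release_site(const struct owner *o)
{
	return o->magic == OWNER_POISON ? o->released_at : "(unknown)";
}

static struct owner *owner_new(void)
{
	struct owner *o = calloc(1, sizeof(*o));

	if (o == NULL)
		return NULL;
	o->magic = OWNER_MAGIC;
	atomic_init(&o->refcount, 1);
	pthread_mutex_init(&o->so_mutex, NULL);
	return o;
}

static void owner_get(struct owner *o)
{
	atomic_fetch_add(&o->refcount, 1);
}

/* Best-effort check: the magic test and the decrement are not one atomic step. */
static int owner_put_at(struct owner *o, const char *site)
{
	if (o->magic != OWNER_MAGIC) {
		fprintf(stderr, "extra put at %s; owner %p released at %s\n",
			site, (void *)o, owner_release_site(o));
		return EINVAL;
	}
	if (atomic_fetch_sub(&o->refcount, 1) == 1) {
		/* Final reference: destroy and poison, but keep the memory
		 * (debug quarantine) so stale pointers hit the poison. */
		pthread_mutex_destroy(&o->so_mutex);
		o->magic = OWNER_POISON;
		o->released_at = site;
	}
	return 0;
}

static int owner_lock_at(struct owner *o, const char *site)
{
	if (o->magic != OWNER_MAGIC) {
		fprintf(stderr, "lock at %s; owner %p released at %s\n",
			site, (void *)o, owner_release_site(o));
		return EINVAL;
	}
	return pthread_mutex_lock(&o->so_mutex);
}

#define owner_put(o)  owner_put_at((o), SITE)
#define owner_lock(o) owner_lock_at((o), SITE)
```

Keeping released objects around like this is only reasonable in a debug build. Running under a memory checker such as AddressSanitizer or Valgrind can give a similar use-after-free report without any code changes.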

 

2021-04-20T04:07:12Z : epoch 607e2d18 : w1hs3r0313.vsanstfsad.local : ganesha.nfsd-90[::ffff:172.30.72.54] [svc_165] 775 :process_one_op :NFS4 :Request 3: opcode 8 is OP_DELEGRETURN

2021-04-20T04:07:12Z : epoch 607e2d18 : w1hs3r0313.vsanstfsad.local : ganesha.nfsd-90[::ffff:172.30.72.54] [svc_165] 76 :nfs4_op_delegreturn :NFS4 LOCK :Entering NFS v4 DELEGRETURN handler -----------------------------------------------------

2021-04-20T04:07:12Z : epoch 607e2d18 : w1hs3r0313.vsanstfsad.local : ganesha.nfsd-90[none] [svc_181] 1377 :free_nfs_request :DISP :SVC_DECODE on 0x7f18d800be70 fd 90 (::ffff:172.30.72.54:720) xid=2141597608 returned XPRT_IDLE

2021-04-20T04:07:12Z : epoch 607e2d18 : w1hs3r0313.vsanstfsad.local : ganesha.nfsd-90[::ffff:172.30.72.54] [svc_165] 129 :nfs4_op_delegreturn :NFS4 LOCK :Successful exit

2021-04-20T04:07:12Z : epoch 607e2d18 : w1hs3r0313.vsanstfsad.local : ganesha.nfsd-90[::ffff:172.30.72.54] [svc_165] 397 :_state_del_locked :RW LOCK :Acquired mutex 0x7f18b4017058 (&owner->so_mutex) at /build/mts/release/sb-45847366/cayman_nfs-ganesha/nfs-ganesha/src/src/SAL/nfs4_state.c:397

……

…..

2021-04-20T04:57:00Z : epoch 607e2d18 : w1hs3r0313.vsanstfsad.local : ganesha.nfsd-90[none] [dbus_heartbeat] 397 :_state_del_locked :RW LOCK :Error 22, acquiring mutex 0x7f18b4017058 (&owner->so_mutex) at /build/mts/release/sb-45847366/cayman_nfs-ganesha/nfs-ganesha/src/src/SAL/nfs4_state.c:397

 

I am not very familiar with the NFSv4 state owner code, but does this look like a known issue?

 

Note: we are using ganesha 2.8.4.

 

Thanks,

Sriram