Thanks Daniel.
The issue is not reproducible at will.
Looking at the code, I see that put_ref() can also be called inside
state_unlock(), which is reached from nlm4_Unlock():
nlm4_Unlock() -> state_unlock()
        /* If the lock list has become zero; decrement the pin ref count pt
         * placed. Do this here just in case subtract_lock_from_list has
         * made list empty even if it failed.
         */
        if (glist_empty(&obj->state_hdl->file.lock_list))
                obj->obj_ops.put_ref(obj);
Should we check whether state_unlock() has already called put_ref() and,
if so, skip the put_ref() call in nlm4_Unlock()?
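
To make the question concrete, here is a minimal toy sketch in C of the
pattern I have in mind. Everything in it (toy_obj, toy_put_ref,
toy_state_unlock, toy_nlm_unlock, the skip_if_dropped flag) is hypothetical
and only models the idea; it is not Ganesha code. It also assumes that both
put_ref() calls release the same single reference, which I have not
confirmed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy object; NOT the Ganesha fsal_obj_handle. */
struct toy_obj {
	int refcnt;     /* references currently held */
	int lock_count; /* stands in for the length of the lock list */
	int underflows; /* put_ref calls made while refcnt was already 0 */
};

/* Drop one reference, recording an underflow instead of going negative. */
static void toy_put_ref(struct toy_obj *obj)
{
	if (obj->refcnt == 0) {
		obj->underflows++;
		return;
	}
	obj->refcnt--;
}

/*
 * Models the tail of state_unlock(): remove one lock and, if the list is
 * now empty, drop a reference. Returns true when it called toy_put_ref(),
 * so the caller can tell what happened.
 */
static bool toy_state_unlock(struct toy_obj *obj)
{
	if (obj->lock_count > 0)
		obj->lock_count--;
	if (obj->lock_count == 0) {
		toy_put_ref(obj);
		return true;
	}
	return false;
}

/*
 * Models the caller. With skip_if_dropped == false it always calls
 * toy_put_ref() afterwards; with true it skips that call when
 * toy_state_unlock() has already dropped the reference.
 */
static void toy_nlm_unlock(struct toy_obj *obj, bool skip_if_dropped)
{
	bool dropped = toy_state_unlock(obj);

	if (skip_if_dropped && dropped)
		return;
	toy_put_ref(obj);
}
```

Starting from one reference and one lock, calling toy_nlm_unlock() with
skip_if_dropped set to false records one underflow (an extra put), while
calling it with true leaves the count at zero with no underflow. If the two
put_ref() calls in the real code release references that were taken
separately, this guard would be wrong and would leak a reference instead.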
On Wed, Jun 27, 2018 at 10:33 AM, Daniel Gryniewicz <dang(a)redhat.com> wrote:
 So, it looks like some codepath has an extra put_ref() in it.  The
 handle in question had its refcount go to zero, but still had inavl
 set.  Since inavl is tied to the sentinel refcount, this shouldn't
 happen.
 This isn't an error I remember seeing before, so it's likely to be in
 next as well.  Is there a reproducer for this case?  MDCACHE has good
 refcount debugging via LTTng, but only if I can reproduce it somehow.
 Daniel
 On Tue, Jun 26, 2018 at 6:33 AM, Sachin Punadikar
 <punadikar.sachin(a)gmail.com> wrote:
 >
 > ---------- Forwarded message ----------
 > From: Sachin Punadikar <punadikar.sachin(a)gmail.com>
 > Date: Tue, Jun 26, 2018 at 3:57 PM
 > Subject: Ganesha 2.5, crash /segfault while executing nlm4_Unlock
 > To: nfs-ganesha-devel <nfs-ganesha-devel(a)lists.sourceforge.net>
 >
 >
 > Hi All,
 > Recently a customer reported a crash in Ganesha 2.5.
 > (gdb) where
 > #0  0x00007f475872900b in pthread_rwlock_wrlock () from
 > /lib64/libpthread.so.0
 > #1  0x000000000041eac9 in fsal_obj_handle_fini (obj=0x7f4378028028)
 >     at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/commonlib.c:192
 > #2  0x000000000053180f in mdcache_lru_clean (entry=0x7f4378027ff0)
 >     at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:589
 > #3  0x0000000000536587 in _mdcache_lru_unref (entry=0x7f4378027ff0, flags=0,
 >     func=0x5a9380 <__func__.23209> "cih_remove_checked", line=406)
 >     at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1921
 > #4  0x0000000000543e91 in cih_remove_checked (entry=0x7f4378027ff0)
 >     at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_hash.h:406
 > #5  0x0000000000544b26 in mdc_clean_entry (entry=0x7f4378027ff0)
 >     at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:235
 > #6  0x000000000053181e in mdcache_lru_clean (entry=0x7f4378027ff0)
 >     at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:592
 > #7  0x0000000000536587 in _mdcache_lru_unref (entry=0x7f4378027ff0, flags=0,
 >     func=0x5a70af <__func__.23112> "mdcache_put", line=190)
 >     at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1921
 > #8  0x0000000000539666 in mdcache_put (entry=0x7f4378027ff0)
 >     at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.h:190
 > #9  0x000000000053f062 in mdcache_put_ref (obj_hdl=0x7f4378028028)
 >     at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1709
 > #10 0x000000000049bf0f in nlm4_Unlock (args=0x7f4294165830,
 >     req=0x7f4294165028, res=0x7f43f001e0e0)
 >     at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/Protocols/NLM/nlm_Unlock.c:128
 > #11 0x000000000044c719 in nfs_rpc_execute (reqdata=0x7f4294165000)
 >     at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1290
 > #12 0x000000000044cf23 in worker_run (ctx=0x3c200e0)
 >     at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1562
 > #13 0x000000000050a3e7 in fridgethr_start_routine (arg=0x3c200e0)
 >     at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/support/fridgethr.c:550
 > #14 0x00007f4758725dc5 in start_thread () from /lib64/libpthread.so.0
 > #15 0x00007f4757de673d in clone () from /lib64/libc.so.6
 >
 > A closer look at the backtrace indicates a cyclic flow of
 > execution, as below:
 > nlm4_Unlock -> mdcache_put_ref -> mdcache_put -> _mdcache_lru_unref ->
 > mdcache_lru_clean -> fsal_obj_handle_fini and then mdc_clean_entry ->
 > cih_remove_checked ->   (the flow deliberately continues on the line below)
 >
 > -> _mdcache_lru_unref -> mdcache_lru_clean -> fsal_obj_handle_fini
 > (currently crashing here)
 >
 > Do you see any code issue here? Any hints on how to root-cause this issue?
 > Thanks in advance.
 >
 > --
 > with regards,
 > Sachin Punadikar
 >
 
-- 
with regards,
Sachin Punadikar