No, that put_ref is fine. It's a ref held on behalf of the entire lock
list: it is taken when the first entry is put on the list and released
when the last entry is removed from the list. It should be safe.
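
A minimal sketch of that list-wide ref pattern, for illustration only. The
struct and function names here are hypothetical stand-ins, not the actual
Ganesha code, and a plain counter replaces the real glist and its
glist_empty() check:

```c
#include <assert.h>

/* Hypothetical stand-in for an object handle; not the Ganesha type. */
struct obj {
    int refcnt;     /* references currently held on the object */
    int lock_count; /* number of entries on the object's lock list */
};

static void get_ref(struct obj *o)
{
    o->refcnt++;
}

static void put_ref(struct obj *o)
{
    assert(o->refcnt > 0); /* an extra put_ref would trip this */
    o->refcnt--;
}

/* Add a lock entry. Only the first entry takes the list-wide ref. */
static void lock_list_add(struct obj *o)
{
    if (o->lock_count == 0)
        get_ref(o);
    o->lock_count++;
}

/* Remove a lock entry. Only the last removal releases the list-wide ref. */
static void lock_list_remove(struct obj *o)
{
    assert(o->lock_count > 0);
    o->lock_count--;
    if (o->lock_count == 0)
        put_ref(o);
}
```

With a caller's own ref already held (refcnt of 1), adding two entries moves
the refcount to 2 only once, and it drops back to 1 only when the second
entry is removed, so the put on list-empty pairs with the get on first add.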
Daniel
On Wed, Jun 27, 2018 at 8:58 AM, Sachin Punadikar
<punadikar.sachin@gmail.com> wrote:
> Thanks Daniel.
> The issue is not reproducible at will.
> When I checked the code, I found that put_ref() can also be executed in
> state_unlock(), which is called from nlm4_Unlock():
> nlm4_Unlock() -> state_unlock()
>     /* If the lock list has become zero; decrement the pin ref count pt
>      * placed. Do this here just in case subtract_lock_from_list has made
>      * list empty even if it failed.
>      */
>     if (glist_empty(&obj->state_hdl->file.lock_list))
>         obj->obj_ops.put_ref(obj);
>
> Should we check whether put_ref() was already called in state_unlock() and,
> if so, skip calling put_ref() in nlm4_Unlock()?
>
> On Wed, Jun 27, 2018 at 10:33 AM, Daniel Gryniewicz <dang@redhat.com> wrote:
>>
>> So, it looks like some codepath has an extra put_ref() in it. The
>> handle in question had its refcount go to zero, but still had inavl
>> set. Since inavl is tied to the sentinel refcount, this shouldn't
>> happen.
>>
>> This isn't an error I remember seeing before, so it's likely to be in
>> next as well. Is there a reproducer for this case? MDCACHE has good
>> refcount debugging via LTTng, but only if I can reproduce it somehow.
>>
>> Daniel
>>
>> On Tue, Jun 26, 2018 at 6:33 AM, Sachin Punadikar
>> <punadikar.sachin@gmail.com> wrote:
>> >
>> > ---------- Forwarded message ----------
>> > From: Sachin Punadikar <punadikar.sachin@gmail.com>
>> > Date: Tue, Jun 26, 2018 at 3:57 PM
>> > Subject: Ganesha 2.5, crash /segfault while executing nlm4_Unlock
>> > To: nfs-ganesha-devel <nfs-ganesha-devel@lists.sourceforge.net>
>> >
>> >
>> > Hi All,
>> > Recently a customer reported a crash in Ganesha 2.5.
>> > (gdb) where
>> > #0  0x00007f475872900b in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
>> > #1  0x000000000041eac9 in fsal_obj_handle_fini (obj=0x7f4378028028)
>> >     at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/commonlib.c:192
>> > #2  0x000000000053180f in mdcache_lru_clean (entry=0x7f4378027ff0)
>> >     at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:589
>> > #3  0x0000000000536587 in _mdcache_lru_unref (entry=0x7f4378027ff0, flags=0,
>> >     func=0x5a9380 <__func__.23209> "cih_remove_checked", line=406)
>> >     at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1921
>> > #4  0x0000000000543e91 in cih_remove_checked (entry=0x7f4378027ff0)
>> >     at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_hash.h:406
>> > #5  0x0000000000544b26 in mdc_clean_entry (entry=0x7f4378027ff0)
>> >     at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:235
>> > #6  0x000000000053181e in mdcache_lru_clean (entry=0x7f4378027ff0)
>> >     at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:592
>> > #7  0x0000000000536587 in _mdcache_lru_unref (entry=0x7f4378027ff0, flags=0,
>> >     func=0x5a70af <__func__.23112> "mdcache_put", line=190)
>> >     at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1921
>> > #8  0x0000000000539666 in mdcache_put (entry=0x7f4378027ff0)
>> >     at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.h:190
>> > #9  0x000000000053f062 in mdcache_put_ref (obj_hdl=0x7f4378028028)
>> >     at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1709
>> > #10 0x000000000049bf0f in nlm4_Unlock (args=0x7f4294165830,
>> >     req=0x7f4294165028, res=0x7f43f001e0e0)
>> >     at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/Protocols/NLM/nlm_Unlock.c:128
>> > #11 0x000000000044c719 in nfs_rpc_execute (reqdata=0x7f4294165000)
>> >     at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1290
>> > #12 0x000000000044cf23 in worker_run (ctx=0x3c200e0)
>> >     at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1562
>> > #13 0x000000000050a3e7 in fridgethr_start_routine (arg=0x3c200e0)
>> >     at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/support/fridgethr.c:550
>> > #14 0x00007f4758725dc5 in start_thread () from /lib64/libpthread.so.0
>> > #15 0x00007f4757de673d in clone () from /lib64/libc.so.6
>> >
>> > A closer look at the backtrace shows that the flow of execution is
>> > cyclic, as below:
>> > nlm4_Unlock -> mdcache_put_ref -> mdcache_put -> _mdcache_lru_unref ->
>> > mdcache_lru_clean -> fsal_obj_handle_fini and then mdc_clean_entry ->
>> > cih_remove_checked -> (the repeated part of the flow is deliberately
>> > continued on the line below)
>> >
>> > -> _mdcache_lru_unref -> mdcache_lru_clean -> fsal_obj_handle_fini
>> > (currently crashing here)
>> >
>> > Does anyone see a code issue here? Any hints on how to root-cause this issue?
>> > Thanks in advance.
>> >
>> > --
>> > with regards,
>> > Sachin Punadikar
>> >
>> >
>
>
>
>
> --
> with regards,
> Sachin Punadikar