---------- Forwarded message ----------
From: Sachin Punadikar <punadikar.sachin@gmail.com>
Date: Tue, Jun 26, 2018 at 3:57 PM
Subject: Ganesha 2.5, crash /segfault while executing nlm4_Unlock
To: nfs-ganesha-devel <nfs-ganesha-devel@lists.sourceforge.net>


Hi All,
Recently a customer reported a crash (segfault) in Ganesha 2.5; per the source paths below, the build is 2.5.3. The backtrace from gdb is:
(gdb) where
#0  0x00007f475872900b in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x000000000041eac9 in fsal_obj_handle_fini (obj=0x7f4378028028) at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/commonlib.c:192
#2  0x000000000053180f in mdcache_lru_clean (entry=0x7f4378027ff0) at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:589
#3  0x0000000000536587 in _mdcache_lru_unref (entry=0x7f4378027ff0, flags=0, func=0x5a9380 <__func__.23209> "cih_remove_checked", line=406)
    at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1921
#4  0x0000000000543e91 in cih_remove_checked (entry=0x7f4378027ff0) at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_hash.h:406
#5  0x0000000000544b26 in mdc_clean_entry (entry=0x7f4378027ff0) at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:235
#6  0x000000000053181e in mdcache_lru_clean (entry=0x7f4378027ff0) at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:592
#7  0x0000000000536587 in _mdcache_lru_unref (entry=0x7f4378027ff0, flags=0, func=0x5a70af <__func__.23112> "mdcache_put", line=190)
    at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1921
#8  0x0000000000539666 in mdcache_put (entry=0x7f4378027ff0) at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.h:190
#9  0x000000000053f062 in mdcache_put_ref (obj_hdl=0x7f4378028028) at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1709
#10 0x000000000049bf0f in nlm4_Unlock (args=0x7f4294165830, req=0x7f4294165028, res=0x7f43f001e0e0)
    at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/Protocols/NLM/nlm_Unlock.c:128
#11 0x000000000044c719 in nfs_rpc_execute (reqdata=0x7f4294165000) at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1290
#12 0x000000000044cf23 in worker_run (ctx=0x3c200e0) at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1562
#13 0x000000000050a3e7 in fridgethr_start_routine (arg=0x3c200e0) at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/support/fridgethr.c:550
#14 0x00007f4758725dc5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f4757de673d in clone () from /lib64/libc.so.6 

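A closer look at the backtrace shows a cyclic flow of execution: mdcache_lru_clean is entered twice for the same entry (0x7f4378027ff0), so fsal_obj_handle_fini is called twice on the same object (0x7f4378028028). The flow is as below (purposely split across lines to show the nesting):
nlm4_Unlock -> mdcache_put_ref -> mdcache_put -> _mdcache_lru_unref -> mdcache_lru_clean
    -> fsal_obj_handle_fini  (mdcache_lru.c:589, first call)
    and then -> mdc_clean_entry  (mdcache_lru.c:592) -> cih_remove_checked -> _mdcache_lru_unref -> mdcache_lru_clean
        -> fsal_obj_handle_fini  (mdcache_lru.c:589, second call, currently crashing here)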
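
To make the shape of this flow concrete, here is a toy model in C. It is only a sketch and not Ganesha code: every name in it (toy_entry, toy_unref, toy_clean and so on) is hypothetical, and the starting state (one reference left while the entry is still in the hash table) and the "<= 0" test are assumptions chosen purely to reproduce the nesting seen in the backtrace, not a diagnosis of the real refcounting.

```c
#include <stddef.h>

/* Hypothetical stand-in for a cache entry; NOT the real mdcache_entry_t. */
struct toy_entry {
	int refcnt;	/* reference count */
	int in_hash;	/* 1 while the entry is still in the hash table */
	int fini_calls;	/* how many times the finalizer has run */
};

static void toy_unref(struct toy_entry *e);

/* Stands in for fsal_obj_handle_fini: only counts invocations. */
static void toy_fini(struct toy_entry *e)
{
	e->fini_calls++;
}

/* Stands in for mdc_clean_entry -> cih_remove_checked: removing the
 * entry from the hash table drops one more reference (assumption). */
static void toy_remove_checked(struct toy_entry *e)
{
	if (e->in_hash) {
		e->in_hash = 0;
		toy_unref(e);
	}
}

/* Stands in for mdcache_lru_clean: finalize first, then clean. */
static void toy_clean(struct toy_entry *e)
{
	toy_fini(e);
	toy_remove_checked(e);
}

/* Stands in for _mdcache_lru_unref. The "<= 0" test is a deliberate
 * simplification so that the nested unref re-enters toy_clean. */
static void toy_unref(struct toy_entry *e)
{
	if (--e->refcnt <= 0)
		toy_clean(e);
}

/* Drives one "put" on an entry in the assumed starting state and
 * reports how many times the finalizer ran. */
int toy_run(void)
{
	struct toy_entry e = { .refcnt = 1, .in_hash = 1, .fini_calls = 0 };

	toy_unref(&e);
	return e.fini_calls;
}
```

In this model toy_run() returns 2: the finalizer runs once from the outer cleanup and once more from the nested one. In the real code the second fsal_obj_handle_fini would be operating on an object that the first call has already torn down, which would be consistent with the crash in pthread_rwlock_wrlock, but I have not confirmed that.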

Does anyone see a code issue here? Any hints on how to root-cause (RCA) this issue?
Thanks in advance.

--
with regards,
Sachin Punadikar


