Hello All,

I observed a deadlock while Ganesha was starting up. My findings are below.
The deadlock is between thread 1 and thread 3. Backtrace of thread 1:

(gdb) where
#0  0x00003fff80c95408 in raise () from /lib64/libpthread.so.0
#1  0x0000000010071b70 in crash_handler (signo=11, info=0x3fffc499d7d8, ctx=0x3fffc499ca60) at /usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/MainNFSD/nfs_init.c:225
#2  <signal handler called>
#3  0x00003fff80c92c18 in __lll_lock_wait () from /lib64/libpthread.so.0
#4  0x00003fff80c8b69c in pthread_mutex_lock () from /lib64/libpthread.so.0
#5  0x00000000101a6140 in lru_insert_entry (entry=0x1000886a460, q=0x10266c20 <LRU>, edge=LRU_LRU)
    at /usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:420
#6  0x00000000101ac024 in mdcache_lru_insert (entry=0x1000886a460) at /usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1779
#7  0x00000000101c23f8 in mdcache_new_entry (export=0x10007c6e960, sub_handle=0x1000886a130, attrs_in=0x3fffc499e590, attrs_out=0x0, new_directory=false, entry=0x3fffc499e670, state=0x0)
    at /usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:730
#8  0x00000000101b8d9c in mdcache_lookup_path (exp_hdl=0x10007c6e960, path=0x10007c6e840 "/gpfs/gpfs0/nfs/nfs720", handle=0x3fffc499e750, attrs_out=0x0)
    at /usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1899
#9  0x0000000010165c00 in init_export_root (export=0x10007c6e698) at /usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/support/exports.c:2263
#10 0x00000000101653bc in init_export_cb (exp=0x10007c6e698, state=0x3fffc499ea30) at /usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/support/exports.c:2127
#11 0x0000000010181510 in foreach_gsh_export (cb=0x10165388 <init_export_cb>, wrlock=true, state=0x3fffc499ea30)
    at /usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/support/export_mgr.c:750
#12 0x000000001016544c in exports_pkginit () at /usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/support/exports.c:2146
#13 0x0000000010073460 in nfs_Init (p_start_info=0x1025a930 <my_nfs_start_info>) at /usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/MainNFSD/nfs_init.c:629
#14 0x000000001007463c in nfs_start (p_start_info=0x1025a930 <my_nfs_start_info>) at /usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/MainNFSD/nfs_init.c:922
#15 0x000000001001e0fc in main (argc=10, argv=0x3fffc499f658) at /usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/MainNFSD/nfs_main.c:495

(gdb) frame 5
#5  0x00000000101a6140 in lru_insert_entry (entry=0x1000886a460, q=0x10266c20 <LRU>, edge=LRU_LRU)
    at /usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:420
420        QLOCK(qlane);
(gdb) p *qlane
$1 = {L1 = {q = {next = 0x100085e6258, prev = 0x100085e6258}, id = LRU_ENTRY_L1, size = 1}, L2 = {q = {next = 0x10266c40 <LRU+32>, prev = 0x10266c40 <LRU+32>}, id = LRU_ENTRY_L2,
    size = 0}, cleanup = {q = {next = 0x10266c60 <LRU+64>, prev = 0x10266c60 <LRU+64>}, id = LRU_ENTRY_CLEANUP, size = 0}, mtx = {__data = {__lock = 2, __count = 0, __owner = 32651,
      __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\213\177\000\000\001", '\000' <repeats 26 times>,
    __align = 2}, iter = {active = true, glist = 0x100085e6258, glistn = 0x10266c20 <LRU>}, __pad0 = '\000' <repeats 127 times>}


(gdb) p (qlane)->mtx
$3 = {__data = {__lock = 2, __count = 0, __owner = 32651, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
  __size = "\002\000\000\000\000\000\000\000\213\177\000\000\001", '\000' <repeats 26 times>, __align = 2}
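
As a cross-check on the __owner value, the raw __size bytes printed by gdb can be decoded by hand. The sketch below is only an illustration: le32_at is a hypothetical helper (not part of Ganesha or glibc), and it assumes a little-endian target on which __owner sits at byte offset 8 of the mutex, which is what the dump above suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode a 32-bit little-endian integer from a raw
 * byte dump, such as the __size member gdb prints for a pthread lock.
 * The caller supplies the offset of the field of interest; the offsets
 * used in the note below are read off the dumps in this mail and are
 * assumptions about this particular build.
 */
static uint32_t le32_at(const unsigned char *bytes, size_t len, size_t offset)
{
	/* The four bytes to decode must lie inside the dump. */
	assert(offset + 4 <= len);

	return (uint32_t)bytes[offset]
	     | (uint32_t)bytes[offset + 1] << 8
	     | (uint32_t)bytes[offset + 2] << 16
	     | (uint32_t)bytes[offset + 3] << 24;
}
```

For the mutex dump, the bytes at offset 8 are \213 \177 \000 \000, i.e. 0x8b 0x7f 0x00 0x00, which decodes to 0x7f8b = 32651 and matches __owner. The same helper applied at offset 24 of the export_by_id.lock dump further down (bytes \005 y) gives 0x7905 = 30981, matching __writer.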

The lock is owned by LWP 32651 (i.e. thread 3), which is in function lru_run_lane (frame 2):
(gdb) t 3
[Switching to thread 3 (Thread 0x3fff8075e830 (LWP 32651))]
#0  0x00003fff80c8d004 in pthread_rwlock_rdlock () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00003fff80c8d004 in pthread_rwlock_rdlock () from /lib64/libpthread.so.0
#1  0x000000001017f7e0 in get_gsh_export (export_id=2744) at /usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/support/export_mgr.c:351
#2  0x00000000101a90dc in lru_run_lane (lane=0, totalclosed=0x3fff8075dd58) at /usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1088
#3  0x00000000101aa1c8 in lru_run (ctx=0x10006facc20) at /usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1330
#4  0x000000001016c16c in fridgethr_start_routine (arg=0x10006facc20) at /usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/support/fridgethr.c:550
#5  0x00003fff80c88af4 in start_thread () from /lib64/libpthread.so.0
#6  0x00003fff80ac4ef4 in clone () from /lib64/libc.so.6

(gdb) frame 1
#1  0x000000001017f7e0 in get_gsh_export (export_id=2744) at /usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/support/export_mgr.c:351
351        PTHREAD_RWLOCK_rdlock(&export_by_id.lock);
(gdb) p export_by_id.lock
$4 = {__data = {__lock = 0, __nr_readers = 0, __readers_wakeup = 0, __writer_wakeup = 0, __nr_readers_queued = 1, __nr_writers_queued = 0, __writer = 30981, __shared = 0, __pad1 = 0,
    __pad2 = 0, __flags = 0}, __size = '\000' <repeats 16 times>, "\001\000\000\000\000\000\000\000\005y", '\000' <repeats 29 times>, __align = 0}
(gdb) p &export_by_id.lock
$9 = (pthread_rwlock_t *) 0x10260268 <export_by_id>

Note that thread 3 is waiting for a lock which is held by LWP 30981 (i.e. thread 1) in function foreach_gsh_export (frame 11):
(gdb) t 1
[Switching to thread 1 (Thread 0x3fff8115bb50 (LWP 30981))]
#0  0x00003fff80c95408 in raise () from /lib64/libpthread.so.0

(gdb) frame 11
#11 0x0000000010181510 in foreach_gsh_export (cb=0x10165388 <init_export_cb>, wrlock=true, state=0x3fffc499ea30)
    at /usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/support/export_mgr.c:750
750            rc = cb(export, state);
(gdb) p export_by_id.lock
$5 = {__data = {__lock = 0, __nr_readers = 0, __readers_wakeup = 0, __writer_wakeup = 0, __nr_readers_queued = 1, __nr_writers_queued = 0, __writer = 30981, __shared = 0, __pad1 = 0,
    __pad2 = 0, __flags = 0}, __size = '\000' <repeats 16 times>, "\001\000\000\000\000\000\000\000\005y", '\000' <repeats 29 times>, __align = 0}
(gdb) p &export_by_id.lock
$7 = (pthread_rwlock_t *) 0x10260268 <export_by_id>

To address the above issue, I have posted a patch:
https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/444414


--
with regards,
Sachin Punadikar