Hello All,
I observed a deadlock while starting Ganesha. My findings are below.
The deadlock is between thread 1 and thread 3. Thread 1 (the main thread)
is blocked in lru_insert_entry (frame 5), waiting for the qlane mutex:
(gdb) where
#0 0x00003fff80c95408 in raise () from /lib64/libpthread.so.0
#1 0x0000000010071b70 in crash_handler (signo=11, info=0x3fffc499d7d8,
ctx=0x3fffc499ca60) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/MainNFSD/nfs_init.c:225
#2 <signal handler called>
#3 0x00003fff80c92c18 in __lll_lock_wait () from /lib64/libpthread.so.0
#4 0x00003fff80c8b69c in pthread_mutex_lock () from /lib64/libpthread.so.0
#5 0x00000000101a6140 in lru_insert_entry (entry=0x1000886a460,
q=0x10266c20 <LRU>, edge=LRU_LRU)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:420
#6 0x00000000101ac024 in mdcache_lru_insert (entry=0x1000886a460) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1779
#7 0x00000000101c23f8 in mdcache_new_entry (export=0x10007c6e960,
sub_handle=0x1000886a130, attrs_in=0x3fffc499e590, attrs_out=0x0,
new_directory=false, entry=0x3fffc499e670, state=0x0)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:730
#8 0x00000000101b8d9c in mdcache_lookup_path (exp_hdl=0x10007c6e960,
path=0x10007c6e840 "/gpfs/gpfs0/nfs/nfs720", handle=0x3fffc499e750,
attrs_out=0x0)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1899
#9 0x0000000010165c00 in init_export_root (export=0x10007c6e698) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/support/exports.c:2263
#10 0x00000000101653bc in init_export_cb (exp=0x10007c6e698,
state=0x3fffc499ea30) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/support/exports.c:2127
#11 0x0000000010181510 in foreach_gsh_export (cb=0x10165388
<init_export_cb>, wrlock=true, state=0x3fffc499ea30)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/support/export_mgr.c:750
#12 0x000000001016544c in exports_pkginit () at
/usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/support/exports.c:2146
#13 0x0000000010073460 in nfs_Init (p_start_info=0x1025a930
<my_nfs_start_info>) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/MainNFSD/nfs_init.c:629
#14 0x000000001007463c in nfs_start (p_start_info=0x1025a930
<my_nfs_start_info>) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/MainNFSD/nfs_init.c:922
#15 0x000000001001e0fc in main (argc=10, argv=0x3fffc499f658) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/MainNFSD/nfs_main.c:495
(gdb) frame 5
#5 0x00000000101a6140 in lru_insert_entry (entry=0x1000886a460,
q=0x10266c20 <LRU>, edge=LRU_LRU)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:420
420 QLOCK(qlane);
(gdb) p *qlane
$1 = {L1 = {q = {next = 0x100085e6258, prev = 0x100085e6258}, id =
LRU_ENTRY_L1, size = 1}, L2 = {q = {next = 0x10266c40 <LRU+32>, prev =
0x10266c40 <LRU+32>}, id = LRU_ENTRY_L2,
size = 0}, cleanup = {q = {next = 0x10266c60 <LRU+64>, prev =
0x10266c60 <LRU+64>}, id = LRU_ENTRY_CLEANUP, size = 0}, mtx = {__data =
{__lock = 2, __count = 0, __owner = 32651,
__nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next
= 0x0}}, __size = "\002\000\000\000\000\000\000\000\213\177\000\000\001",
'\000' <repeats 26 times>,
__align = 2}, iter = {active = true, glist = 0x100085e6258, glistn =
0x10266c20 <LRU>}, __pad0 = '\000' <repeats 127 times>}
(gdb) p (qlane)->mtx
$3 = {__data = {__lock = 2, __count = 0, __owner = 32651, __nusers = 1,
__kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
__size = "\002\000\000\000\000\000\000\000\213\177\000\000\001",
'\000' <repeats 26 times>, __align = 2}
The qlane mutex is owned by LWP 32651 (i.e. thread 3), which took it in
function lru_run_lane (frame 2):
(gdb) t 3
[Switching to thread 3 (Thread 0x3fff8075e830 (LWP 32651))]
#0 0x00003fff80c8d004 in pthread_rwlock_rdlock () from
/lib64/libpthread.so.0
(gdb) bt
#0 0x00003fff80c8d004 in pthread_rwlock_rdlock () from
/lib64/libpthread.so.0
#1 0x000000001017f7e0 in get_gsh_export (export_id=2744) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/support/export_mgr.c:351
#2 0x00000000101a90dc in lru_run_lane (lane=0, totalclosed=0x3fff8075dd58)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1088
#3 0x00000000101aa1c8 in lru_run (ctx=0x10006facc20) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1330
#4 0x000000001016c16c in fridgethr_start_routine (arg=0x10006facc20) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/support/fridgethr.c:550
#5 0x00003fff80c88af4 in start_thread () from /lib64/libpthread.so.0
#6 0x00003fff80ac4ef4 in clone () from /lib64/libc.so.6
(gdb) frame 1
#1 0x000000001017f7e0 in get_gsh_export (export_id=2744) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/support/export_mgr.c:351
351 PTHREAD_RWLOCK_rdlock(&export_by_id.lock);
(gdb) p export_by_id.lock
$4 = {__data = {__lock = 0, __nr_readers = 0, __readers_wakeup = 0,
__writer_wakeup = 0, __nr_readers_queued = 1, __nr_writers_queued = 0,
__writer = 30981, __shared = 0, __pad1 = 0,
__pad2 = 0, __flags = 0}, __size = '\000' <repeats 16 times>,
"\001\000\000\000\000\000\000\000\005y", '\000' <repeats 29 times>,
__align = 0}
(gdb) p &export_by_id.lock
$9 = (pthread_rwlock_t *) 0x10260268 <export_by_id>
Note that thread 3 is waiting for a lock which is held by LWP 30981 (i.e.
thread 1), taken in function foreach_gsh_export (frame 11):
(gdb) t 1
[Switching to thread 1 (Thread 0x3fff8115bb50 (LWP 30981))]
#0 0x00003fff80c95408 in raise () from /lib64/libpthread.so.0
(gdb) frame 11
#11 0x0000000010181510 in foreach_gsh_export (cb=0x10165388
<init_export_cb>, wrlock=true, state=0x3fffc499ea30)
at
/usr/src/debug/nfs-ganesha-2.5.3-ibm028.00-0.1.1-Source/support/export_mgr.c:750
750 rc = cb(export, state);
(gdb) p export_by_id.lock
$5 = {__data = {__lock = 0, __nr_readers = 0, __readers_wakeup = 0,
__writer_wakeup = 0, __nr_readers_queued = 1, __nr_writers_queued = 0,
__writer = 30981, __shared = 0, __pad1 = 0,
__pad2 = 0, __flags = 0}, __size = '\000' <repeats 16 times>,
"\001\000\000\000\000\000\000\000\005y", '\000' <repeats 29 times>,
__align = 0}
(gdb) p &export_by_id.lock
$7 = (pthread_rwlock_t *) 0x10260268 <export_by_id>
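So thread 1 holds export_by_id.lock as writer and waits for the qlane
mutex, while thread 3 holds the qlane mutex and waits for a read lock on
export_by_id.lock. To address this issue I have posted a patch:
https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/444414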
--
with regards,
Sachin Punadikar