Thanks, Daniel. I will check this with a more recent version of nfs-ganesha.
Thanks,
Sandeep
-----Original Message-----
From: Daniel Gryniewicz [mailto:dang@redhat.com]
Sent: 19 September 2018 18:22
To: Sandeep Nashikkar <snashikkar(a)commvault.com>
Cc: devel(a)lists.nfs-ganesha.org
Subject: Re: [NFS-Ganesha-Devel] Crash in mdcache_lru_cleanup_push()
We've fixed quite a few unexport/export races since 2.5.1. A few that jump out at me are:
569039055fe209aadda7eabf5a5e230ae8938d25 - MDCACHE - Close more export/unexport races
d287a4eb404166c6f8eb6a468304504b930bee43 - MDCACHE - Close an unexport race
ceb4aed76dd3bb39f857853043800c556e475cf8 - Fixup unexport/lru_run_lane/mdcache_lru_clean races
dc83243feed5c78267b682cf47addd9d83a41adb - Fix race between mdcache_unexport and mdc_check_mapping
36bc72781e395bfd6ba66a27ec22a9a9c66f366b - add cih_remove_checked() in mdc_clean_entry()
Those are from a quick scan of the commit logs for the last year; there may be more.
Export/unexport in a tight loop had quite a few problems, but we believe it works in 2.7.0 (at least for MDCACHE; there may be FSAL-level bugs in some FSALs).
Daniel
On Wed, Sep 19, 2018 at 4:31 AM, Sandeep Nashikkar <snashikkar(a)commvault.com> wrote:
Ganesha Version: 2.5.1
Platform: Linux x86_64
I am seeing the following issue while re-exporting a Ganesha export (remove_export -> add_export):
The crash happens during the add_export operation while some applications are using the export. We also implemented a mechanism to stall I/O while this operation is in progress, but it did not help.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000530d27 in mdcache_lru_cleanup_push (entry=0x7f3078007f50)
at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:935
935 LRU_DQ_SAFE(lru, q);
Missing separate debuginfos, use: debuginfo-install
bzip2-libs-1.0.6-12.el7.x86_64 dbus-libs-1.10.24-7.el7.x86_64
elfutils-libelf-0.160-1.el7.x86_64 elfutils-libs-0.160-1.el7.x86_64
gssproxy-0.3.0-10.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64
krb5-libs-1.15.1-19.el7.x86_64 libacl-2.2.51-14.el7.x86_64
libattr-2.4.46-13.el7.x86_64 libblkid-2.23.2-21.el7.x86_64
libcap-2.22-8.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64
libgcrypt-1.5.3-12.el7.x86_64 libgpg-error-1.12-3.el7.x86_64
libnfsidmap-0.25-11.el7.x86_64 libselinux-2.5-12.el7.x86_64
libuuid-2.23.2-21.el7.x86_64 lz4-1.7.5-2.el7.x86_64
pcre-8.32-17.el7.x86_64
systemd-libs-219-57.el7.x86_64 xz-libs-5.1.2-9alpha.el7.x86_64
zlib-1.2.7-13.el7.x86_64
(gdb) bt
#0 0x0000000000530d27 in mdcache_lru_cleanup_push (entry=0x7f3078007f50)
at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:935
#1 0x000000000054a0fc in _mdcache_kill_entry (entry=0x7f3078007f50,
file=0x5a0970 "/root/rpmbuild/BUILD/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c",
line=218, function=0x5a2120 <__func__.23400> "mdcache_alloc_handle")
at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:3310
#2 0x00000000005419de in mdcache_alloc_handle (export=0x7f30bc05d070,
sub_handle=0x7f3078007c20, fs=0x0)
at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:218
#3 0x00000000005430ea in mdcache_new_entry (export=0x7f30bc05d070,
sub_handle=0x7f3078007c20, attrs_in=0x7f31287c6670, attrs_out=0x0,
new_directory=false, entry=0x7f31287c67e8, state=0x0)
at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:590
#4 0x0000000000544093 in mdcache_locate_host (fh_desc=0x7f31287c6c60,
export=0x7f30bc05d070, entry=0x7f31287c67e8, attrs_out=0x0)
at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:998
#5 0x000000000053d1b3 in mdcache_create_handle (exp_hdl=0x7f30bc05d070,
fh_desc=0x7f31287c6c60, handle=0x7f31287c6c58, attrs_out=0x0)
at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1902
#6 0x000000000047729d in nfs4_mds_putfh (data=0x7f31287c6d60)
at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/Protocols/NFS/nfs4_op_putfh.c:211
#7 0x0000000000477486 in nfs4_op_putfh (op=0x7f30a00424c0,
data=0x7f31287c6d60, resp=0x7f3078000d30)
at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/Protocols/NFS/nfs4_op_putfh.c:281
#8 0x000000000045f670 in nfs4_Compound (arg=0x7f30a00010f0,
req=0x7f30a00008e8, res=0x7f3078001fa0)
at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/Protocols/NFS/nfs4_Compound.c:743
#9 0x000000000044c20d in nfs_rpc_execute (reqdata=0x7f30a00008c0)
at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1289
#10 0x000000000044ca17 in worker_run (ctx=0x1d88bf0)
at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1561
#11 0x0000000000508a7a in fridgethr_start_routine (arg=0x1d88bf0) at
/usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/support/fridgethr.c:550
#12 0x00007f31797f2df5 in start_thread (arg=0x7f31287c8700) at
pthread_create.c:308
#13 0x00007f3178eb31ad in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) p q
$1 = (struct lru_q *) 0x0
(gdb) f 3
#3 0x00000000005430ea in mdcache_new_entry (export=0x7f30bc05d070,
sub_handle=0x7f3078007c20, attrs_in=0x7f31287c6670, attrs_out=0x0,
new_directory=false, entry=0x7f31287c67e8, state=0x0)
at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:590
590 nentry = mdcache_alloc_handle(export, sub_handle, sub_handle->fs);
(gdb) p export->flags
$2 = 1 '\001'
(gdb) p entry->lru
$6 = {q = {next = 0x0, prev = 0x0}, qid = LRU_ENTRY_NONE, refcnt = 1,
flags = 0, lane = 2, cf = 0}
mdc_check_mapping() in the following snippet returns an error because the MDC_UNEXPORT flag is set in export->flags.
After an export has been completely removed and the ganesha_mgr remove_export command has returned successfully, why is this flag still set?
The comment in the code also says “The current export is in process to be unexported”.
/* Map the export before we put this entry into the LRU, but after it's
 * well enough set up to be able to be unrefed by unexport should there
 * be a race.
 */
status = mdc_check_mapping(result);

if (unlikely(FSAL_IS_ERROR(status))) {
	/* The current export is in process to be unexported, don't
	 * create new mdcache entries.
	 */
	LogDebug(COMPONENT_CACHE_INODE,
		 "Trying to allocate a new entry %p for export id %"
		 PRIi16" that is in the process of being unexported",
		 result, op_ctx->ctx_export->export_id);
	mdcache_put(result);
	mdcache_kill_entry(result);
	return NULL;
}
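To make the failing check concrete, here is a minimal sketch of a mapping check that refuses new entries while an unexport bit is set. It is not Ganesha's actual mdc_check_mapping(): struct export_sketch, mark_unexporting() and MDC_UNEXPORT_SKETCH are hypothetical, and the bit value 0x1 is an assumption inferred only from "p export->flags" printing 1 in the gdb session above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit value, inferred from "p export->flags" printing 1 above. */
#define MDC_UNEXPORT_SKETCH 0x1u

/* Hypothetical, simplified stand-in for the MDCACHE export. */
struct export_sketch {
	_Atomic uint8_t flags;
};

/* Set by a (hypothetical) unexport path before tearing the export down. */
static void mark_unexporting(struct export_sketch *exp)
{
	atomic_fetch_or(&exp->flags, (uint8_t)MDC_UNEXPORT_SKETCH);
}

/* Returns false when new cache entries must not be mapped to exp. */
static bool check_mapping_sketch(struct export_sketch *exp)
{
	return (atomic_load(&exp->flags) & MDC_UNEXPORT_SKETCH) == 0;
}
```

Once the bit is set on an export object, every allocation that reaches that object takes the error path in the snippet above, which matches what is observed here.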
The crash occurs because lru_queue_of() returns q = NULL: entry->lru.qid is LRU_ENTRY_NONE (see "p entry->lru" above), and LRU_DQ_SAFE then uses that NULL q.
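The failure mode described above can be sketched as follows. All types and names here are hypothetical simplifications (the real lru_queue_of() and LRU_DQ_SAFE in mdcache_lru.c differ in detail); the point is that an entry which was never placed on a queue, like the one dumped above with next = prev = NULL and qid = LRU_ENTRY_NONE, has no queue to look up, so a dequeue has to check for that before touching q.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the MDCACHE LRU types. */
enum lru_q_id { LRU_ENTRY_NONE = 0, LRU_ENTRY_L1, LRU_ENTRY_L2 };

struct list_node { struct list_node *next, *prev; };

struct lru_q { struct list_node head; size_t size; };

struct lru_entry { struct list_node q; enum lru_q_id qid; };

static struct lru_q l1_queue, l2_queue;

/* An entry that was never queued (qid == LRU_ENTRY_NONE) has no queue,
 * so NULL comes back, as in the crash above. */
static struct lru_q *lru_queue_of_sketch(const struct lru_entry *e)
{
	switch (e->qid) {
	case LRU_ENTRY_L1:
		return &l1_queue;
	case LRU_ENTRY_L2:
		return &l2_queue;
	default:
		return NULL;
	}
}

/* Guarded dequeue: unlink and decrement q->size only when the entry is
 * really on a queue. Returns 1 if the entry was removed, 0 otherwise. */
static int lru_dq_guarded(struct lru_entry *e)
{
	struct lru_q *q = lru_queue_of_sketch(e);

	if (q == NULL || e->q.next == NULL)
		return 0; /* never queued: nothing to remove */

	e->q.prev->next = e->q.next;
	e->q.next->prev = e->q.prev;
	e->q.next = e->q.prev = NULL;
	q->size--;
	e->qid = LRU_ENTRY_NONE;
	return 1;
}
```

The NULL check is the essential difference: an unguarded dequeue that decrements q->size unconditionally faults exactly when lru_queue_of_sketch() returns NULL.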
Is there any deferred work during unexport in the MDCACHE FSAL?
If we add a delay between remove_export and add_export, the problem does not occur, but that does not seem to be an elegant solution.
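If some completion signal for the unexport is observable, a fixed delay can be replaced by a bounded wait on that signal. This is a minimal sketch under that assumption: wait_until() is a generic helper and unexport_finished() is a hypothetical placeholder predicate (here it just tests bit 0 of a flags byte). It does not fix the races addressed by the commits listed earlier; it only replaces a fixed sleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Polls done(arg) every poll_ms until it returns true or roughly
 * timeout_ms has elapsed. Returns true if the condition was met. */
static bool wait_until(bool (*done)(void *), void *arg,
		       long timeout_ms, long poll_ms)
{
	struct timespec ts = {
		.tv_sec = poll_ms / 1000,
		.tv_nsec = (poll_ms % 1000) * 1000000L
	};
	long waited = 0;

	while (!done(arg)) {
		if (waited >= timeout_ms)
			return false;
		nanosleep(&ts, NULL);
		waited += poll_ms;
	}
	return true;
}

/* Hypothetical placeholder: true once bit 0 of a flags byte is clear. */
static bool unexport_finished(void *arg)
{
	return (*(const unsigned char *)arg & 0x1u) == 0;
}
```

The timeout keeps a stuck unexport from blocking add_export forever; the caller can then report an error instead of proceeding.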
Please help me understand whether MDCACHE has any limitations with respect to back-to-back unexport and export operations.
Thanks,
Sandeep
***************************Legal Disclaimer***************************
"This communication may contain confidential and privileged material
for the sole use of the intended recipient. Any unauthorized review,
use or distribution by others is strictly prohibited. If you have
received the message by mistake, please advise the sender by reply
email and delete the message. Thank you."
**********************************************************************
_______________________________________________
Devel mailing list -- devel(a)lists.nfs-ganesha.org
To unsubscribe send an email to devel-leave(a)lists.nfs-ganesha.org