We've fixed quite a few unexport/export races since 2.5.1. A few that
jump out at me are:
569039055fe209aadda7eabf5a5e230ae8938d25 - MDCACHE - Close more
export/unexport races
d287a4eb404166c6f8eb6a468304504b930bee43 - MDCACHE - Close an unexport race
ceb4aed76dd3bb39f857853043800c556e475cf8 - Fixup
unexport/lru_run_lane/mdcache_lru_clean races
dc83243feed5c78267b682cf47addd9d83a41adb - Fix race between
mdcache_unexport and mdc_check_mapping
36bc72781e395bfd6ba66a27ec22a9a9c66f366b - add cih_remove_checked() in
mdc_clean_entry()
Those are from a quick scan of the commit logs for the last year; there
may be more. Export/unexport in a tight loop had quite a few problems,
but we believe it works in 2.7.0 (at least for MDCACHE; there may be
FSAL-level bugs in some FSALs).
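
To make the failure mode in the quoted backtrace concrete, here is a
minimal sketch of it. It is a simplified model, not the MDCACHE code and
not any of the patches listed above: the types, queue names, and the
functions queue_of() and cleanup_push() are hypothetical stand-ins. It
only assumes what the gdb output shows, namely that an entry with
qid = LRU_ENTRY_NONE maps to a NULL queue, so any dequeue step has to
cope with that.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the LRU bookkeeping. */
enum lru_q_id { LRU_ENTRY_NONE = 0, LRU_ENTRY_L1, LRU_ENTRY_CLEANUP };

struct lru_q {
    enum lru_q_id id;
    size_t size;            /* number of entries currently queued */
};

struct lru_entry {
    enum lru_q_id qid;      /* which queue the entry is on, if any */
};

static struct lru_q l1_q = { LRU_ENTRY_L1, 0 };
static struct lru_q cleanup_q = { LRU_ENTRY_CLEANUP, 0 };

/* Return the queue an entry sits on, or NULL if it is on none. */
static struct lru_q *queue_of(const struct lru_entry *e)
{
    switch (e->qid) {
    case LRU_ENTRY_L1:
        return &l1_q;
    case LRU_ENTRY_CLEANUP:
        return &cleanup_q;
    default:
        return NULL;        /* LRU_ENTRY_NONE: never queued */
    }
}

/* Move an entry to the cleanup queue.
 * Returns 0 on success, -1 if it is already on the cleanup queue. */
static int cleanup_push(struct lru_entry *e)
{
    struct lru_q *q;

    assert(e != NULL);
    if (e->qid == LRU_ENTRY_CLEANUP)
        return -1;

    q = queue_of(e);
    /* Guard: a freshly allocated entry was never put on a queue,
     * so there is nothing to dequeue and q must not be touched. */
    if (q != NULL && q->size > 0)
        q->size--;

    e->qid = LRU_ENTRY_CLEANUP;
    cleanup_q.size++;
    return 0;
}
```

In this model, calling cleanup_push() on an entry whose qid is
LRU_ENTRY_NONE skips the dequeue and just queues the entry for cleanup.
Without the NULL check, the dequeue would touch a NULL queue pointer,
which is what frame #0 of the backtrace appears to be doing.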
Daniel
On Wed, Sep 19, 2018 at 4:31 AM, Sandeep Nashikkar
<snashikkar(a)commvault.com> wrote:
> Ganesha Version: 2.5.1
>
> Platform: Linux x86_64
>
>
>
> I am seeing the following issue while re-exporting a Ganesha export
> (remove_export -> add_export).
>
> The crash happens during the add_export operation, while some
> applications are still using the export. We also implemented a
> mechanism to stall the IOs while this operation is in progress, but it
> did not help.
>
>
>
> Program terminated with signal 11, Segmentation fault.
> #0 0x0000000000530d27 in mdcache_lru_cleanup_push (entry=0x7f3078007f50)
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:935
> 935 LRU_DQ_SAFE(lru, q);
>
> Missing separate debuginfos, use: debuginfo-install
> bzip2-libs-1.0.6-12.el7.x86_64 dbus-libs-1.10.24-7.el7.x86_64
> elfutils-libelf-0.160-1.el7.x86_64 elfutils-libs-0.160-1.el7.x86_64
> gssproxy-0.3.0-10.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64
> krb5-libs-1.15.1-19.el7.x86_64 libacl-2.2.51-14.el7.x86_64
> libattr-2.4.46-13.el7.x86_64 libblkid-2.23.2-21.el7.x86_64
> libcap-2.22-8.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64
> libgcrypt-1.5.3-12.el7.x86_64 libgpg-error-1.12-3.el7.x86_64
> libnfsidmap-0.25-11.el7.x86_64 libselinux-2.5-12.el7.x86_64
> libuuid-2.23.2-21.el7.x86_64 lz4-1.7.5-2.el7.x86_64 pcre-8.32-17.el7.x86_64
> systemd-libs-219-57.el7.x86_64 xz-libs-5.1.2-9alpha.el7.x86_64
> zlib-1.2.7-13.el7.x86_64
>
> (gdb) bt
> #0 0x0000000000530d27 in mdcache_lru_cleanup_push (entry=0x7f3078007f50)
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:935
> #1 0x000000000054a0fc in _mdcache_kill_entry (entry=0x7f3078007f50,
>     file=0x5a0970 "/root/rpmbuild/BUILD/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c",
>     line=218, function=0x5a2120 <__func__.23400> "mdcache_alloc_handle")
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:3310
> #2 0x00000000005419de in mdcache_alloc_handle (export=0x7f30bc05d070,
>     sub_handle=0x7f3078007c20, fs=0x0)
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:218
> #3 0x00000000005430ea in mdcache_new_entry (export=0x7f30bc05d070,
>     sub_handle=0x7f3078007c20, attrs_in=0x7f31287c6670, attrs_out=0x0,
>     new_directory=false, entry=0x7f31287c67e8, state=0x0)
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:590
> #4 0x0000000000544093 in mdcache_locate_host (fh_desc=0x7f31287c6c60,
>     export=0x7f30bc05d070, entry=0x7f31287c67e8, attrs_out=0x0)
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:998
> #5 0x000000000053d1b3 in mdcache_create_handle (exp_hdl=0x7f30bc05d070,
>     fh_desc=0x7f31287c6c60, handle=0x7f31287c6c58, attrs_out=0x0)
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1902
> #6 0x000000000047729d in nfs4_mds_putfh (data=0x7f31287c6d60)
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/Protocols/NFS/nfs4_op_putfh.c:211
> #7 0x0000000000477486 in nfs4_op_putfh (op=0x7f30a00424c0,
>     data=0x7f31287c6d60, resp=0x7f3078000d30)
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/Protocols/NFS/nfs4_op_putfh.c:281
> #8 0x000000000045f670 in nfs4_Compound (arg=0x7f30a00010f0,
>     req=0x7f30a00008e8, res=0x7f3078001fa0)
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/Protocols/NFS/nfs4_Compound.c:743
> #9 0x000000000044c20d in nfs_rpc_execute (reqdata=0x7f30a00008c0)
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1289
> #10 0x000000000044ca17 in worker_run (ctx=0x1d88bf0)
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1561
> #11 0x0000000000508a7a in fridgethr_start_routine (arg=0x1d88bf0)
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/support/fridgethr.c:550
> #12 0x00007f31797f2df5 in start_thread (arg=0x7f31287c8700)
>     at pthread_create.c:308
> #13 0x00007f3178eb31ad in clone ()
>     at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
>
>
>
> (gdb) p q
> $1 = (struct lru_q *) 0x0
> (gdb) f 3
> #3 0x00000000005430ea in mdcache_new_entry (export=0x7f30bc05d070,
>     sub_handle=0x7f3078007c20, attrs_in=0x7f31287c6670, attrs_out=0x0,
>     new_directory=false, entry=0x7f31287c67e8, state=0x0)
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:590
> 590 nentry = mdcache_alloc_handle(export, sub_handle, sub_handle->fs);
> (gdb) p export->flags
> $2 = 1 '\001'
> (gdb) p entry->lru
> $6 = {q = {next = 0x0, prev = 0x0}, qid = LRU_ENTRY_NONE, refcnt = 1,
>     flags = 0, lane = 2, cf = 0}
>
>
>
> mdc_check_mapping() in the following snippet returns an error because the
> MDC_UNEXPORT flag is set in export->flags.
>
> After an export has been completely removed and the ganesha_mgr
> remove_export command has returned successfully, why do we still see the
> flag set?
>
> The comment also says “The current export is in process to be unexported”
>
>
>
>     /* Map the export before we put this entry into the LRU, but after it's
>      * well enough set up to be able to be unrefed by unexport should there
>      * be a race.
>      */
>     status = mdc_check_mapping(result);
>
>     if (unlikely(FSAL_IS_ERROR(status))) {
>         /* The current export is in process to be unexported, don't
>          * create new mdcache entries.
>          */
>         LogDebug(COMPONENT_CACHE_INODE,
>                  "Trying to allocate a new entry %p for export id %"
>                  PRIi16" that is in the process of being unexported",
>                  result, op_ctx->ctx_export->export_id);
>         mdcache_put(result);
>         mdcache_kill_entry(result);
>         return NULL;
>     }
>
>
>
> The crash occurs because lru_queue_of() returns q = NULL when
> entry->lru.qid = LRU_ENTRY_NONE, and LRU_DQ_SAFE then dereferences q.
>
>
>
> Is there any deferred work during unexport in the MDCACHE FSAL?
>
> If we add a delay between remove_export and add_export, I do not see the
> problem. But that does not seem to be an elegant solution.
>
>
>
> Please help me understand whether MDCACHE has any limitations with
> respect to back-to-back unexport and export operations.
>
>
>
> Thanks,
>
> Sandeep
>
> _______________________________________________
> Devel mailing list -- devel(a)lists.nfs-ganesha.org
> To unsubscribe send an email to devel-leave(a)lists.nfs-ganesha.org
>