We also found this issue in our in-house testing. The exact patch for this should be the following:

commit c55046feb786d69de8ba046e7cbd242479621b66
Author: Daniel Gryniewicz <dang@redhat.com>
Date:   Fri Oct 6 09:14:21 2017 -0400

    MDCACHE - Release unused new entries
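
To illustrate the failure mode in the quoted report below (an entry that was never placed on an LRU queue is handed to the cleanup path, the queue lookup returns NULL, and the dequeue dereferences it), here is a minimal sketch. It is not the patch above and not nfs-ganesha code: the types, the queue variables, and the functions queue_of() and cleanup_push_checked() are hypothetical stand-ins, and only the LRU_ENTRY_NONE name is taken from the gdb output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the MDCACHE LRU types. */
enum lru_q_id { LRU_ENTRY_NONE = 0, LRU_ENTRY_L1, LRU_ENTRY_CLEANUP };

struct lru_q {
	size_t size;		/* number of entries currently on this queue */
};

struct lru_entry {
	enum lru_q_id qid;	/* queue the entry is on, if any */
};

static struct lru_q l1_queue;
static struct lru_q cleanup_queue;

/* Map an entry to its queue. Returns NULL for an entry that was
 * never inserted into the LRU (qid == LRU_ENTRY_NONE). */
static struct lru_q *queue_of(const struct lru_entry *e)
{
	switch (e->qid) {
	case LRU_ENTRY_L1:
		return &l1_queue;
	case LRU_ENTRY_CLEANUP:
		return &cleanup_queue;
	default:
		return NULL;
	}
}

/* Move an entry to the cleanup queue. Returns false, without touching
 * any queue, when the entry is not on a queue; the caller is then
 * expected to release the entry directly. */
static bool cleanup_push_checked(struct lru_entry *e)
{
	struct lru_q *q = queue_of(e);

	if (q == NULL)
		return false;	/* unguarded code would dereference NULL here */

	if (q != &cleanup_queue) {
		q->size--;
		cleanup_queue.size++;
		e->qid = LRU_ENTRY_CLEANUP;
	}
	return true;
}
```

In this model a freshly allocated entry has qid == LRU_ENTRY_NONE, so cleanup_push_checked() returns false and the caller frees the entry itself instead of queueing it. The real fix may be structured differently.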

On Wed, Sep 19, 2018 at 6:57 PM, Sandeep Nashikkar <snashikkar@commvault.com> wrote:
Thanks Daniel. I will check this with a more recent version of nfs-ganesha.

Thanks,
Sandeep

-----Original Message-----
From: Daniel Gryniewicz [mailto:dang@redhat.com]
Sent: 19 September 2018 18:22
To: Sandeep Nashikkar <snashikkar@commvault.com>
Cc: devel@lists.nfs-ganesha.org
Subject: Re: [NFS-Ganesha-Devel] Crash in mdcache_lru_cleanup_push()

We've fixed quite a few unexport/export races since 2.5.1.  A few that jump out at me are:

569039055fe209aadda7eabf5a5e230ae8938d25 - MDCACHE - Close more export/unexport races
d287a4eb404166c6f8eb6a468304504b930bee43 - MDCACHE - Close an unexport race
ceb4aed76dd3bb39f857853043800c556e475cf8 - Fixup unexport/lru_run_lane/mdcache_lru_clean races
dc83243feed5c78267b682cf47addd9d83a41adb - Fix race between mdcache_unexport and mdc_check_mapping
36bc72781e395bfd6ba66a27ec22a9a9c66f366b - add cih_remove_checked() in mdc_clean_entry()

Those are from a quick scan of the commit logs for the last year; there may be more.  Export/unexport in a tight loop had quite a few problems, but we believe it works in 2.7.0 (at least for MDCACHE; there may be FSAL-level bugs in some FSALs).

Daniel


On Wed, Sep 19, 2018 at 4:31 AM, Sandeep Nashikkar <snashikkar@commvault.com> wrote:
> Ganesha Version: 2.5.1
>
> Platform: Linux x86_64
>
>
>
> I am seeing the following issue while re-exporting a ganesha
> export (remove_export -> add_export).
>
> The crash happens during the add_export operation, while some
> applications are using the export. We also implemented a
> mechanism to stall I/O while this operation is in progress,
> but it did not help.
>
>
>
> Program terminated with signal 11, Segmentation fault.
> #0  0x0000000000530d27 in mdcache_lru_cleanup_push (entry=0x7f3078007f50)
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:935
> 935                     LRU_DQ_SAFE(lru, q);
>
> Missing separate debuginfos, use: debuginfo-install
> bzip2-libs-1.0.6-12.el7.x86_64 dbus-libs-1.10.24-7.el7.x86_64
> elfutils-libelf-0.160-1.el7.x86_64 elfutils-libs-0.160-1.el7.x86_64
> gssproxy-0.3.0-10.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64
> krb5-libs-1.15.1-19.el7.x86_64 libacl-2.2.51-14.el7.x86_64
> libattr-2.4.46-13.el7.x86_64 libblkid-2.23.2-21.el7.x86_64
> libcap-2.22-8.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64
> libgcrypt-1.5.3-12.el7.x86_64 libgpg-error-1.12-3.el7.x86_64
> libnfsidmap-0.25-11.el7.x86_64 libselinux-2.5-12.el7.x86_64
> libuuid-2.23.2-21.el7.x86_64 lz4-1.7.5-2.el7.x86_64
> pcre-8.32-17.el7.x86_64
> systemd-libs-219-57.el7.x86_64 xz-libs-5.1.2-9alpha.el7.x86_64
> zlib-1.2.7-13.el7.x86_64
>
> (gdb) bt
> #0  0x0000000000530d27 in mdcache_lru_cleanup_push (entry=0x7f3078007f50)
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:935
> #1  0x000000000054a0fc in _mdcache_kill_entry (entry=0x7f3078007f50,
>     file=0x5a0970 "/root/rpmbuild/BUILD/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c", line=218,
>     function=0x5a2120 <__func__.23400> "mdcache_alloc_handle")
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:3310
> #2  0x00000000005419de in mdcache_alloc_handle (export=0x7f30bc05d070, sub_handle=0x7f3078007c20, fs=0x0)
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:218
> #3  0x00000000005430ea in mdcache_new_entry (export=0x7f30bc05d070, sub_handle=0x7f3078007c20, attrs_in=0x7f31287c6670, attrs_out=0x0, new_directory=false,
>     entry=0x7f31287c67e8, state=0x0)
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:590
> #4  0x0000000000544093 in mdcache_locate_host (fh_desc=0x7f31287c6c60, export=0x7f30bc05d070, entry=0x7f31287c67e8, attrs_out=0x0)
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:998
> #5  0x000000000053d1b3 in mdcache_create_handle (exp_hdl=0x7f30bc05d070, fh_desc=0x7f31287c6c60, handle=0x7f31287c6c58, attrs_out=0x0)
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1902
> #6  0x000000000047729d in nfs4_mds_putfh (data=0x7f31287c6d60)
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/Protocols/NFS/nfs4_op_putfh.c:211
> #7  0x0000000000477486 in nfs4_op_putfh (op=0x7f30a00424c0, data=0x7f31287c6d60, resp=0x7f3078000d30)
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/Protocols/NFS/nfs4_op_putfh.c:281
> #8  0x000000000045f670 in nfs4_Compound (arg=0x7f30a00010f0, req=0x7f30a00008e8, res=0x7f3078001fa0)
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/Protocols/NFS/nfs4_Compound.c:743
> #9  0x000000000044c20d in nfs_rpc_execute (reqdata=0x7f30a00008c0)
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1289
> #10 0x000000000044ca17 in worker_run (ctx=0x1d88bf0)
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1561
> #11 0x0000000000508a7a in fridgethr_start_routine (arg=0x1d88bf0)
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/support/fridgethr.c:550
> #12 0x00007f31797f2df5 in start_thread (arg=0x7f31287c8700) at pthread_create.c:308
> #13 0x00007f3178eb31ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
>
>
>
> (gdb) p q
> $1 = (struct lru_q *) 0x0
> (gdb) f 3
> #3  0x00000000005430ea in mdcache_new_entry (export=0x7f30bc05d070, sub_handle=0x7f3078007c20, attrs_in=0x7f31287c6670, attrs_out=0x0, new_directory=false,
>     entry=0x7f31287c67e8, state=0x0)
>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:590
> 590             nentry = mdcache_alloc_handle(export, sub_handle, sub_handle->fs);
> (gdb) p export->flags
> $2 = 1 '\001'
> (gdb) p entry->lru
> $6 = {q = {next = 0x0, prev = 0x0}, qid = LRU_ENTRY_NONE, refcnt = 1, flags = 0, lane = 2, cf = 0}
>
>
>
> mdc_check_mapping() in the following snippet returns an error because
> the MDC_UNEXPORT flag is set in export->flags.
>
> After the export has been completely removed and the ganesha_mgr
> remove_export command has returned successfully, why is the flag still set?
>
> The comment also says “The current export is in process to be unexported”.
>
>
>
>         /* Map the export before we put this entry into the LRU, but after it's
>          * well enough set up to be able to be unrefed by unexport should there
>          * be a race.
>          */
>         status = mdc_check_mapping(result);
>
>         if (unlikely(FSAL_IS_ERROR(status))) {
>                 /* The current export is in process to be unexported, don't
>                  * create new mdcache entries.
>                  */
>                 LogDebug(COMPONENT_CACHE_INODE,
>                          "Trying to allocate a new entry %p for export id %"
>                          PRIi16" that is in the process of being unexported",
>                          result, op_ctx->ctx_export->export_id);
>                 mdcache_put(result);
>                 mdcache_kill_entry(result);
>                 return NULL;
>         }
>
>
>
> The crash occurs because lru_queue_of() returns q = NULL when
> entry->lru.qid = LRU_ENTRY_NONE (see the gdb output above), and
> LRU_DQ_SAFE then dereferences q.
>
>
>
> Is there any deferred work during unexport in the MDCACHE FSAL?
>
> If I add a delay between remove_export and add_export, I do not see
> the problem, but that does not seem to be an elegant solution.
>
>
>
> Please help me understand whether MDCACHE has any limitations
> with respect to back-to-back unexport and export operations.
>
>
>
> Thanks,
>
> Sandeep
>
> ***************************Legal Disclaimer***************************
> "This communication may contain confidential and privileged material
> for the sole use of the intended recipient. Any unauthorized review,
> use or distribution by others is strictly prohibited. If you have
> received the message by mistake, please advise the sender by reply
> email and delete the message. Thank you."
> **********************************************************************
>
> _______________________________________________
> Devel mailing list -- devel@lists.nfs-ganesha.org
> To unsubscribe send an email to devel-leave@lists.nfs-ganesha.org
>