We also found this issue in our in-house testing. The exact patch for this
should be the following:
commit c55046feb786d69de8ba046e7cbd242479621b66
Author: Daniel Gryniewicz <dang(a)redhat.com>
Date: Fri Oct 6 09:14:21 2017 -0400
MDCACHE - Release unused new entries
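
For anyone following the backtrace quoted below, here is a minimal sketch of the failure mode, not the patch itself. It assumes a simplified model in which an entry that was never put on any LRU queue has no queue to be removed from, so a cleanup push has to check for that case before touching the queue. All `sketch_*` names are hypothetical stand-ins and are not nfs-ganesha definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the MDCACHE LRU types. */
enum sketch_qid { SKETCH_QID_NONE = 0, SKETCH_QID_L1, SKETCH_QID_CLEANUP };

struct sketch_queue {
    size_t size;                 /* number of entries on this queue */
};

struct sketch_lane {
    struct sketch_queue L1;      /* active entries */
    struct sketch_queue cleanup; /* entries waiting to be cleaned */
};

struct sketch_entry {
    enum sketch_qid qid;         /* which queue the entry is on, if any */
};

/* Return the queue the entry sits on, or NULL if it was never queued. */
struct sketch_queue *sketch_queue_of(struct sketch_lane *lane,
                                     const struct sketch_entry *e)
{
    switch (e->qid) {
    case SKETCH_QID_L1:
        return &lane->L1;
    case SKETCH_QID_CLEANUP:
        return &lane->cleanup;
    default:
        return NULL;             /* freshly allocated, on no queue yet */
    }
}

/*
 * Move an entry to the cleanup queue.  Returns false when the entry is
 * on no queue, so the caller can release it directly instead of
 * dereferencing a NULL queue pointer.
 */
bool sketch_cleanup_push(struct sketch_lane *lane, struct sketch_entry *e)
{
    struct sketch_queue *q;

    assert(lane != NULL && e != NULL);

    q = sketch_queue_of(lane, e);
    if (q == NULL)
        return false;            /* nothing to dequeue */

    if (e->qid != SKETCH_QID_CLEANUP) {
        q->size--;               /* leave the current queue */
        lane->cleanup.size++;    /* join the cleanup queue */
        e->qid = SKETCH_QID_CLEANUP;
    }
    return true;
}
```

In this model, pushing a fresh entry returns false and leaves both queues untouched, while pushing an entry that is on L1 moves it to the cleanup queue.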
On Wed, Sep 19, 2018 at 6:57 PM, Sandeep Nashikkar <snashikkar(a)commvault.com>
wrote:
> Thanks Daniel. Will check this with more recent version of nfs-ganesha.
>
> Thanks,
> Sandeep
>
> -----Original Message-----
> From: Daniel Gryniewicz [mailto:dang@redhat.com]
> Sent: 19 September 2018 18:22
> To: Sandeep Nashikkar <snashikkar(a)commvault.com>
> Cc: devel(a)lists.nfs-ganesha.org
> Subject: Re: [NFS-Ganesha-Devel] Crash in mdcache_lru_cleanup_push()
>
> We've fixed quite a few unexport/export races since 2.5.1. A few that
> jump out at me are:
>
> 569039055fe209aadda7eabf5a5e230ae8938d25 - MDCACHE - Close more
> export/unexport races
> d287a4eb404166c6f8eb6a468304504b930bee43 - MDCACHE - Close an unexport
> race
> ceb4aed76dd3bb39f857853043800c556e475cf8 - Fixup
> unexport/lru_run_lane/mdcache_lru_clean races
> dc83243feed5c78267b682cf47addd9d83a41adb - Fix race between
> mdcache_unexport and mdc_check_mapping
> 36bc72781e395bfd6ba66a27ec22a9a9c66f366b - add cih_remove_checked() in
> mdc_clean_entry()
>
> Those are from a quick scan of the commit logs for the last year; there
> may be more. Export/unexport in a tight loop had quite a few problems,
> but we believe it works in 2.7.0 (at least for MDCACHE; there may be
> FSAL-level bugs in some FSALs).
>
> Daniel
>
>
> On Wed, Sep 19, 2018 at 4:31 AM, Sandeep Nashikkar <
> snashikkar(a)commvault.com> wrote:
> > Ganesha Version: 2.5.1
> >
> > Platform: Linux x86_64
> >
> >
> >
> > I am seeing the following issue while re-exporting a ganesha
> > export (remove_export -> add_export).
> >
> > The crash happens during the add_export operation, while some
> > applications are using the export. We also implemented a
> > mechanism to stall I/O while this operation is in progress,
> > but it did not help.
> >
> >
> >
> > Program terminated with signal 11, Segmentation fault.
> >
> > #0 0x0000000000530d27 in mdcache_lru_cleanup_push (entry=0x7f3078007f50)
> >     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:935
> >
> > 935 LRU_DQ_SAFE(lru, q);
> >
> > Missing separate debuginfos, use: debuginfo-install
> > bzip2-libs-1.0.6-12.el7.x86_64 dbus-libs-1.10.24-7.el7.x86_64
> > elfutils-libelf-0.160-1.el7.x86_64 elfutils-libs-0.160-1.el7.x86_64
> > gssproxy-0.3.0-10.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64
> > krb5-libs-1.15.1-19.el7.x86_64 libacl-2.2.51-14.el7.x86_64
> > libattr-2.4.46-13.el7.x86_64 libblkid-2.23.2-21.el7.x86_64
> > libcap-2.22-8.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64
> > libgcrypt-1.5.3-12.el7.x86_64 libgpg-error-1.12-3.el7.x86_64
> > libnfsidmap-0.25-11.el7.x86_64 libselinux-2.5-12.el7.x86_64
> > libuuid-2.23.2-21.el7.x86_64 lz4-1.7.5-2.el7.x86_64
> > pcre-8.32-17.el7.x86_64
> > systemd-libs-219-57.el7.x86_64 xz-libs-5.1.2-9alpha.el7.x86_64
> > zlib-1.2.7-13.el7.x86_64
> >
> > (gdb) bt
> >
> > #0 0x0000000000530d27 in mdcache_lru_cleanup_push (entry=0x7f3078007f50)
> >     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:935
> > #1 0x000000000054a0fc in _mdcache_kill_entry (entry=0x7f3078007f50,
> >     file=0x5a0970 "/root/rpmbuild/BUILD/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c",
> >     line=218, function=0x5a2120 <__func__.23400> "mdcache_alloc_handle")
> >     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:3310
> > #2 0x00000000005419de in mdcache_alloc_handle (export=0x7f30bc05d070,
> >     sub_handle=0x7f3078007c20, fs=0x0)
> >     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:218
> > #3 0x00000000005430ea in mdcache_new_entry (export=0x7f30bc05d070,
> >     sub_handle=0x7f3078007c20, attrs_in=0x7f31287c6670, attrs_out=0x0,
> >     new_directory=false, entry=0x7f31287c67e8, state=0x0)
> >     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:590
> > #4 0x0000000000544093 in mdcache_locate_host (fh_desc=0x7f31287c6c60,
> >     export=0x7f30bc05d070, entry=0x7f31287c67e8, attrs_out=0x0)
> >     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:998
> > #5 0x000000000053d1b3 in mdcache_create_handle (exp_hdl=0x7f30bc05d070,
> >     fh_desc=0x7f31287c6c60, handle=0x7f31287c6c58, attrs_out=0x0)
> >     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1902
> > #6 0x000000000047729d in nfs4_mds_putfh (data=0x7f31287c6d60)
> >     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/Protocols/NFS/nfs4_op_putfh.c:211
> > #7 0x0000000000477486 in nfs4_op_putfh (op=0x7f30a00424c0,
> >     data=0x7f31287c6d60, resp=0x7f3078000d30)
> >     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/Protocols/NFS/nfs4_op_putfh.c:281
> > #8 0x000000000045f670 in nfs4_Compound (arg=0x7f30a00010f0,
> >     req=0x7f30a00008e8, res=0x7f3078001fa0)
> >     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/Protocols/NFS/nfs4_Compound.c:743
> > #9 0x000000000044c20d in nfs_rpc_execute (reqdata=0x7f30a00008c0)
> >     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1289
> > #10 0x000000000044ca17 in worker_run (ctx=0x1d88bf0)
> >     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1561
> > #11 0x0000000000508a7a in fridgethr_start_routine (arg=0x1d88bf0)
> >     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/support/fridgethr.c:550
> > #12 0x00007f31797f2df5 in start_thread (arg=0x7f31287c8700)
> >     at pthread_create.c:308
> > #13 0x00007f3178eb31ad in clone ()
> >     at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
> >
> >
> >
> > (gdb) p q
> >
> > $1 = (struct lru_q *) 0x0
> >
> > (gdb) f 3
> >
> > #3 0x00000000005430ea in mdcache_new_entry (export=0x7f30bc05d070,
> >     sub_handle=0x7f3078007c20, attrs_in=0x7f31287c6670, attrs_out=0x0,
> >     new_directory=false, entry=0x7f31287c67e8, state=0x0)
> >     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:590
> >
> > 590 nentry = mdcache_alloc_handle(export, sub_handle, sub_handle->fs);
> >
> > (gdb) p export->flags
> >
> > $2 = 1 '\001'
> >
> > (gdb) p entry->lru
> >
> > $6 = {q = {next = 0x0, prev = 0x0}, qid = LRU_ENTRY_NONE, refcnt = 1,
> > flags = 0, lane = 2, cf = 0}
> >
> >
> >
> > mdc_check_mapping() in the following snippet returns an error because
> > the MDC_UNEXPORT flag in export->flags is set.
> >
> > After an export has been completely removed and the ganesha_mgr
> > remove_export command has returned successfully, why do we still see
> > the flag set?
> >
> > The comment also says “The current export is in process to be unexported”.
> >
> >
> >
> >     /* Map the export before we put this entry into the LRU, but after it's
> >      * well enough set up to be able to be unrefed by unexport should there
> >      * be a race.
> >      */
> >     status = mdc_check_mapping(result);
> >
> >     if (unlikely(FSAL_IS_ERROR(status))) {
> >         /* The current export is in process to be unexported, don't
> >          * create new mdcache entries.
> >          */
> >         LogDebug(COMPONENT_CACHE_INODE,
> >                  "Trying to allocate a new entry %p for export id %"
> >                  PRIi16" that is in the process of being unexported",
> >                  result, op_ctx->ctx_export->export_id);
> >         mdcache_put(result);
> >         mdcache_kill_entry(result);
> >         return NULL;
> >     }
> >
> >
> >
> > The crash occurs because lru_queue_of() returns q = NULL when
> > entry->lru.qid = LRU_ENTRY_NONE (see the gdb output above), and
> > LRU_DQ_SAFE then dereferences q.
> >
> >
> >
> > Is there any deferred work during unexport in the mdcache FSAL?
> >
> > If we add a delay between remove_export and add_export, I do not see
> > the problem. But that does not seem to be an elegant solution.
> >
> >
> >
> > Please help me understand whether mdcache has any limitations
> > with respect to back-to-back unexport and export operations.
> >
> >
> >
> > Thanks,
> >
> > Sandeep
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > ***************************Legal Disclaimer***************************
> > "This communication may contain confidential and privileged material
> > for the sole use of the intended recipient. Any unauthorized review,
> > use or distribution by others is strictly prohibited. If you have
> > received the message by mistake, please advise the sender by reply
> > email and delete the message. Thank you."
> > **********************************************************************
> >
> > _______________________________________________
> > Devel mailing list -- devel(a)lists.nfs-ganesha.org To unsubscribe send
> > an email to devel-leave(a)lists.nfs-ganesha.org
> >