Note, all of these are also backported to 2.6.3, if that's easier.
On Wed, Sep 19, 2018 at 9:27 AM, Sandeep Nashikkar
<snashikkar(a)commvault.com> wrote:
> Thanks Daniel. Will check this with a more recent version of nfs-ganesha.
>
> Thanks,
> Sandeep
>
> -----Original Message-----
> From: Daniel Gryniewicz [mailto:dang@redhat.com]
> Sent: 19 September 2018 18:22
> To: Sandeep Nashikkar <snashikkar(a)commvault.com>
> Cc: devel(a)lists.nfs-ganesha.org
> Subject: Re: [NFS-Ganesha-Devel] Crash in mdcache_lru_cleanup_push()
>
> We've fixed quite a few unexport/export races since 2.5.1. A few that jump out at me are:
>
> 569039055fe209aadda7eabf5a5e230ae8938d25 - MDCACHE - Close more export/unexport races
> d287a4eb404166c6f8eb6a468304504b930bee43 - MDCACHE - Close an unexport race
> ceb4aed76dd3bb39f857853043800c556e475cf8 - Fixup unexport/lru_run_lane/mdcache_lru_clean races
> dc83243feed5c78267b682cf47addd9d83a41adb - Fix race between mdcache_unexport and mdc_check_mapping
> 36bc72781e395bfd6ba66a27ec22a9a9c66f366b - add cih_remove_checked() in mdc_clean_entry()
>
> Those are from a quick scan of the commit logs for the last year; there may be more.
> Export/unexport in a tight loop had quite a few problems, but we believe it works in
> 2.7.0 (at least for MDCACHE; there may be FSAL-level bugs in some FSALs).
>
> Daniel
>
>
> On Wed, Sep 19, 2018 at 4:31 AM, Sandeep Nashikkar <snashikkar(a)commvault.com> wrote:
>> Ganesha Version: 2.5.1
>>
>> Platform: Linux x86_64
>>
>>
>>
>> I am seeing the following issue while re-exporting a ganesha export
>> (remove_export -> add_export).
>>
>> The crash happens during the add_export operation, while some
>> applications are still using the export. We also implemented a
>> mechanism to stall I/O while this operation is in progress, but it
>> did not help.
>>
>>
>>
>> Program terminated with signal 11, Segmentation fault.
>> #0  0x0000000000530d27 in mdcache_lru_cleanup_push (entry=0x7f3078007f50)
>>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:935
>> 935             LRU_DQ_SAFE(lru, q);
>>
>> Missing separate debuginfos, use: debuginfo-install
>> bzip2-libs-1.0.6-12.el7.x86_64 dbus-libs-1.10.24-7.el7.x86_64
>> elfutils-libelf-0.160-1.el7.x86_64 elfutils-libs-0.160-1.el7.x86_64
>> gssproxy-0.3.0-10.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64
>> krb5-libs-1.15.1-19.el7.x86_64 libacl-2.2.51-14.el7.x86_64
>> libattr-2.4.46-13.el7.x86_64 libblkid-2.23.2-21.el7.x86_64
>> libcap-2.22-8.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64
>> libgcrypt-1.5.3-12.el7.x86_64 libgpg-error-1.12-3.el7.x86_64
>> libnfsidmap-0.25-11.el7.x86_64 libselinux-2.5-12.el7.x86_64
>> libuuid-2.23.2-21.el7.x86_64 lz4-1.7.5-2.el7.x86_64
>> pcre-8.32-17.el7.x86_64
>> systemd-libs-219-57.el7.x86_64 xz-libs-5.1.2-9alpha.el7.x86_64
>> zlib-1.2.7-13.el7.x86_64
>>
>> (gdb) bt
>> #0  0x0000000000530d27 in mdcache_lru_cleanup_push (entry=0x7f3078007f50)
>>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:935
>> #1  0x000000000054a0fc in _mdcache_kill_entry (entry=0x7f3078007f50,
>>     file=0x5a0970 "/root/rpmbuild/BUILD/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c",
>>     line=218, function=0x5a2120 <__func__.23400> "mdcache_alloc_handle")
>>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:3310
>> #2  0x00000000005419de in mdcache_alloc_handle (export=0x7f30bc05d070, sub_handle=0x7f3078007c20, fs=0x0)
>>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:218
>> #3  0x00000000005430ea in mdcache_new_entry (export=0x7f30bc05d070, sub_handle=0x7f3078007c20,
>>     attrs_in=0x7f31287c6670, attrs_out=0x0, new_directory=false, entry=0x7f31287c67e8, state=0x0)
>>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:590
>> #4  0x0000000000544093 in mdcache_locate_host (fh_desc=0x7f31287c6c60, export=0x7f30bc05d070,
>>     entry=0x7f31287c67e8, attrs_out=0x0)
>>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:998
>> #5  0x000000000053d1b3 in mdcache_create_handle (exp_hdl=0x7f30bc05d070, fh_desc=0x7f31287c6c60,
>>     handle=0x7f31287c6c58, attrs_out=0x0)
>>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1902
>> #6  0x000000000047729d in nfs4_mds_putfh (data=0x7f31287c6d60)
>>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/Protocols/NFS/nfs4_op_putfh.c:211
>> #7  0x0000000000477486 in nfs4_op_putfh (op=0x7f30a00424c0, data=0x7f31287c6d60, resp=0x7f3078000d30)
>>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/Protocols/NFS/nfs4_op_putfh.c:281
>> #8  0x000000000045f670 in nfs4_Compound (arg=0x7f30a00010f0, req=0x7f30a00008e8, res=0x7f3078001fa0)
>>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/Protocols/NFS/nfs4_Compound.c:743
>> #9  0x000000000044c20d in nfs_rpc_execute (reqdata=0x7f30a00008c0)
>>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1289
>> #10 0x000000000044ca17 in worker_run (ctx=0x1d88bf0)
>>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1561
>> #11 0x0000000000508a7a in fridgethr_start_routine (arg=0x1d88bf0)
>>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/support/fridgethr.c:550
>> #12 0x00007f31797f2df5 in start_thread (arg=0x7f31287c8700) at pthread_create.c:308
>> #13 0x00007f3178eb31ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
>>
>>
>>
>> (gdb) p q
>> $1 = (struct lru_q *) 0x0
>> (gdb) f 3
>> #3  0x00000000005430ea in mdcache_new_entry (export=0x7f30bc05d070, sub_handle=0x7f3078007c20,
>>     attrs_in=0x7f31287c6670, attrs_out=0x0, new_directory=false, entry=0x7f31287c67e8, state=0x0)
>>     at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:590
>> 590             nentry = mdcache_alloc_handle(export, sub_handle, sub_handle->fs);
>> (gdb) p export->flags
>> $2 = 1 '\001'
>> (gdb) p entry->lru
>> $6 = {q = {next = 0x0, prev = 0x0}, qid = LRU_ENTRY_NONE, refcnt = 1, flags = 0, lane = 2, cf = 0}
>>
>>
>>
>> mdc_check_mapping() in the following snippet returns an error because
>> the MDC_UNEXPORT flag is set in export->flags.
>>
>> After the export has been completely removed and the ganesha_mgr
>> remove_export command has returned successfully, why is the flag
>> still set?
>>
>> The comment also says "The current export is in process to be unexported":
>>
>>
>>
>>         /* Map the export before we put this entry into the LRU, but after it's
>>          * well enough set up to be able to be unrefed by unexport should there
>>          * be a race.
>>          */
>>         status = mdc_check_mapping(result);
>>
>>         if (unlikely(FSAL_IS_ERROR(status))) {
>>                 /* The current export is in process to be unexported, don't
>>                  * create new mdcache entries.
>>                  */
>>                 LogDebug(COMPONENT_CACHE_INODE,
>>                          "Trying to allocate a new entry %p for export id %"
>>                          PRIi16" that is in the process of being unexported",
>>                          result, op_ctx->ctx_export->export_id);
>>                 mdcache_put(result);
>>                 mdcache_kill_entry(result);
>>                 return NULL;
>>         }
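
[Editor's inline note] The check described above can be pictured with a heavily reduced model. This is only a sketch, not the nfs-ganesha implementation: `struct model_export`, `model_can_map()` and `model_begin_unexport()` are hypothetical names, and the value of `MDC_UNEXPORT` is an assumption taken from the `export->flags` value of 1 printed by gdb above. The real code may also use atomic accessors and locking that are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value, inferred from "export->flags = 1" in the gdb output. */
#define MDC_UNEXPORT 0x1

/* Hypothetical, reduced stand-in for the MDCACHE export structure. */
struct model_export {
	uint8_t flags;
};

/* Mark the export as being torn down (what unexport is assumed to do). */
static inline void model_begin_unexport(struct model_export *exp)
{
	assert(exp != NULL);
	exp->flags |= MDC_UNEXPORT;
}

/* Return true when new cache entries may still be mapped to the export. */
static inline bool model_can_map(const struct model_export *exp)
{
	assert(exp != NULL);
	return (exp->flags & MDC_UNEXPORT) == 0;
}
```

In this model, any request that still holds a pointer to the old export object after `model_begin_unexport()` gets a refusal from `model_can_map()`, which corresponds to the error path taken in the snippet above.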
>>
>>
>>
>> The crash occurs because lru_queue_of() returns q = NULL, since
>> entry->lru.qid is LRU_ENTRY_NONE (see the gdb output above), and
>> LRU_DQ_SAFE then dereferences q.
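
[Editor's inline note] The failure mode described here is a dequeue attempted on an entry that was never placed on a queue. The sketch below models that situation and one defensive pattern (check the queue pointer before unlinking). Every `model_*` name is hypothetical, and this is not the code from the commits listed at the top of the thread; it only illustrates why a NULL queue lookup has to be handled before the dequeue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queue ids; MODEL_Q_NONE plays the role of "not on any queue". */
enum model_qid { MODEL_Q_NONE = 0, MODEL_Q_L1, MODEL_Q_L2 };

struct model_node { struct model_node *next, *prev; };
struct model_queue { struct model_node head; size_t size; };
struct model_lane { struct model_queue l1, l2; };

struct model_entry {
	struct model_node node;
	enum model_qid qid;
};

/* Initialise an empty circular queue. */
static inline void model_queue_init(struct model_queue *q)
{
	q->head.next = q->head.prev = &q->head;
	q->size = 0;
}

/* Map an entry to its queue; NULL when the entry is on no queue. */
static inline struct model_queue *
model_queue_of(struct model_lane *lane, const struct model_entry *e)
{
	switch (e->qid) {
	case MODEL_Q_L1: return &lane->l1;
	case MODEL_Q_L2: return &lane->l2;
	default:         return NULL;
	}
}

/* Append an entry to the tail of the requested queue. */
static inline void model_enqueue(struct model_lane *lane,
				 struct model_entry *e, enum model_qid qid)
{
	struct model_queue *q;

	assert(qid != MODEL_Q_NONE);
	q = (qid == MODEL_Q_L1) ? &lane->l1 : &lane->l2;
	e->node.next = &q->head;
	e->node.prev = q->head.prev;
	q->head.prev->next = &e->node;
	q->head.prev = &e->node;
	e->qid = qid;
	q->size++;
}

/* Unlink only if the entry is actually queued; returns true if removed. */
static inline bool model_dequeue_safe(struct model_lane *lane,
				      struct model_entry *e)
{
	struct model_queue *q = model_queue_of(lane, e);

	if (q == NULL)
		return false; /* never queued: nothing to unlink */
	e->node.prev->next = e->node.next;
	e->node.next->prev = e->node.prev;
	e->node.next = e->node.prev = NULL;
	e->qid = MODEL_Q_NONE;
	q->size--;
	return true;
}
```

An entry in the state shown by `p entry->lru` (next and prev NULL, no queue id) makes `model_dequeue_safe()` return false instead of touching a NULL queue. Whether the upstream fixes take this form or instead avoid reaching the kill path with an unqueued entry is not stated in this thread.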
>>
>>
>>
>> Is there any deferred work during unexport in the MDCACHE FSAL?
>>
>> If we add a delay between remove_export and add_export, I do not see
>> the problem, but that does not seem to be an elegant solution.
>>
>> Please help me understand whether MDCACHE has any limitations with
>> respect to back-to-back unexport and export operations.
>>
>>
>>
>> Thanks,
>>
>> Sandeep
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> ***************************Legal Disclaimer***************************
>> "This communication may contain confidential and privileged material
>> for the sole use of the intended recipient. Any unauthorized review,
>> use or distribution by others is strictly prohibited. If you have
>> received the message by mistake, please advise the sender by reply
>> email and delete the message. Thank you."
>> **********************************************************************
>>