Malahal and I think that my fix will resolve the issue.
Please test V2.7.0.3. If the double free issue is really resolved, I will immediately tag
V2.7.1 (unless we have some other worthy fixes in flight next week).
Thanks
Frank
From: LUCAS Patrice [mailto:patrice.lucas@cea.fr]
Sent: Friday, October 5, 2018 12:49 AM
To: Malahal Naineni <malahal(a)gmail.com>; mbenjami(a)redhat.com
Cc: ffilzlnx(a)mindspring.com; devel(a)lists.nfs-ganesha.org
Subject: Re: [NFS-Ganesha-Devel] Re: double-free bug
On 10/05/18 09:20, Malahal Naineni wrote:
Initially, I thought that the nfs4_Compound_Free() called from nfs_dupreq_rele (Thread10)
and nfs4_Compound_Free() called from nfs4_op_sequence while freeing the slot cache
(Thread7) take the same nfs_res_t POINTER. If they both run at the same time, then
nfs4_op_sequence could set res_cached to False, then they both find it false and release
resources leading to a double free.
Well, my assumption was wrong! The nfs4_op_sequence would be using memory from the SLOT
cache and the nfs_dupreq_rele() will be using memory allocated with alloc_nfs_res().
For this double free to happen, the above 2 structures should be out of SYNC as far as
"res_cached" field. This can happen if data.sa_cachethis is not true. I posted a
patch here with WIP:
https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/428128
"sa_cachethis" is operation specific as well as NFS client implementation
specific looks like. What NFS client is used here? This doesn't seem to be a timing
issue.
It's a Fedora28 nfs client, using a NFSv4.1 mount.
Regards,
Patrice
Regards, Malahal.
On Fri, Oct 5, 2018 at 1:26 AM Matt Benjamin <mbenjami(a)redhat.com
<mailto:mbenjami@redhat.com> > wrote:
The drc path does no work but frees the request; I don't see how
that's the problem
On Thu, Oct 4, 2018 at 3:49 PM, Malahal Naineni <malahal(a)gmail.com
<mailto:malahal@gmail.com> > wrote:
nfs4_Compound_Free() looks at res_cached to free the stuff or not.
The code
in nfs4_op_sequence() sets it to False and then calls nfs4_Compound_Free().
I don't see any lock that prevents Thread10 (nfs_dupreq_rele path) running
at the same time. I am new to this code, so I might be wrong in my analysis
though!
One option is to bypass DRC for NFS4.1 and above.
Regards, Malahal.
On Fri, Oct 5, 2018 at 12:01 AM Frank Filz <ffilzlnx(a)mindspring.com
<mailto:ffilzlnx@mindspring.com> > wrote:
>
> Yea, pretty much 4.1 are not cacheable (4.1 uses the slot cache and so has
> no need of the dupreq cache).
>
>
>
> With my patch, test_stateid isn’t doing anything different than any of the
> other 4.1 ops that actually free memory in their Free routines.
>
>
>
> Maybe they all can lead to double free? In which case somewhere along the
> line we are doing something wrong with the dup req cache for 4.1///
>
>
>
> Frank
>
>
>
> From: Malahal Naineni [mailto:malahal@gmail.com <mailto:malahal@gmail.com> ]
> Sent: Thursday, October 4, 2018 11:16 AM
> To: ffilzlnx(a)mindspring.com <mailto:ffilzlnx@mindspring.com>
> Cc: patrice.lucas(a)cea.fr <mailto:patrice.lucas@cea.fr> ;
devel(a)lists.nfs-ganesha.org <mailto:devel@lists.nfs-ganesha.org>
> Subject: [NFS-Ganesha-Devel] Re: double-free bug
>
>
>
> Thread10 thinks that op_test stateid_is not cachable, so it actually frees
> the response and other goodies allocated. But thread7 finds in the slot
> cache and tries to free leading to a double free. The code path has to be
> for minor version 1 or 2 (not zero) based on line numbers. I don't know much
> about 4.1 slot cache.
>
>
>
> Regards, Malahal.
>
>
>
> On Thu, Oct 4, 2018 at 8:52 PM Frank Filz <ffilzlnx(a)mindspring.com
<mailto:ffilzlnx@mindspring.com> > wrote:
>
> The only thing I can think of is thata TEST_STATEID was issued with minor
> version = 0 which is the only way it can fail.
>
>
>
> I’m going to submit a fix that checks for return status before freeing.
>
>
>
> A couple Free routines NULL out the values they free, but almost all check
> for NFS4_OK. There are a couple others that also don’t check. I’ll fix those
> too.
>
>
>
> Frank
>
>
>
> From: patrice.lucas(a)cea.fr <mailto:patrice.lucas@cea.fr>
[mailto:patrice.lucas@cea.fr <mailto:patrice.lucas@cea.fr> ]
> Sent: Thursday, October 4, 2018 6:43 AM
> To: devel(a)lists.nfs-ganesha.org <mailto:devel@lists.nfs-ganesha.org>
> Subject: [NFS-Ganesha-Devel] double-free bug
>
>
>
> Hello everyone,
>
>
>
> Frequent memory crashs have been occurring for few weeks in the
> nfs-ganesha CEA FSAL-PROXY continuous integration test. I finally make time
> for looking at these problems today by running the nfs-ganesha server under
> Address Sanitizer.
>
>
>
> I got the following stack wih a double-free error. Could anyone explain
> this error ? Someone who well understand the dup-req cache ? Or someone who
> already works with the code of the nfs4_op_test_stateid operation ?
>
>
>
> The nfs4_op_test_stateid was introduce this summer by gerrit patch 418826
> from
> fatih-acar, 07/22/2018.
>
>
>
> The dup-req cache stack seems to be involved in this error.
>
>
>
> Regards,
>
> Patrice
>
>
>
>
>
> ==7037==ERROR: AddressSanitizer: attempting double-free on 0x60200001ced0
> in thread T7:
> #0 0x480c09 in __interceptor_free (/usr/bin/ganesha.nfsd+0x480c09)
> #1 0x897125 in gsh_free
> /opt/nfs-ganesha/src/include/abstract_mem.h:299
> #2 0x896f88 in nfs4_op_test_stateid_Free
> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_op_test_stateid.c:121
> #3 0x703702 in nfs4_Compound_FreeOne
> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:1081
> #4 0x7042c4 in nfs4_Compound_Free
> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:1119
> #5 0x865c4a in nfs4_op_sequence
> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_op_sequence.c:185
> #6 0x6fd80f in nfs4_Compound
> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:903
> #7 0x67167c in nfs_rpc_process_request
> /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1329
> #8 0x663040 in nfs_rpc_valid_NFS
> /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1539
> #9 0x7ffff7bb94a1 in svc_vc_decode
> /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:824
> #10 0x6542ce in nfs_rpc_decode_request
> /opt/nfs-ganesha/src/MainNFSD/nfs_rpc_dispatcher_thread.c:1341
> #11 0x7ffff7bb934c in svc_vc_recv
> /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:797
> #12 0x7ffff7bb47be in svc_rqst_xprt_task
> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:767
> #13 0x7ffff7bb51af in svc_rqst_epoll_events
> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:939
> #14 0x7ffff7bb4e94 in svc_rqst_epoll_loop
> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1012:8
> #15 0x7ffff7bb38bf in svc_rqst_run_task
> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1048:14
> #16 0x7ffff7bc077c in work_pool_thread
> /opt/nfs-ganesha/src/libntirpc/src/work_pool.c:181
> #17 0x7ffff6367e24 in start_thread (/lib64/libpthread.so.0+0x7e24)
> #18 0x7ffff575c34c in __clone (/lib64/libc.so.6+0xf834c)
>
> 0x60200001ced0 is located 0 bytes inside of 4-byte region
> [0x60200001ced0,0x60200001ced4)
> freed by thread T10 here:
> #0 0x480c09 in __interceptor_free (/usr/bin/ganesha.nfsd+0x480c09)
> #1 0x897125 in gsh_free
> /opt/nfs-ganesha/src/include/abstract_mem.h:299
> #2 0x896f88 in nfs4_op_test_stateid_Free
> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_op_test_stateid.c:121
> #3 0x703702 in nfs4_Compound_FreeOne
> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:1081
> #4 0x7042c4 in nfs4_Compound_Free
> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:1119
> #5 0xcec2a4 in nfs_dupreq_rele
> /opt/nfs-ganesha/src/RPCAL/nfs_dupreq.c:1315
> #6 0x673196 in nfs_rpc_process_request
> /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1442
> #7 0x663040 in nfs_rpc_valid_NFS
> /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1539
> #8 0x7ffff7bb94a1 in svc_vc_decode
> /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:824
> #9 0x6542ce in nfs_rpc_decode_request
> /opt/nfs-ganesha/src/MainNFSD/nfs_rpc_dispatcher_thread.c:1341
> #10 0x7ffff7bb934c in svc_vc_recv
> /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:797
> #11 0x7ffff7bb47be in svc_rqst_xprt_task
> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:767
> #12 0x7ffff7bb51af in svc_rqst_epoll_events
> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:939
> #13 0x7ffff7bb4e94 in svc_rqst_epoll_loop
> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1012:8
> #14 0x7ffff7bb38bf in svc_rqst_run_task
> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1048:14
> #15 0x7ffff7bc077c in work_pool_thread
> /opt/nfs-ganesha/src/libntirpc/src/work_pool.c:181
> #16 0x7ffff6367e24 in start_thread (/lib64/libpthread.so.0+0x7e24)
>
> previously allocated by thread T10 here:
> #0 0x480e59 in calloc (/usr/bin/ganesha.nfsd+0x480e59)
> #1 0x89689a in gsh_calloc__
> /opt/nfs-ganesha/src/include/abstract_mem.h:145
> #2 0x895c4e in nfs4_op_test_stateid
> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_op_test_stateid.c:88:3
> #3 0x6fd80f in nfs4_Compound
> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:903
> #4 0x67167c in nfs_rpc_process_request
> /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1329
> #5 0x663040 in nfs_rpc_valid_NFS
> /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1539
> #6 0x7ffff7bb94a1 in svc_vc_decode
> /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:824
> #7 0x6542ce in nfs_rpc_decode_request
> /opt/nfs-ganesha/src/MainNFSD/nfs_rpc_dispatcher_thread.c:1341
> #8 0x7ffff7bb934c in svc_vc_recv
> /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:797
> #9 0x7ffff7bb47be in svc_rqst_xprt_task
> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:767
> #10 0x7ffff7bb51af in svc_rqst_epoll_events
> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:939
> #11 0x7ffff7bb4e94 in svc_rqst_epoll_loop
> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1012:8
> #12 0x7ffff7bb38bf in svc_rqst_run_task
> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1048:14
> #13 0x7ffff7bc077c in work_pool_thread
> /opt/nfs-ganesha/src/libntirpc/src/work_pool.c:181
> #14 0x7ffff6367e24 in start_thread (/lib64/libpthread.so.0+0x7e24)
>
> _______________________________________________
> Devel mailing list -- devel(a)lists.nfs-ganesha.org
<mailto:devel@lists.nfs-ganesha.org>
> To unsubscribe send an email to devel-leave(a)lists.nfs-ganesha.org
<mailto:devel-leave@lists.nfs-ganesha.org>
_______________________________________________
Devel mailing list -- devel(a)lists.nfs-ganesha.org
<mailto:devel@lists.nfs-ganesha.org>
To unsubscribe send an email to devel-leave(a)lists.nfs-ganesha.org
<mailto:devel-leave@lists.nfs-ganesha.org>
--
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103
http://www.redhat.com/en/technologies/storage
tel. 734-821-5101
fax. 734-769-8938
cel. 734-216-5309
--
Patrice LUCAS
Ingenieur-Chercheur, CEA-DAM/DSSI/SISR/LA2S
tel : +33 (0)1 69 26 47 86
e-mail : patrice.lucas(a)cea.fr <mailto:patrice.lucas@cea.fr>