Initially, I thought that the nfs4_Compound_Free() called from nfs_dupreq_rele (Thread10) and nfs4_Compound_Free() called from nfs4_op_sequence while freeing the slot cache (Thread7) take the same nfs_res_t POINTER. If they both run at the same time, then nfs4_op_sequence could set res_cached to False, then they both find it false and release resources leading to a double free.
Well, my assumption was wrong! The nfs4_op_sequence would be using memory from the SLOT cache and the nfs_dupreq_rele() will be using memory allocated with alloc_nfs_res().
For this double free to happen, the above 2 structures should be out of SYNC as far as "res_cached" field. This can happen if data.sa_cachethis is not true. I posted a patch here with WIP: https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/428128
"sa_cachethis" is operation specific as well as NFS client implementation specific looks like. What NFS client is used here? This doesn't seem to be a timing issue.
Regards, Malahal.
On Fri, Oct 5, 2018 at 1:26 AM Matt Benjamin <mbenjami@redhat.com> wrote:
The drc path does no work but frees the request; I don't see how
that's the problem
On Thu, Oct 4, 2018 at 3:49 PM, Malahal Naineni <malahal@gmail.com> wrote:
> nfs4_Compound_Free() looks at res_cached to free the stuff or not. The code
> in nfs4_op_sequence() sets it to False and then calls nfs4_Compound_Free().
> I don't see any lock that prevents Thread10 (nfs_dupreq_rele path) running
> at the same time. I am new to this code, so I might be wrong in my analysis
> though!
>
> One option is to bypass DRC for NFS4.1 and above.
>
> Regards, Malahal.
>
> On Fri, Oct 5, 2018 at 12:01 AM Frank Filz <ffilzlnx@mindspring.com> wrote:
>>
>> Yea, pretty much 4.1 are not cacheable (4.1 uses the slot cache and so has
>> no need of the dupreq cache).
>>
>>
>>
>> With my patch, test_stateid isn’t doing anything different than any of the
>> other 4.1 ops that actually free memory in their Free routines.
>>
>>
>>
>> Maybe they all can lead to double free? In which case somewhere along the
>> line we are doing something wrong with the dup req cache for 4.1///
>>
>>
>>
>> Frank
>>
>>
>>
>> From: Malahal Naineni [mailto:malahal@gmail.com]
>> Sent: Thursday, October 4, 2018 11:16 AM
>> To: ffilzlnx@mindspring.com
>> Cc: patrice.lucas@cea.fr; devel@lists.nfs-ganesha.org
>> Subject: [NFS-Ganesha-Devel] Re: double-free bug
>>
>>
>>
>> Thread10 thinks that op_test stateid_is not cachable, so it actually frees
>> the response and other goodies allocated. But thread7 finds in the slot
>> cache and tries to free leading to a double free. The code path has to be
>> for minor version 1 or 2 (not zero) based on line numbers. I don't know much
>> about 4.1 slot cache.
>>
>>
>>
>> Regards, Malahal.
>>
>>
>>
>> On Thu, Oct 4, 2018 at 8:52 PM Frank Filz <ffilzlnx@mindspring.com> wrote:
>>
>> The only thing I can think of is thata TEST_STATEID was issued with minor
>> version = 0 which is the only way it can fail.
>>
>>
>>
>> I’m going to submit a fix that checks for return status before freeing.
>>
>>
>>
>> A couple Free routines NULL out the values they free, but almost all check
>> for NFS4_OK. There are a couple others that also don’t check. I’ll fix those
>> too.
>>
>>
>>
>> Frank
>>
>>
>>
>> From: patrice.lucas@cea.fr [mailto:patrice.lucas@cea.fr]
>> Sent: Thursday, October 4, 2018 6:43 AM
>> To: devel@lists.nfs-ganesha.org
>> Subject: [NFS-Ganesha-Devel] double-free bug
>>
>>
>>
>> Hello everyone,
>>
>>
>>
>> Frequent memory crashs have been occurring for few weeks in the
>> nfs-ganesha CEA FSAL-PROXY continuous integration test. I finally make time
>> for looking at these problems today by running the nfs-ganesha server under
>> Address Sanitizer.
>>
>>
>>
>> I got the following stack wih a double-free error. Could anyone explain
>> this error ? Someone who well understand the dup-req cache ? Or someone who
>> already works with the code of the nfs4_op_test_stateid operation ?
>>
>>
>>
>> The nfs4_op_test_stateid was introduce this summer by gerrit patch 418826
>> from
>> fatih-acar, 07/22/2018.
>>
>>
>>
>> The dup-req cache stack seems to be involved in this error.
>>
>>
>>
>> Regards,
>>
>> Patrice
>>
>>
>>
>>
>>
>> ==7037==ERROR: AddressSanitizer: attempting double-free on 0x60200001ced0
>> in thread T7:
>> #0 0x480c09 in __interceptor_free (/usr/bin/ganesha.nfsd+0x480c09)
>> #1 0x897125 in gsh_free
>> /opt/nfs-ganesha/src/include/abstract_mem.h:299
>> #2 0x896f88 in nfs4_op_test_stateid_Free
>> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_op_test_stateid.c:121
>> #3 0x703702 in nfs4_Compound_FreeOne
>> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:1081
>> #4 0x7042c4 in nfs4_Compound_Free
>> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:1119
>> #5 0x865c4a in nfs4_op_sequence
>> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_op_sequence.c:185
>> #6 0x6fd80f in nfs4_Compound
>> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:903
>> #7 0x67167c in nfs_rpc_process_request
>> /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1329
>> #8 0x663040 in nfs_rpc_valid_NFS
>> /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1539
>> #9 0x7ffff7bb94a1 in svc_vc_decode
>> /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:824
>> #10 0x6542ce in nfs_rpc_decode_request
>> /opt/nfs-ganesha/src/MainNFSD/nfs_rpc_dispatcher_thread.c:1341
>> #11 0x7ffff7bb934c in svc_vc_recv
>> /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:797
>> #12 0x7ffff7bb47be in svc_rqst_xprt_task
>> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:767
>> #13 0x7ffff7bb51af in svc_rqst_epoll_events
>> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:939
>> #14 0x7ffff7bb4e94 in svc_rqst_epoll_loop
>> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1012:8
>> #15 0x7ffff7bb38bf in svc_rqst_run_task
>> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1048:14
>> #16 0x7ffff7bc077c in work_pool_thread
>> /opt/nfs-ganesha/src/libntirpc/src/work_pool.c:181
>> #17 0x7ffff6367e24 in start_thread (/lib64/libpthread.so.0+0x7e24)
>> #18 0x7ffff575c34c in __clone (/lib64/libc.so.6+0xf834c)
>>
>> 0x60200001ced0 is located 0 bytes inside of 4-byte region
>> [0x60200001ced0,0x60200001ced4)
>> freed by thread T10 here:
>> #0 0x480c09 in __interceptor_free (/usr/bin/ganesha.nfsd+0x480c09)
>> #1 0x897125 in gsh_free
>> /opt/nfs-ganesha/src/include/abstract_mem.h:299
>> #2 0x896f88 in nfs4_op_test_stateid_Free
>> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_op_test_stateid.c:121
>> #3 0x703702 in nfs4_Compound_FreeOne
>> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:1081
>> #4 0x7042c4 in nfs4_Compound_Free
>> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:1119
>> #5 0xcec2a4 in nfs_dupreq_rele
>> /opt/nfs-ganesha/src/RPCAL/nfs_dupreq.c:1315
>> #6 0x673196 in nfs_rpc_process_request
>> /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1442
>> #7 0x663040 in nfs_rpc_valid_NFS
>> /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1539
>> #8 0x7ffff7bb94a1 in svc_vc_decode
>> /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:824
>> #9 0x6542ce in nfs_rpc_decode_request
>> /opt/nfs-ganesha/src/MainNFSD/nfs_rpc_dispatcher_thread.c:1341
>> #10 0x7ffff7bb934c in svc_vc_recv
>> /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:797
>> #11 0x7ffff7bb47be in svc_rqst_xprt_task
>> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:767
>> #12 0x7ffff7bb51af in svc_rqst_epoll_events
>> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:939
>> #13 0x7ffff7bb4e94 in svc_rqst_epoll_loop
>> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1012:8
>> #14 0x7ffff7bb38bf in svc_rqst_run_task
>> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1048:14
>> #15 0x7ffff7bc077c in work_pool_thread
>> /opt/nfs-ganesha/src/libntirpc/src/work_pool.c:181
>> #16 0x7ffff6367e24 in start_thread (/lib64/libpthread.so.0+0x7e24)
>>
>> previously allocated by thread T10 here:
>> #0 0x480e59 in calloc (/usr/bin/ganesha.nfsd+0x480e59)
>> #1 0x89689a in gsh_calloc__
>> /opt/nfs-ganesha/src/include/abstract_mem.h:145
>> #2 0x895c4e in nfs4_op_test_stateid
>> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_op_test_stateid.c:88:3
>> #3 0x6fd80f in nfs4_Compound
>> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:903
>> #4 0x67167c in nfs_rpc_process_request
>> /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1329
>> #5 0x663040 in nfs_rpc_valid_NFS
>> /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1539
>> #6 0x7ffff7bb94a1 in svc_vc_decode
>> /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:824
>> #7 0x6542ce in nfs_rpc_decode_request
>> /opt/nfs-ganesha/src/MainNFSD/nfs_rpc_dispatcher_thread.c:1341
>> #8 0x7ffff7bb934c in svc_vc_recv
>> /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:797
>> #9 0x7ffff7bb47be in svc_rqst_xprt_task
>> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:767
>> #10 0x7ffff7bb51af in svc_rqst_epoll_events
>> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:939
>> #11 0x7ffff7bb4e94 in svc_rqst_epoll_loop
>> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1012:8
>> #12 0x7ffff7bb38bf in svc_rqst_run_task
>> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1048:14
>> #13 0x7ffff7bc077c in work_pool_thread
>> /opt/nfs-ganesha/src/libntirpc/src/work_pool.c:181
>> #14 0x7ffff6367e24 in start_thread (/lib64/libpthread.so.0+0x7e24)
>>
>> _______________________________________________
>> Devel mailing list -- devel@lists.nfs-ganesha.org
>> To unsubscribe send an email to devel-leave@lists.nfs-ganesha.org
>
>
> _______________________________________________
> Devel mailing list -- devel@lists.nfs-ganesha.org
> To unsubscribe send an email to devel-leave@lists.nfs-ganesha.org
>
--
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103
http://www.redhat.com/en/technologies/storage
tel. 734-821-5101
fax. 734-769-8938
cel. 734-216-5309
-- Patrice LUCAS Ingenieur-Chercheur, CEA-DAM/DSSI/SISR/LA2S tel : +33 (0)1 69 26 47 86 e-mail : patrice.lucas@cea.fr