Malahal and I think that my fix will resolve the issue.


Please test V2.7.0.3. If the double free issue is really resolved, I will immediately tag V2.7.1 (unless we have some other worthy fixes in flight next week).

 

Thanks

 

Frank

 

From: LUCAS Patrice [mailto:patrice.lucas@cea.fr]
Sent: Friday, October 5, 2018 12:49 AM
To: Malahal Naineni <malahal@gmail.com>; mbenjami@redhat.com
Cc: ffilzlnx@mindspring.com; devel@lists.nfs-ganesha.org
Subject: Re: [NFS-Ganesha-Devel] Re: double-free bug

 

On 10/05/18 09:20, Malahal Naineni wrote:

Initially, I thought that the nfs4_Compound_Free() called from nfs_dupreq_rele (Thread10) and nfs4_Compound_Free() called from nfs4_op_sequence while freeing the slot cache (Thread7) take the same nfs_res_t POINTER. If they both run at the same time, then nfs4_op_sequence could set res_cached to False, then they both find it false and release resources leading to a double free.

 

Well, my assumption was wrong! The nfs4_op_sequence would be using memory from the SLOT cache and the nfs_dupreq_rele() will be using memory allocated with alloc_nfs_res().

 

For this double free to happen, the above 2 structures should be out of SYNC as far as "res_cached" field. This can happen if data.sa_cachethis is not true. I posted a patch here with WIP:  https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/428128

 

"sa_cachethis" is operation specific as well as NFS client implementation specific looks like. What NFS client is used here? This doesn't seem to be a timing issue.




It's a Fedora28 nfs client, using a NFSv4.1 mount.

Regards,
Patrice




 

Regards, Malahal.

 

On Fri, Oct 5, 2018 at 1:26 AM Matt Benjamin <mbenjami@redhat.com> wrote:

The drc path does no work but frees the request;  I don't see how
that's the problem

On Thu, Oct 4, 2018 at 3:49 PM, Malahal Naineni <malahal@gmail.com> wrote:
> nfs4_Compound_Free() looks at res_cached to free the stuff or not. The code
> in nfs4_op_sequence() sets it to False and then calls nfs4_Compound_Free().
> I don't see any lock that prevents Thread10 (nfs_dupreq_rele path) running
> at the same time. I am new to this code, so I might be wrong in my analysis
> though!
>
> One option is to bypass DRC for NFS4.1 and above.
>
> Regards, Malahal.
>
> On Fri, Oct 5, 2018 at 12:01 AM Frank Filz <ffilzlnx@mindspring.com> wrote:
>>
>> Yea, pretty much 4.1 are not cacheable (4.1 uses the slot cache and so has
>> no need of the dupreq cache).
>>
>>
>>
>> With my patch, test_stateid isn’t doing anything different than any of the
>> other 4.1 ops that actually free memory in their Free routines.
>>
>>
>>
>> Maybe they all can lead to double free? In which case somewhere along the
>> line we are doing something wrong with the dup req cache for 4.1///
>>
>>
>>
>> Frank
>>
>>
>>
>> From: Malahal Naineni [mailto:malahal@gmail.com]
>> Sent: Thursday, October 4, 2018 11:16 AM
>> To: ffilzlnx@mindspring.com
>> Cc: patrice.lucas@cea.fr; devel@lists.nfs-ganesha.org
>> Subject: [NFS-Ganesha-Devel] Re: double-free bug
>>
>>
>>
>> Thread10 thinks that op_test stateid_is not cachable, so it actually frees
>> the response and other goodies allocated. But thread7 finds in the slot
>> cache and tries to free leading to a double free. The code path has to be
>> for minor version 1 or 2 (not zero) based on line numbers. I don't know much
>> about 4.1 slot cache.
>>
>>
>>
>> Regards, Malahal.
>>
>>
>>
>> On Thu, Oct 4, 2018 at 8:52 PM Frank Filz <ffilzlnx@mindspring.com> wrote:
>>
>> The only thing I can think of is thata TEST_STATEID was issued with minor
>> version = 0 which is the only way it can fail.
>>
>>
>>
>> I’m going to submit a fix that checks for return status before freeing.
>>
>>
>>
>> A couple Free routines NULL out the values they free, but almost all check
>> for NFS4_OK. There are a couple others that also don’t check. I’ll fix those
>> too.
>>
>>
>>
>> Frank
>>
>>
>>
>> From: patrice.lucas@cea.fr [mailto:patrice.lucas@cea.fr]
>> Sent: Thursday, October 4, 2018 6:43 AM
>> To: devel@lists.nfs-ganesha.org
>> Subject: [NFS-Ganesha-Devel] double-free bug
>>
>>
>>
>> Hello everyone,
>>
>>
>>
>> Frequent memory crashs have been occurring for few weeks in the
>> nfs-ganesha CEA FSAL-PROXY continuous integration test. I finally make time
>> for looking at these problems today by running the nfs-ganesha server under
>> Address Sanitizer.
>>
>>
>>
>> I got the following stack wih a double-free error. Could anyone explain
>> this error ? Someone who well understand the dup-req cache ? Or someone who
>> already works with the code of the nfs4_op_test_stateid operation ?
>>
>>
>>
>> The nfs4_op_test_stateid was introduce this summer by gerrit patch 418826
>> from
>> fatih-acar, 07/22/2018.
>>
>>
>>
>> The dup-req cache stack seems to be involved in this error.
>>
>>
>>
>> Regards,
>>
>> Patrice
>>
>>
>>
>>
>>
>> ==7037==ERROR: AddressSanitizer: attempting double-free on 0x60200001ced0
>> in thread T7:
>>     #0 0x480c09 in __interceptor_free (/usr/bin/ganesha.nfsd+0x480c09)
>>     #1 0x897125 in gsh_free
>> /opt/nfs-ganesha/src/include/abstract_mem.h:299
>>     #2 0x896f88 in nfs4_op_test_stateid_Free
>> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_op_test_stateid.c:121
>>     #3 0x703702 in nfs4_Compound_FreeOne
>> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:1081
>>     #4 0x7042c4 in nfs4_Compound_Free
>> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:1119
>>     #5 0x865c4a in nfs4_op_sequence
>> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_op_sequence.c:185
>>     #6 0x6fd80f in nfs4_Compound
>> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:903
>>     #7 0x67167c in nfs_rpc_process_request
>> /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1329
>>     #8 0x663040 in nfs_rpc_valid_NFS
>> /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1539
>>     #9 0x7ffff7bb94a1 in svc_vc_decode
>> /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:824
>>     #10 0x6542ce in nfs_rpc_decode_request
>> /opt/nfs-ganesha/src/MainNFSD/nfs_rpc_dispatcher_thread.c:1341
>>     #11 0x7ffff7bb934c in svc_vc_recv
>> /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:797
>>     #12 0x7ffff7bb47be in svc_rqst_xprt_task
>> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:767
>>     #13 0x7ffff7bb51af in svc_rqst_epoll_events
>> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:939
>>     #14 0x7ffff7bb4e94 in svc_rqst_epoll_loop
>> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1012:8
>>     #15 0x7ffff7bb38bf in svc_rqst_run_task
>> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1048:14
>>     #16 0x7ffff7bc077c in work_pool_thread
>> /opt/nfs-ganesha/src/libntirpc/src/work_pool.c:181
>>     #17 0x7ffff6367e24 in start_thread (/lib64/libpthread.so.0+0x7e24)
>>     #18 0x7ffff575c34c in __clone (/lib64/libc.so.6+0xf834c)
>>
>> 0x60200001ced0 is located 0 bytes inside of 4-byte region
>> [0x60200001ced0,0x60200001ced4)
>> freed by thread T10 here:
>>     #0 0x480c09 in __interceptor_free (/usr/bin/ganesha.nfsd+0x480c09)
>>     #1 0x897125 in gsh_free
>> /opt/nfs-ganesha/src/include/abstract_mem.h:299
>>     #2 0x896f88 in nfs4_op_test_stateid_Free
>> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_op_test_stateid.c:121
>>     #3 0x703702 in nfs4_Compound_FreeOne
>> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:1081
>>     #4 0x7042c4 in nfs4_Compound_Free
>> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:1119
>>     #5 0xcec2a4 in nfs_dupreq_rele
>> /opt/nfs-ganesha/src/RPCAL/nfs_dupreq.c:1315
>>     #6 0x673196 in nfs_rpc_process_request
>> /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1442
>>     #7 0x663040 in nfs_rpc_valid_NFS
>> /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1539
>>     #8 0x7ffff7bb94a1 in svc_vc_decode
>> /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:824
>>     #9 0x6542ce in nfs_rpc_decode_request
>> /opt/nfs-ganesha/src/MainNFSD/nfs_rpc_dispatcher_thread.c:1341
>>     #10 0x7ffff7bb934c in svc_vc_recv
>> /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:797
>>     #11 0x7ffff7bb47be in svc_rqst_xprt_task
>> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:767
>>     #12 0x7ffff7bb51af in svc_rqst_epoll_events
>> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:939
>>     #13 0x7ffff7bb4e94 in svc_rqst_epoll_loop
>> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1012:8
>>     #14 0x7ffff7bb38bf in svc_rqst_run_task
>> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1048:14
>>     #15 0x7ffff7bc077c in work_pool_thread
>> /opt/nfs-ganesha/src/libntirpc/src/work_pool.c:181
>>     #16 0x7ffff6367e24 in start_thread (/lib64/libpthread.so.0+0x7e24)
>>
>> previously allocated by thread T10 here:
>>     #0 0x480e59 in calloc (/usr/bin/ganesha.nfsd+0x480e59)
>>     #1 0x89689a in gsh_calloc__
>> /opt/nfs-ganesha/src/include/abstract_mem.h:145
>>     #2 0x895c4e in nfs4_op_test_stateid
>> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_op_test_stateid.c:88:3
>>     #3 0x6fd80f in nfs4_Compound
>> /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:903
>>     #4 0x67167c in nfs_rpc_process_request
>> /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1329
>>     #5 0x663040 in nfs_rpc_valid_NFS
>> /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1539
>>     #6 0x7ffff7bb94a1 in svc_vc_decode
>> /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:824
>>     #7 0x6542ce in nfs_rpc_decode_request
>> /opt/nfs-ganesha/src/MainNFSD/nfs_rpc_dispatcher_thread.c:1341
>>     #8 0x7ffff7bb934c in svc_vc_recv
>> /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:797
>>     #9 0x7ffff7bb47be in svc_rqst_xprt_task
>> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:767
>>     #10 0x7ffff7bb51af in svc_rqst_epoll_events
>> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:939
>>     #11 0x7ffff7bb4e94 in svc_rqst_epoll_loop
>> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1012:8
>>     #12 0x7ffff7bb38bf in svc_rqst_run_task
>> /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1048:14
>>     #13 0x7ffff7bc077c in work_pool_thread
>> /opt/nfs-ganesha/src/libntirpc/src/work_pool.c:181
>>     #14 0x7ffff6367e24 in start_thread (/lib64/libpthread.so.0+0x7e24)
>>
>> _______________________________________________
>> Devel mailing list -- devel@lists.nfs-ganesha.org
>> To unsubscribe send an email to devel-leave@lists.nfs-ganesha.org
>
>
> _______________________________________________
> Devel mailing list -- devel@lists.nfs-ganesha.org
> To unsubscribe send an email to devel-leave@lists.nfs-ganesha.org
>



--

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309

 

-- 
Patrice LUCAS
Ingenieur-Chercheur, CEA-DAM/DSSI/SISR/LA2S
tel : +33 (0)1 69 26 47 86
e-mail : patrice.lucas@cea.fr