Of course that would also not be a double free…
So really I need to figure out how the DRC code is broken in a way that doesn’t show up for other,
more frequently executed functions. One possibility is that a change was made AFTER I made
Ganesha pay attention to cache_this (I could see Read, Readlink, and Exchange ID not being
cached, though GetFH should be for OPEN/CREATE).
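For readers less familiar with cache_this: roughly following RFC 5661’s optional reply caching, a session slot keeps the full reply only when the client sets sa_cachethis in SEQUENCE, and a retransmission of a request whose reply was not kept gets NFS4ERR_RETRY_UNCACHED_REP. The sketch below models only that decision. `struct sketch_slot`, `sketch_slot_store()`, and `sketch_slot_replay()` are hypothetical simplifications, not Ganesha’s slot code, and a single int status stands in for the whole cached COMPOUND reply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* nfsstat4 values (RFC 5661) used by this sketch. */
#define NFS4_OK                    0
#define NFS4ERR_RETRY_UNCACHED_REP 10068

/* Hypothetical, heavily simplified session slot; not Ganesha's slot type. */
struct sketch_slot {
	uint32_t seqid;     /* sequence id of the last request on this slot */
	bool have_reply;    /* true only if the full reply was retained */
	int cached_status;  /* stand-in for the cached COMPOUND reply */
};

/* Record the outcome of a new request; retain it only if cachethis was set. */
static void sketch_slot_store(struct sketch_slot *slot, uint32_t seqid,
			      bool cachethis, int status)
{
	assert(slot != NULL);
	slot->seqid = seqid;
	slot->have_reply = cachethis;
	slot->cached_status = cachethis ? status : NFS4_OK;
}

/* Answer a retransmission (same seqid) from whatever the slot retained. */
static int sketch_slot_replay(const struct sketch_slot *slot)
{
	assert(slot != NULL);
	return slot->have_reply ? slot->cached_status
				: NFS4ERR_RETRY_UNCACHED_REP;
}
```

Usage: after storing a reply with cachethis set, a replay returns that reply; after storing one with cachethis clear, a replay returns NFS4ERR_RETRY_UNCACHED_REP instead.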
Frank
From: Frank Filz [mailto:ffilzlnx@mindspring.com]
Sent: Thursday, October 4, 2018 1:24 PM
To: 'Malahal Naineni' <malahal(a)gmail.com>
Cc: patrice.lucas(a)cea.fr; devel(a)lists.nfs-ganesha.org
Subject: [NFS-Ganesha-Devel] Re: double-free bug
nfs_dupreq_v4_cacheable is supposed to exclude all 4.1 requests.
I wonder if something changed. There are other requests that free things in their Free
functions and that must have been run under 4.1, so if we really were somehow caching
something in both the slot cache and the DRC, they would have tripped as well.
Here are all the places that gsh_free() something in res:
Protocols/NFS/nfs4_Compound.c nfs4_Compound 962
gsh_free(res->res_compound4.resarray.resarray_val);
Protocols/NFS/nfs4_Compound.c nfs4_Compound_Free 1128
gsh_free(res->res_compound4.resarray.resarray_val);
Protocols/NFS/nfs4_Compound.c nfs4_Compound_Free 1131
gsh_free(res->res_compound4.tag.utf8string_val);
Protocols/NFS/nfs4_op_exchange_id.c nfs4_op_exchange_id_Free 441
gsh_free(resok->eir_server_scope.eir_server_scope_val);
Protocols/NFS/nfs4_op_exchange_id.c nfs4_op_exchange_id_Free 442
gsh_free(resok->eir_server_owner.so_major_id.so_major_id_val);
Protocols/NFS/nfs4_op_exchange_id.c nfs4_op_exchange_id_Free 443
gsh_free(resok->eir_server_impl_id.eir_server_impl_id_val);
Protocols/NFS/nfs4_op_getdeviceinfo.c nfs4_op_getdeviceinfo_Free 205
gsh_free(resok->gdir_device_addr.da_addr_body.da_addr_body_val);
Protocols/NFS/nfs4_op_getdevicelist.c nfs4_op_getdevicelist_Free 196
gsh_free(resok->gdlr_deviceid_list.gdlr_deviceid_list_val);
Protocols/NFS/nfs4_op_getfh.c nfs4_op_getfh_Free 138
gsh_free(resp->GETFH4res_u.resok4.object.nfs_fh4_val);
Protocols/NFS/nfs4_op_read.c nfs4_op_read_Free 615
gsh_free(resp->READ4res_u.resok4.data.data_val);
Protocols/NFS/nfs4_op_readlink.c nfs4_op_readlink_Free 125
gsh_free(resp->READLINK4res_u.resok4.link.utf8string_val);
Protocols/NFS/nfs4_op_secinfo.c nfs4_op_secinfo_Free 337
gsh_free(resp->SECINFO4res_u.resok4.SECINFO4resok_val);
Protocols/NFS/nfs4_op_secinfo_no_name.c nfs4_op_secinfo_no_name_Free 209
gsh_free(resp->SECINFO4res_u.resok4.SECINFO4resok_val);
Protocols/NFS/nfs4_op_setclientid.c nfs4_op_setclientid_Free 384
gsh_free(resp->SETCLIENTID4res_u.client_using.r_addr);
Protocols/NFS/nfs4_op_test_stateid.c nfs4_op_test_stateid_Free 121
gsh_free(res->tsr_status_codes.tsr_status_codes_val);
Protocols/NFS/nfs4_op_xattr.c nfs4_op_getxattr_Free 142
gsh_free(res_GETXATTR4->GETXATTR4res_u.resok4.gr_value.utf8string_val);
Protocols/NFS/nfs4_op_xattr.c nfs4_op_listxattr_Free 316
gsh_free(res_LISTXATTR4->LISTXATTR4res_u.resok4.lr_names.entries);
So I’m struggling to see what is unique about test_stateid, other than that its Free routine didn’t check
the return code. That code can only be NFS4_OK or NFS4ERR_INVAL, and it is only NFS4ERR_INVAL if
TEST_STATEID was issued with minorversion = 0.
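To make the failure mode concrete, here is a minimal sketch, assuming simplified hypothetical types (`struct sketch_test_stateid_res` and the `sketch_*` functions are not Ganesha’s). On the NFS4ERR_INVAL path the op never fills in the status-code array, so a Free routine that skipped the status check would hand whatever happens to be in those fields to free(). One 4-byte code per stateid would also be consistent with the 4-byte region in the AddressSanitizer report further down, if a single stateid was tested.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* nfsstat4 values used by this sketch. */
#define NFS4_OK       0
#define NFS4ERR_INVAL 22

/* Hypothetical stand-in for the TEST_STATEID result: the status-code array
 * is only filled in (and only owned) when tsr_status == NFS4_OK. */
struct sketch_test_stateid_res {
	int tsr_status;
	uint32_t tsr_status_codes_len;
	uint32_t *tsr_status_codes_val;
};

/* Hypothetical op body: TEST_STATEID is invalid for minor version 0;
 * otherwise allocate one 4-byte status code per stateid tested. */
static void sketch_test_stateid(uint32_t minorversion, uint32_t nstateids,
				struct sketch_test_stateid_res *res)
{
	assert(res != NULL);
	if (minorversion == 0) {
		res->tsr_status = NFS4ERR_INVAL;
		return; /* the resok fields are left untouched */
	}
	res->tsr_status_codes_val = calloc(nstateids, sizeof(uint32_t));
	if (res->tsr_status_codes_val == NULL)
		abort(); /* this sketch treats allocation failure as fatal */
	res->tsr_status_codes_len = nstateids;
	res->tsr_status = NFS4_OK;
}

/* Free routine that only touches the resok fields on success. */
static void sketch_test_stateid_free(struct sketch_test_stateid_res *res)
{
	if (res->tsr_status != NFS4_OK)
		return;
	free(res->tsr_status_codes_val);
}
```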
Frank
From: Malahal Naineni [mailto:malahal@gmail.com]
Sent: Thursday, October 4, 2018 12:49 PM
To: ffilzlnx(a)mindspring.com
Cc: patrice.lucas(a)cea.fr; devel(a)lists.nfs-ganesha.org
Subject: Re: [NFS-Ganesha-Devel] Re: double-free bug
nfs4_Compound_Free() checks res_cached to decide whether or not to free the results. The code in
nfs4_op_sequence() sets it to false and then calls nfs4_Compound_Free(). I don't see
any lock that prevents Thread10 (the nfs_dupreq_rele path) from running at the same time. I am new
to this code, so I might be wrong in my analysis, though!
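If the question is only whether two threads can release the same result at once, the check-free-and-clear has to be a single atomic step. A minimal sketch, assuming a hypothetical `struct sketch_shared_res` rather than Ganesha’s compound result, using C11 atomic_exchange() so that only one caller can ever obtain the pointer:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical result that two threads may both try to release. */
struct sketch_shared_res {
	_Atomic(unsigned int *) codes;
};

/* Release the array at most once, even if two threads race on it.
 * Returns 1 if this call did the free, 0 if another caller already had. */
static int sketch_free_once(struct sketch_shared_res *res)
{
	unsigned int *old;

	assert(res != NULL);
	/* Atomically take the pointer and leave NULL behind: exactly one
	 * caller can observe the non-NULL value. A plain
	 * "if (p) { free(p); p = NULL; }" leaves a window where both
	 * threads see p != NULL. */
	old = atomic_exchange(&res->codes, NULL);
	if (old == NULL)
		return 0;
	free(old);
	return 1;
}
```

This only makes the release itself safe. It would hide, not explain, why both the nfs4_op_sequence() path and the nfs_dupreq_rele() path believe they own the result.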
One option is to bypass the DRC for NFSv4.1 and above.
Regards, Malahal.
On Fri, Oct 5, 2018 at 12:01 AM Frank Filz <ffilzlnx(a)mindspring.com> wrote:
Yeah, essentially no 4.1 requests are cacheable (4.1 uses the slot cache and so has no need of the
dupreq cache).
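The rule described here (4.1 and later replies belong to the slot cache, never to the DRC) can be written as a single predicate. This is a hypothetical sketch in the spirit of nfs_dupreq_v4_cacheable(), not its actual code; the enum and function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Who is responsible for retaining (and later freeing) a COMPOUND reply. */
enum sketch_reply_owner {
	SKETCH_OWNER_DRC,  /* NFSv4.0: duplicate request cache */
	SKETCH_OWNER_SLOT, /* NFSv4.1+: session slot table */
};

/* Hypothetical predicate: only minor version 0 requests may enter the DRC. */
static bool sketch_v4_drc_cacheable(uint32_t minorversion)
{
	/* Assumes unsupported minor versions were rejected earlier. */
	assert(minorversion <= 2);
	return minorversion == 0;
}

/* Exactly one owner per reply, so exactly one path ever frees it. */
static enum sketch_reply_owner sketch_reply_owner(uint32_t minorversion)
{
	return sketch_v4_drc_cacheable(minorversion) ? SKETCH_OWNER_DRC
						     : SKETCH_OWNER_SLOT;
}
```

The point of the second function is the invariant: whichever owner is chosen, it is the only code path that frees the reply.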
With my patch, test_stateid isn’t doing anything different than any of the other 4.1 ops
that actually free memory in their Free routines.
Maybe they can all lead to a double free? In that case, somewhere along the line we are
doing something wrong with the dupreq cache for 4.1.
Frank
From: Malahal Naineni [mailto:malahal@gmail.com]
Sent: Thursday, October 4, 2018 11:16 AM
To: ffilzlnx(a)mindspring.com
Cc: patrice.lucas(a)cea.fr; devel(a)lists.nfs-ganesha.org
Subject: [NFS-Ganesha-Devel] Re: double-free bug
Thread10 thinks that op_test_stateid is not cacheable, so it actually frees the response
and the other memory allocated with it. But Thread7 finds it in the slot cache and tries to free it again, leading
to a double free. Based on the line numbers, the code path has to be for minor version 1 or 2 (not zero).
I don't know much about the 4.1 slot cache.
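One common way to make “the request path and the slot cache both hold the same reply” safe is reference counting: each holder takes a reference, and only the last put frees. A minimal C11 sketch, assuming a hypothetical `struct sketch_reply`; this is a generic technique, not a description of how Ganesha’s slot cache or DRC actually manage replies.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical reply shared by the request path and a reply cache. */
struct sketch_reply {
	atomic_uint refcnt;
	unsigned int *codes;
};

/* Allocate a reply holding one reference for its creator. */
static struct sketch_reply *sketch_reply_new(size_t ncodes)
{
	struct sketch_reply *r = malloc(sizeof(*r));

	if (r == NULL)
		return NULL;
	r->codes = calloc(ncodes, sizeof(*r->codes));
	if (r->codes == NULL) {
		free(r);
		return NULL;
	}
	atomic_init(&r->refcnt, 1);
	return r;
}

/* A second holder (e.g. a cache) takes its own reference. */
static void sketch_reply_get(struct sketch_reply *r)
{
	atomic_fetch_add(&r->refcnt, 1);
}

/* Drop one reference; only the last put frees anything.
 * Returns 1 if this call freed the reply, 0 otherwise. */
static int sketch_reply_put(struct sketch_reply *r)
{
	unsigned int prev = atomic_fetch_sub(&r->refcnt, 1);

	assert(prev > 0); /* a put without a matching reference is a bug */
	if (prev != 1)
		return 0;
	free(r->codes);
	free(r);
	return 1;
}
```

Usage: the request thread creates the reply, the slot cache calls sketch_reply_get() when it retains it, and each side calls sketch_reply_put() when done; whichever put comes last does the free, regardless of thread ordering.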
Regards, Malahal.
On Thu, Oct 4, 2018 at 8:52 PM Frank Filz <ffilzlnx(a)mindspring.com> wrote:
The only thing I can think of is that a TEST_STATEID was issued with minor version = 0,
which is the only way it can fail.
I’m going to submit a fix that checks the return status before freeing.
A couple of Free routines NULL out the values they free, but almost all check for NFS4_OK.
A couple of others also don’t check; I’ll fix those too.
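The “NULL out the values they free” pattern can be sketched as follows, assuming a hypothetical `struct sketch_res` and `SKETCH_FREE_AND_NULL()` macro (not Ganesha names). Combined with the NFS4_OK check, calling the Free routine twice on the same result becomes a harmless free(NULL).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Free a pointer field and clear it, so a repeated free is free(NULL). */
#define SKETCH_FREE_AND_NULL(p) \
	do {                    \
		free(p);        \
		(p) = NULL;     \
	} while (0)

/* Hypothetical result with one heap-allocated array. */
struct sketch_res {
	int status;      /* 0 == NFS4_OK */
	uint32_t *codes; /* owned only when status == 0 */
};

/* Free routine: skip non-OK results, and clear what it frees. */
static void sketch_res_free(struct sketch_res *res)
{
	assert(res != NULL);
	if (res->status != 0)
		return;
	SKETCH_FREE_AND_NULL(res->codes);
}
```

Note that this guards against a repeated free on the same, serialized path; it is not a substitute for deciding which path owns the result.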
Frank
From: patrice.lucas(a)cea.fr [mailto:patrice.lucas@cea.fr]
Sent: Thursday, October 4, 2018 6:43 AM
To: devel(a)lists.nfs-ganesha.org
Subject: [NFS-Ganesha-Devel] double-free bug
Hello everyone,
Frequent memory crashes have been occurring for a few weeks in the nfs-ganesha CEA FSAL-PROXY
continuous integration test. I finally made time to look at these problems today by
running the nfs-ganesha server under AddressSanitizer.
I got the following stack with a double-free error. Could anyone explain this error?
Is there someone who understands the dup-req cache well, or someone who has already worked with the code
of the nfs4_op_test_stateid operation?
The nfs4_op_test_stateid operation was introduced this summer by Gerrit patch 418826
<https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/418826> from
fatih-acar <https://review.gerrithub.io/q/owner:fatih%2540gandi.net>, 07/22/2018.
The dup-req cache code seems to be involved in this error.
Regards,
Patrice
==7037==ERROR: AddressSanitizer: attempting double-free on 0x60200001ced0 in thread T7:
    #0 0x480c09 in __interceptor_free (/usr/bin/ganesha.nfsd+0x480c09)
    #1 0x897125 in gsh_free /opt/nfs-ganesha/src/include/abstract_mem.h:299
    #2 0x896f88 in nfs4_op_test_stateid_Free /opt/nfs-ganesha/src/Protocols/NFS/nfs4_op_test_stateid.c:121
    #3 0x703702 in nfs4_Compound_FreeOne /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:1081
    #4 0x7042c4 in nfs4_Compound_Free /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:1119
    #5 0x865c4a in nfs4_op_sequence /opt/nfs-ganesha/src/Protocols/NFS/nfs4_op_sequence.c:185
    #6 0x6fd80f in nfs4_Compound /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:903
    #7 0x67167c in nfs_rpc_process_request /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1329
    #8 0x663040 in nfs_rpc_valid_NFS /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1539
    #9 0x7ffff7bb94a1 in svc_vc_decode /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:824
    #10 0x6542ce in nfs_rpc_decode_request /opt/nfs-ganesha/src/MainNFSD/nfs_rpc_dispatcher_thread.c:1341
    #11 0x7ffff7bb934c in svc_vc_recv /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:797
    #12 0x7ffff7bb47be in svc_rqst_xprt_task /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:767
    #13 0x7ffff7bb51af in svc_rqst_epoll_events /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:939
    #14 0x7ffff7bb4e94 in svc_rqst_epoll_loop /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1012:8
    #15 0x7ffff7bb38bf in svc_rqst_run_task /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1048:14
    #16 0x7ffff7bc077c in work_pool_thread /opt/nfs-ganesha/src/libntirpc/src/work_pool.c:181
    #17 0x7ffff6367e24 in start_thread (/lib64/libpthread.so.0+0x7e24)
    #18 0x7ffff575c34c in __clone (/lib64/libc.so.6+0xf834c)
0x60200001ced0 is located 0 bytes inside of 4-byte region [0x60200001ced0,0x60200001ced4)
freed by thread T10 here:
    #0 0x480c09 in __interceptor_free (/usr/bin/ganesha.nfsd+0x480c09)
    #1 0x897125 in gsh_free /opt/nfs-ganesha/src/include/abstract_mem.h:299
    #2 0x896f88 in nfs4_op_test_stateid_Free /opt/nfs-ganesha/src/Protocols/NFS/nfs4_op_test_stateid.c:121
    #3 0x703702 in nfs4_Compound_FreeOne /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:1081
    #4 0x7042c4 in nfs4_Compound_Free /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:1119
    #5 0xcec2a4 in nfs_dupreq_rele /opt/nfs-ganesha/src/RPCAL/nfs_dupreq.c:1315
    #6 0x673196 in nfs_rpc_process_request /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1442
    #7 0x663040 in nfs_rpc_valid_NFS /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1539
    #8 0x7ffff7bb94a1 in svc_vc_decode /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:824
    #9 0x6542ce in nfs_rpc_decode_request /opt/nfs-ganesha/src/MainNFSD/nfs_rpc_dispatcher_thread.c:1341
    #10 0x7ffff7bb934c in svc_vc_recv /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:797
    #11 0x7ffff7bb47be in svc_rqst_xprt_task /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:767
    #12 0x7ffff7bb51af in svc_rqst_epoll_events /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:939
    #13 0x7ffff7bb4e94 in svc_rqst_epoll_loop /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1012:8
    #14 0x7ffff7bb38bf in svc_rqst_run_task /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1048:14
    #15 0x7ffff7bc077c in work_pool_thread /opt/nfs-ganesha/src/libntirpc/src/work_pool.c:181
    #16 0x7ffff6367e24 in start_thread (/lib64/libpthread.so.0+0x7e24)
previously allocated by thread T10 here:
    #0 0x480e59 in calloc (/usr/bin/ganesha.nfsd+0x480e59)
    #1 0x89689a in gsh_calloc__ /opt/nfs-ganesha/src/include/abstract_mem.h:145
    #2 0x895c4e in nfs4_op_test_stateid /opt/nfs-ganesha/src/Protocols/NFS/nfs4_op_test_stateid.c:88:3
    #3 0x6fd80f in nfs4_Compound /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:903
    #4 0x67167c in nfs_rpc_process_request /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1329
    #5 0x663040 in nfs_rpc_valid_NFS /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1539
    #6 0x7ffff7bb94a1 in svc_vc_decode /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:824
    #7 0x6542ce in nfs_rpc_decode_request /opt/nfs-ganesha/src/MainNFSD/nfs_rpc_dispatcher_thread.c:1341
    #8 0x7ffff7bb934c in svc_vc_recv /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:797
    #9 0x7ffff7bb47be in svc_rqst_xprt_task /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:767
    #10 0x7ffff7bb51af in svc_rqst_epoll_events /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:939
    #11 0x7ffff7bb4e94 in svc_rqst_epoll_loop /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1012:8
    #12 0x7ffff7bb38bf in svc_rqst_run_task /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1048:14
    #13 0x7ffff7bc077c in work_pool_thread /opt/nfs-ganesha/src/libntirpc/src/work_pool.c:181
    #14 0x7ffff6367e24 in start_thread (/lib64/libpthread.so.0+0x7e24)
_______________________________________________
Devel mailing list -- devel(a)lists.nfs-ganesha.org