Patrice,
Could you check if V2.7.0.3 fixed this issue? I’d like to tag V2.7.1
sometime this week.
Thanks
Frank
Hi Frank,
With V2.7.0.3, I can no longer reproduce the double-free error on stateid.
Regards,
Patrice
*From:*Malahal Naineni [mailto:malahal@gmail.com]
*Sent:* Friday, October 5, 2018 12:38 AM
*To:* ffilzlnx(a)mindspring.com
*Cc:* patrice.lucas(a)cea.fr; devel(a)lists.nfs-ganesha.org
*Subject:* [NFS-Ganesha-Devel] Re: double-free bug
>> There are other requests that free things in their Free functions
that should have been run under 4.1 and thus if we really are both
caching something in slot cache and drc somehow, they would have
tripped also.
From the backtrace line numbers, it is clear that nfs_dupreq_rele()
has DUPREQ_NOCACHE. So DRC is NOT caching but only freeing.
Regards, Malahal.
On Fri, Oct 5, 2018 at 1:54 AM Frank Filz <ffilzlnx(a)mindspring.com
<mailto:ffilzlnx@mindspring.com>> wrote:
nfs_dupreq_v4_cacheable is supposed to exclude all 4.1 requests.
I wonder if something changed. There are other requests that free
things in their Free functions that should have been run under 4.1
and thus if we really are both caching something in slot cache and
drc somehow, they would have tripped also.
Here’s all the places that gsh_free something in res:
Protocols/NFS/nfs4_Compound.c nfs4_Compound 962
gsh_free(res->res_compound4.resarray.resarray_val);
Protocols/NFS/nfs4_Compound.c nfs4_Compound_Free 1128
gsh_free(res->res_compound4.resarray.resarray_val);
Protocols/NFS/nfs4_Compound.c nfs4_Compound_Free 1131
gsh_free(res->res_compound4.tag.utf8string_val);
Protocols/NFS/nfs4_op_exchange_id.c nfs4_op_exchange_id_Free 441
gsh_free(resok->eir_server_scope.eir_server_scope_val);
Protocols/NFS/nfs4_op_exchange_id.c nfs4_op_exchange_id_Free 442
gsh_free(resok->eir_server_owner.so_major_id.so_major_id_val);
Protocols/NFS/nfs4_op_exchange_id.c nfs4_op_exchange_id_Free 443
gsh_free(resok->eir_server_impl_id.eir_server_impl_id_val);
Protocols/NFS/nfs4_op_getdeviceinfo.c nfs4_op_getdeviceinfo_Free 205
gsh_free(resok->gdir_device_addr.da_addr_body.da_addr_body_val);
Protocols/NFS/nfs4_op_getdevicelist.c nfs4_op_getdevicelist_Free 196
gsh_free(resok->gdlr_deviceid_list.gdlr_deviceid_list_val);
Protocols/NFS/nfs4_op_getfh.c nfs4_op_getfh_Free 138
gsh_free(resp->GETFH4res_u.resok4.object.nfs_fh4_val);
Protocols/NFS/nfs4_op_read.c nfs4_op_read_Free 615
gsh_free(resp->READ4res_u.resok4.data.data_val);
Protocols/NFS/nfs4_op_readlink.c nfs4_op_readlink_Free 125
gsh_free(resp->READLINK4res_u.resok4.link.utf8string_val);
Protocols/NFS/nfs4_op_secinfo.c nfs4_op_secinfo_Free 337
gsh_free(resp->SECINFO4res_u.resok4.SECINFO4resok_val);
Protocols/NFS/nfs4_op_secinfo_no_name.c nfs4_op_secinfo_no_name_Free 209
gsh_free(resp->SECINFO4res_u.resok4.SECINFO4resok_val);
Protocols/NFS/nfs4_op_setclientid.c nfs4_op_setclientid_Free 384
gsh_free(resp->SETCLIENTID4res_u.client_using.r_addr);
Protocols/NFS/nfs4_op_test_stateid.c nfs4_op_test_stateid_Free 121
gsh_free(res->tsr_status_codes.tsr_status_codes_val);
Protocols/NFS/nfs4_op_xattr.c nfs4_op_getxattr_Free 142
gsh_free(res_GETXATTR4->GETXATTR4res_u.resok4.gr_value.utf8string_val);
Protocols/NFS/nfs4_op_xattr.c nfs4_op_listxattr_Free 316
gsh_free(res_LISTXATTR4->LISTXATTR4res_u.resok4.lr_names.entries);
So I’m struggling to see what is unique about test_stateid other
than that it didn’t check the return code, which can only be
NFS4_OK or NFS4ERR_INVAL, and only invalid if TEST_STATEID was
issued with minorversion = 0.
Frank
*From:*Malahal Naineni [mailto:malahal@gmail.com
<mailto:malahal@gmail.com>]
*Sent:* Thursday, October 4, 2018 12:49 PM
*To:* ffilzlnx(a)mindspring.com <mailto:ffilzlnx@mindspring.com>
*Cc:* patrice.lucas(a)cea.fr <mailto:patrice.lucas@cea.fr>;
devel(a)lists.nfs-ganesha.org <mailto:devel@lists.nfs-ganesha.org>
*Subject:* Re: [NFS-Ganesha-Devel] Re: double-free bug
nfs4_Compound_Free() looks at *res_cached* to free the stuff or
not. The code in nfs4_op_sequence() sets it to False and then
calls nfs4_Compound_Free(). I don't see any lock that prevents
Thread10 (nfs_dupreq_rele path) running at the same time. I am new
to this code, so I might be wrong in my analysis though!
One option is to bypass DRC for NFS4.1 and above.
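To make the suspected race concrete, here is a minimal sketch of one way two release paths could be kept from freeing the same result twice. It is not the Ganesha code: struct sketch_res, sketch_compound_free() and sketch_slot_evict() are hypothetical names, and the use of C11 atomics is my assumption, not something the real nfs4_Compound_Free() does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a compound result shared by two release paths. */
struct sketch_res {
	atomic_bool res_cached;  /* true while the slot cache owns the payload */
	_Atomic(void *) payload; /* memory allocated by the operation */
};

/* Free the payload unless the cache owns it. Returns true if this call
 * released the memory. */
static bool sketch_compound_free(struct sketch_res *res)
{
	void *p;

	assert(res != NULL);

	if (atomic_load(&res->res_cached))
		return false; /* cache still owns it, leave it alone */

	/* Atomically take the pointer; only one caller can see non-NULL. */
	p = atomic_exchange(&res->payload, NULL);
	if (p == NULL)
		return false; /* another path already released it */

	free(p);
	return true;
}

/* Slot-reuse path (assumed): drop cache ownership, then release. */
static bool sketch_slot_evict(struct sketch_res *res)
{
	atomic_store(&res->res_cached, false);
	return sketch_compound_free(res);
}
```

With this shape, whichever of the two paths runs second finds a NULL pointer and does nothing. It only protects the payload pointer itself; it says nothing about the lifetime of the result structure, which would still need its own reference counting or lock.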
Regards, Malahal.
On Fri, Oct 5, 2018 at 12:01 AM Frank Filz
<ffilzlnx(a)mindspring.com <mailto:ffilzlnx@mindspring.com>> wrote:
Yes, 4.1 requests are pretty much never cacheable (4.1 uses the slot
cache and so has no need of the dupreq cache).
With my patch, test_stateid isn’t doing anything different
than any of the other 4.1 ops that actually free memory in
their Free routines.
Maybe they can all lead to a double free? In which case
somewhere along the line we are doing something wrong with the
dup req cache for 4.1.
Frank
*From:*Malahal Naineni [mailto:malahal@gmail.com
<mailto:malahal@gmail.com>]
*Sent:* Thursday, October 4, 2018 11:16 AM
*To:* ffilzlnx(a)mindspring.com <mailto:ffilzlnx@mindspring.com>
*Cc:* patrice.lucas(a)cea.fr <mailto:patrice.lucas@cea.fr>;
devel(a)lists.nfs-ganesha.org <mailto:devel@lists.nfs-ganesha.org>
*Subject:* [NFS-Ganesha-Devel] Re: double-free bug
Thread10 thinks that op_test_stateid is not cacheable, so it
actually frees the response and other goodies allocated. But
thread7 finds it in the slot cache and tries to free it, leading to a
double free. The code path has to be for minor version 1 or 2
(not zero) based on line numbers. I don't know much about the 4.1
slot cache.
Regards, Malahal.
On Thu, Oct 4, 2018 at 8:52 PM Frank Filz
<ffilzlnx(a)mindspring.com <mailto:ffilzlnx@mindspring.com>> wrote:
The only thing I can think of is that a TEST_STATEID was
issued with minor version = 0, which is the only way it can
fail.
I’m going to submit a fix that checks for return status
before freeing.
A couple Free routines NULL out the values they free, but
almost all check for NFS4_OK. There are a couple others
that also don’t check. I’ll fix those too.
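As an illustration of the two habits mentioned above (check the status before freeing, and NULL out what was freed), here is a minimal sketch. The types and names (sketch_status, sketch_test_stateid_res, sketch_test_stateid_free) are hypothetical stand-ins, not the real TEST_STATEID4res or nfs4_op_test_stateid_Free(), and plain free() stands in for gsh_free().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the real NFSv4 types; only the shape matters. */
enum sketch_status { SKETCH_OK = 0, SKETCH_ERR_INVAL = 22 };

struct sketch_test_stateid_res {
	enum sketch_status status;
	uint32_t codes_len;
	uint32_t *codes_val; /* allocated only when status == SKETCH_OK */
};

/* Free the per-op result. Returns 1 if memory was released, 0 otherwise. */
static int sketch_test_stateid_free(struct sketch_test_stateid_res *res)
{
	assert(res != NULL);

	/* On failure nothing was allocated, so there is nothing to free. */
	if (res->status != SKETCH_OK)
		return 0;

	/* Already released by an earlier call. */
	if (res->codes_val == NULL)
		return 0;

	free(res->codes_val);

	/* Clear the pointer so a second call is a harmless no-op. */
	res->codes_val = NULL;
	res->codes_len = 0;
	return 1;
}
```

Clearing the pointer makes a repeated call from the same thread safe, but it does not by itself close a race in which two threads enter the Free routine at the same time.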
Frank
*From:* patrice.lucas@cea.fr [mailto:patrice.lucas@cea.fr]
*Sent:* Thursday, October 4, 2018 6:43 AM
*To:* devel(a)lists.nfs-ganesha.org
<mailto:devel@lists.nfs-ganesha.org>
*Subject:* [NFS-Ganesha-Devel] double-free bug
Hello everyone,
Frequent memory crashes have been occurring for a few weeks
in the nfs-ganesha CEA FSAL-PROXY continuous integration
test. I finally made time to look at these problems
today by running the nfs-ganesha server under Address
Sanitizer.
I got the following stack with a double-free error. Could
anyone explain this error? Perhaps someone who understands
the dup-req cache well, or someone who has already worked with
the code of the nfs4_op_test_stateid operation?
The nfs4_op_test_stateid operation was introduced this summer by
gerrit patch 418826
<https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/418826>
from fatih-acar
<https://review.gerrithub.io/q/owner:fatih%2540gandi.net>
on 07/22/2018.
The dup-req cache stack seems to be involved in this error.
Regards,
Patrice
==7037==ERROR: AddressSanitizer: attempting double-free on
0x60200001ced0 in thread T7:
#0 0x480c09 in __interceptor_free
(/usr/bin/ganesha.nfsd+0x480c09)
#1 0x897125 in gsh_free
/opt/nfs-ganesha/src/include/abstract_mem.h:299
#2 0x896f88 in nfs4_op_test_stateid_Free
/opt/nfs-ganesha/src/Protocols/NFS/nfs4_op_test_stateid.c:121
#3 0x703702 in nfs4_Compound_FreeOne
/opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:1081
#4 0x7042c4 in nfs4_Compound_Free
/opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:1119
#5 0x865c4a in nfs4_op_sequence
/opt/nfs-ganesha/src/Protocols/NFS/nfs4_op_sequence.c:185
#6 0x6fd80f in nfs4_Compound
/opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:903
#7 0x67167c in nfs_rpc_process_request
/opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1329
#8 0x663040 in nfs_rpc_valid_NFS
/opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1539
#9 0x7ffff7bb94a1 in svc_vc_decode
/opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:824
#10 0x6542ce in nfs_rpc_decode_request
/opt/nfs-ganesha/src/MainNFSD/nfs_rpc_dispatcher_thread.c:1341
#11 0x7ffff7bb934c in svc_vc_recv
/opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:797
#12 0x7ffff7bb47be in svc_rqst_xprt_task
/opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:767
#13 0x7ffff7bb51af in svc_rqst_epoll_events
/opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:939
#14 0x7ffff7bb4e94 in svc_rqst_epoll_loop
/opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1012:8
#15 0x7ffff7bb38bf in svc_rqst_run_task
/opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1048:14
#16 0x7ffff7bc077c in work_pool_thread
/opt/nfs-ganesha/src/libntirpc/src/work_pool.c:181
#17 0x7ffff6367e24 in start_thread
(/lib64/libpthread.so.0+0x7e24)
#18 0x7ffff575c34c in __clone (/lib64/libc.so.6+0xf834c)
0x60200001ced0 is located 0 bytes inside of 4-byte region
[0x60200001ced0,0x60200001ced4)
freed by thread T10 here:
#0 0x480c09 in __interceptor_free
(/usr/bin/ganesha.nfsd+0x480c09)
#1 0x897125 in gsh_free
/opt/nfs-ganesha/src/include/abstract_mem.h:299
#2 0x896f88 in nfs4_op_test_stateid_Free
/opt/nfs-ganesha/src/Protocols/NFS/nfs4_op_test_stateid.c:121
#3 0x703702 in nfs4_Compound_FreeOne
/opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:1081
#4 0x7042c4 in nfs4_Compound_Free
/opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:1119
#5 0xcec2a4 in nfs_dupreq_rele
/opt/nfs-ganesha/src/RPCAL/nfs_dupreq.c:1315
#6 0x673196 in nfs_rpc_process_request
/opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1442
#7 0x663040 in nfs_rpc_valid_NFS
/opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1539
#8 0x7ffff7bb94a1 in svc_vc_decode
/opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:824
#9 0x6542ce in nfs_rpc_decode_request
/opt/nfs-ganesha/src/MainNFSD/nfs_rpc_dispatcher_thread.c:1341
#10 0x7ffff7bb934c in svc_vc_recv
/opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:797
#11 0x7ffff7bb47be in svc_rqst_xprt_task
/opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:767
#12 0x7ffff7bb51af in svc_rqst_epoll_events
/opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:939
#13 0x7ffff7bb4e94 in svc_rqst_epoll_loop
/opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1012:8
#14 0x7ffff7bb38bf in svc_rqst_run_task
/opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1048:14
#15 0x7ffff7bc077c in work_pool_thread
/opt/nfs-ganesha/src/libntirpc/src/work_pool.c:181
#16 0x7ffff6367e24 in start_thread
(/lib64/libpthread.so.0+0x7e24)
previously allocated by thread T10 here:
#0 0x480e59 in calloc (/usr/bin/ganesha.nfsd+0x480e59)
#1 0x89689a in gsh_calloc__
/opt/nfs-ganesha/src/include/abstract_mem.h:145
#2 0x895c4e in nfs4_op_test_stateid
/opt/nfs-ganesha/src/Protocols/NFS/nfs4_op_test_stateid.c:88:3
#3 0x6fd80f in nfs4_Compound
/opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:903
#4 0x67167c in nfs_rpc_process_request
/opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1329
#5 0x663040 in nfs_rpc_valid_NFS
/opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1539
#6 0x7ffff7bb94a1 in svc_vc_decode
/opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:824
#7 0x6542ce in nfs_rpc_decode_request
/opt/nfs-ganesha/src/MainNFSD/nfs_rpc_dispatcher_thread.c:1341
#8 0x7ffff7bb934c in svc_vc_recv
/opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:797
#9 0x7ffff7bb47be in svc_rqst_xprt_task
/opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:767
#10 0x7ffff7bb51af in svc_rqst_epoll_events
/opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:939
#11 0x7ffff7bb4e94 in svc_rqst_epoll_loop
/opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1012:8
#12 0x7ffff7bb38bf in svc_rqst_run_task
/opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1048:14
#13 0x7ffff7bc077c in work_pool_thread
/opt/nfs-ganesha/src/libntirpc/src/work_pool.c:181
#14 0x7ffff6367e24 in start_thread
(/lib64/libpthread.so.0+0x7e24)
--
Patrice LUCAS
Ingenieur-Chercheur, CEA-DAM/DSSI/SISR/LA2S
tel : +33 (0)1 69 26 47 86
e-mail : patrice.lucas(a)cea.fr