Patrice,
Could you check if V2.7.0.3 fixed this issue? I’d like to tag V2.7.1 sometime this week.
Thanks
Frank
From: Malahal Naineni [mailto:malahal@gmail.com]
Sent: Friday, October 5, 2018 12:38 AM
To: ffilzlnx@mindspring.com
Cc: patrice.lucas@cea.fr; devel@lists.nfs-ganesha.org
Subject: [NFS-Ganesha-Devel] Re: double-free bug
>> There are other requests that free things in their Free functions that should have been run under 4.1 and thus if we really are both caching something in slot cache and drc somehow, they would have tripped also.
From the backtrace line numbers, it is clear that nfs_dupreq_rele() has DUPREQ_NOCACHE. So DRC is NOT caching but only freeing.
Regards, Malahal.
On Fri, Oct 5, 2018 at 1:54 AM Frank Filz <ffilzlnx@mindspring.com> wrote:
nfs_dupreq_v4_cacheable is supposed to exclude all 4.1 requests.
I wonder if something changed. There are other requests that free things in their Free functions that should have been run under 4.1 and thus if we really are both caching something in slot cache and drc somehow, they would have tripped also.
Here’s all the places that gsh_free something in res:
Protocols/NFS/nfs4_Compound.c nfs4_Compound 962 gsh_free(res->res_compound4.resarray.resarray_val);
Protocols/NFS/nfs4_Compound.c nfs4_Compound_Free 1128 gsh_free(res->res_compound4.resarray.resarray_val);
Protocols/NFS/nfs4_Compound.c nfs4_Compound_Free 1131 gsh_free(res->res_compound4.tag.utf8string_val);
Protocols/NFS/nfs4_op_exchange_id.c nfs4_op_exchange_id_Free 441 gsh_free(resok->eir_server_scope.eir_server_scope_val);
Protocols/NFS/nfs4_op_exchange_id.c nfs4_op_exchange_id_Free 442 gsh_free(resok->eir_server_owner.so_major_id.so_major_id_val);
Protocols/NFS/nfs4_op_exchange_id.c nfs4_op_exchange_id_Free 443 gsh_free(resok->eir_server_impl_id.eir_server_impl_id_val);
Protocols/NFS/nfs4_op_getdeviceinfo.c nfs4_op_getdeviceinfo_Free 205 gsh_free(resok->gdir_device_addr.da_addr_body.da_addr_body_val);
Protocols/NFS/nfs4_op_getdevicelist.c nfs4_op_getdevicelist_Free 196 gsh_free(resok->gdlr_deviceid_list.gdlr_deviceid_list_val);
Protocols/NFS/nfs4_op_getfh.c nfs4_op_getfh_Free 138 gsh_free(resp->GETFH4res_u.resok4.object.nfs_fh4_val);
Protocols/NFS/nfs4_op_read.c nfs4_op_read_Free 615 gsh_free(resp->READ4res_u.resok4.data.data_val);
Protocols/NFS/nfs4_op_readlink.c nfs4_op_readlink_Free 125 gsh_free(resp->READLINK4res_u.resok4.link.utf8string_val);
Protocols/NFS/nfs4_op_secinfo.c nfs4_op_secinfo_Free 337 gsh_free(resp->SECINFO4res_u.resok4.SECINFO4resok_val);
Protocols/NFS/nfs4_op_secinfo_no_name.c nfs4_op_secinfo_no_name_Free 209 gsh_free(resp->SECINFO4res_u.resok4.SECINFO4resok_val);
Protocols/NFS/nfs4_op_setclientid.c nfs4_op_setclientid_Free 384 gsh_free(resp->SETCLIENTID4res_u.client_using.r_addr);
Protocols/NFS/nfs4_op_test_stateid.c nfs4_op_test_stateid_Free 121 gsh_free(res->tsr_status_codes.tsr_status_codes_val);
Protocols/NFS/nfs4_op_xattr.c nfs4_op_getxattr_Free 142 gsh_free(res_GETXATTR4->GETXATTR4res_u.resok4.gr_value.utf8string_val);
Protocols/NFS/nfs4_op_xattr.c nfs4_op_listxattr_Free 316 gsh_free(res_LISTXATTR4->LISTXATTR4res_u.resok4.lr_names.entries);
So I’m struggling to see what is unique about test_stateid other than that it didn’t check the return code, which can only be NFS4_OK or NFS4ERR_INVAL, and only invalid if TEST_STATEID was issued with minorversion = 0.
Frank
From: Malahal Naineni [mailto:malahal@gmail.com]
Sent: Thursday, October 4, 2018 12:49 PM
To: ffilzlnx@mindspring.com
Cc: patrice.lucas@cea.fr; devel@lists.nfs-ganesha.org
Subject: Re: [NFS-Ganesha-Devel] Re: double-free bug
nfs4_Compound_Free() looks at res_cached to free the stuff or not. The code in nfs4_op_sequence() sets it to False and then calls nfs4_Compound_Free(). I don't see any lock that prevents Thread10 (nfs_dupreq_rele path) running at the same time. I am new to this code, so I might be wrong in my analysis though!
One option is to bypass DRC for NFS4.1 and above.
Regards, Malahal.
On Fri, Oct 5, 2018 at 12:01 AM Frank Filz <ffilzlnx@mindspring.com> wrote:
Yea, pretty much 4.1 are not cacheable (4.1 uses the slot cache and so has no need of the dupreq cache).
With my patch, test_stateid isn’t doing anything different than any of the other 4.1 ops that actually free memory in their Free routines.
Maybe they all can lead to double free? In which case somewhere along the line we are doing something wrong with the dup req cache for 4.1///
Frank
From: Malahal Naineni [mailto:malahal@gmail.com]
Sent: Thursday, October 4, 2018 11:16 AM
To: ffilzlnx@mindspring.com
Cc: patrice.lucas@cea.fr; devel@lists.nfs-ganesha.org
Subject: [NFS-Ganesha-Devel] Re: double-free bug
Thread10 thinks that op_test stateid_is not cachable, so it actually frees the response and other goodies allocated. But thread7 finds in the slot cache and tries to free leading to a double free. The code path has to be for minor version 1 or 2 (not zero) based on line numbers. I don't know much about 4.1 slot cache.
Regards, Malahal.
On Thu, Oct 4, 2018 at 8:52 PM Frank Filz <ffilzlnx@mindspring.com> wrote:
The only thing I can think of is thata TEST_STATEID was issued with minor version = 0 which is the only way it can fail.
I’m going to submit a fix that checks for return status before freeing.
A couple Free routines NULL out the values they free, but almost all check for NFS4_OK. There are a couple others that also don’t check. I’ll fix those too.
Frank
From: patrice.lucas@cea.fr [mailto:patrice.lucas@cea.fr]
Sent: Thursday, October 4, 2018 6:43 AM
To: devel@lists.nfs-ganesha.org
Subject: [NFS-Ganesha-Devel] double-free bug
Hello everyone,
Frequent memory crashs have been occurring for few weeks in the nfs-ganesha CEA FSAL-PROXY continuous integration test. I finally make time for looking at these problems today by running the nfs-ganesha server under Address Sanitizer.
I got the following stack wih a double-free error. Could anyone explain this error ? Someone who well understand the dup-req cache ? Or someone who already works with the code of the nfs4_op_test_stateid operation ?
The nfs4_op_test_stateid was introduce this summer by gerrit patch 418826 from
fatih-acar, 07/22/2018.
The dup-req cache stack seems to be involved in this error.
Regards,
Patrice
==7037==ERROR: AddressSanitizer: attempting double-free on 0x60200001ced0 in thread T7:
#0 0x480c09 in __interceptor_free (/usr/bin/ganesha.nfsd+0x480c09)
#1 0x897125 in gsh_free /opt/nfs-ganesha/src/include/abstract_mem.h:299
#2 0x896f88 in nfs4_op_test_stateid_Free /opt/nfs-ganesha/src/Protocols/NFS/nfs4_op_test_stateid.c:121
#3 0x703702 in nfs4_Compound_FreeOne /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:1081
#4 0x7042c4 in nfs4_Compound_Free /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:1119
#5 0x865c4a in nfs4_op_sequence /opt/nfs-ganesha/src/Protocols/NFS/nfs4_op_sequence.c:185
#6 0x6fd80f in nfs4_Compound /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:903
#7 0x67167c in nfs_rpc_process_request /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1329
#8 0x663040 in nfs_rpc_valid_NFS /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1539
#9 0x7ffff7bb94a1 in svc_vc_decode /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:824
#10 0x6542ce in nfs_rpc_decode_request /opt/nfs-ganesha/src/MainNFSD/nfs_rpc_dispatcher_thread.c:1341
#11 0x7ffff7bb934c in svc_vc_recv /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:797
#12 0x7ffff7bb47be in svc_rqst_xprt_task /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:767
#13 0x7ffff7bb51af in svc_rqst_epoll_events /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:939
#14 0x7ffff7bb4e94 in svc_rqst_epoll_loop /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1012:8
#15 0x7ffff7bb38bf in svc_rqst_run_task /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1048:14
#16 0x7ffff7bc077c in work_pool_thread /opt/nfs-ganesha/src/libntirpc/src/work_pool.c:181
#17 0x7ffff6367e24 in start_thread (/lib64/libpthread.so.0+0x7e24)
#18 0x7ffff575c34c in __clone (/lib64/libc.so.6+0xf834c)
0x60200001ced0 is located 0 bytes inside of 4-byte region [0x60200001ced0,0x60200001ced4)
freed by thread T10 here:
#0 0x480c09 in __interceptor_free (/usr/bin/ganesha.nfsd+0x480c09)
#1 0x897125 in gsh_free /opt/nfs-ganesha/src/include/abstract_mem.h:299
#2 0x896f88 in nfs4_op_test_stateid_Free /opt/nfs-ganesha/src/Protocols/NFS/nfs4_op_test_stateid.c:121
#3 0x703702 in nfs4_Compound_FreeOne /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:1081
#4 0x7042c4 in nfs4_Compound_Free /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:1119
#5 0xcec2a4 in nfs_dupreq_rele /opt/nfs-ganesha/src/RPCAL/nfs_dupreq.c:1315
#6 0x673196 in nfs_rpc_process_request /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1442
#7 0x663040 in nfs_rpc_valid_NFS /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1539
#8 0x7ffff7bb94a1 in svc_vc_decode /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:824
#9 0x6542ce in nfs_rpc_decode_request /opt/nfs-ganesha/src/MainNFSD/nfs_rpc_dispatcher_thread.c:1341
#10 0x7ffff7bb934c in svc_vc_recv /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:797
#11 0x7ffff7bb47be in svc_rqst_xprt_task /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:767
#12 0x7ffff7bb51af in svc_rqst_epoll_events /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:939
#13 0x7ffff7bb4e94 in svc_rqst_epoll_loop /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1012:8
#14 0x7ffff7bb38bf in svc_rqst_run_task /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1048:14
#15 0x7ffff7bc077c in work_pool_thread /opt/nfs-ganesha/src/libntirpc/src/work_pool.c:181
#16 0x7ffff6367e24 in start_thread (/lib64/libpthread.so.0+0x7e24)
previously allocated by thread T10 here:
#0 0x480e59 in calloc (/usr/bin/ganesha.nfsd+0x480e59)
#1 0x89689a in gsh_calloc__ /opt/nfs-ganesha/src/include/abstract_mem.h:145
#2 0x895c4e in nfs4_op_test_stateid /opt/nfs-ganesha/src/Protocols/NFS/nfs4_op_test_stateid.c:88:3
#3 0x6fd80f in nfs4_Compound /opt/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:903
#4 0x67167c in nfs_rpc_process_request /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1329
#5 0x663040 in nfs_rpc_valid_NFS /opt/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1539
#6 0x7ffff7bb94a1 in svc_vc_decode /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:824
#7 0x6542ce in nfs_rpc_decode_request /opt/nfs-ganesha/src/MainNFSD/nfs_rpc_dispatcher_thread.c:1341
#8 0x7ffff7bb934c in svc_vc_recv /opt/nfs-ganesha/src/libntirpc/src/svc_vc.c:797
#9 0x7ffff7bb47be in svc_rqst_xprt_task /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:767
#10 0x7ffff7bb51af in svc_rqst_epoll_events /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:939
#11 0x7ffff7bb4e94 in svc_rqst_epoll_loop /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1012:8
#12 0x7ffff7bb38bf in svc_rqst_run_task /opt/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1048:14
#13 0x7ffff7bc077c in work_pool_thread /opt/nfs-ganesha/src/libntirpc/src/work_pool.c:181
#14 0x7ffff6367e24 in start_thread (/lib64/libpthread.so.0+0x7e24)_______________________________________________
Devel mailing list -- devel@lists.nfs-ganesha.org
To unsubscribe send an email to devel-leave@lists.nfs-ganesha.org
-- Patrice LUCAS Ingenieur-Chercheur, CEA-DAM/DSSI/SISR/LA2S tel : +33 (0)1 69 26 47 86 e-mail : patrice.lucas@cea.fr