I haven't seen this before, but on a quick check it looks like a
use-after-free on the dupreq. That value is not a valid opcode, and
that field in nfs_resop4 is an enum, so the compiler should warn if
we put trash into it. So my guess is that the dupreq was freed but not
removed, and the memory was re-allocated and used for something else.
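One quick way to see that the value is garbage rather than a real opcode is to look at it in hex next to a pointer from the same core. The following is a minimal standalone sketch, not Ganesha code; `low32` and `hex32` are hypothetical helpers introduced only for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: low 32 bits of a 64-bit value such as a pointer. */
static uint32_t low32(uint64_t v)
{
	return (uint32_t)(v & 0xffffffffu);
}

/* Hypothetical helper: format a 32-bit value as "0x" plus 8 hex digits.
 * The buffer must hold 10 characters and the terminating NUL.
 */
static const char *hex32(uint32_t v, char *buf, size_t len)
{
	assert(len >= 11);
	snprintf(buf, len, "0x%08" PRIx32, v);
	return buf;
}
```

Formatting 201845504 this way gives 0x0c07eb00, and the low 32 bits of the res pointer in frame 3 (0x7f5d0c00b750) are 0x0c00b750. The resemblance is only suggestive, but it would be consistent with the resop slot now holding part of a heap pointer written by whoever reused the memory.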
Daniel
On 1/2/20 7:41 AM, Madhu P Punjabi wrote:
Hi All,
A customer reported a crash in nfs4_Compound_FreeOne(..). The same crash
was seen multiple times. It happens because
"optabv4[opcode].free_res(res);" is called with "opcode=201845504"
in nfs4_Compound_FreeOne(..).
Has anybody seen this kind of crash? The customer is using Ganesha 2.5.3
with some patches from the 2.7 code. Are there any patches that may help
fix the crash?
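To illustrate why an out-of-range opcode crashes at that call, here is a minimal sketch of a table dispatch with a bounds check. Everything named `demo_*` is a hypothetical stand-in and not Ganesha's real optabv4 or nfs_resop4 definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; these are NOT Ganesha's real definitions. */
struct demo_resop {
	unsigned int resop;
};

typedef void (*demo_free_fn)(struct demo_resop *res);

static int demo_free_calls;	/* counts how often a free hook ran */

static void demo_free_res(struct demo_resop *res)
{
	(void)res;
	demo_free_calls++;
}

struct demo_op_desc {
	const char *name;
	demo_free_fn free_res;
};

static const struct demo_op_desc demo_optab[] = {
	{ "ILLEGAL", demo_free_res },
	{ "OP_1", demo_free_res },
	{ "OP_2", demo_free_res },
};

#define DEMO_OPTAB_LEN (sizeof(demo_optab) / sizeof(demo_optab[0]))

/* Returns 0 after dispatching, -1 if the opcode is outside the table.
 * Without the range check, a value like 201845504 would index far past
 * the end of the table and call through whatever bytes are found there.
 */
static int demo_free_one(struct demo_resop *res)
{
	unsigned int opcode;

	assert(res != NULL);
	opcode = res->resop;
	if (opcode >= DEMO_OPTAB_LEN) {
		fprintf(stderr, "refusing to dispatch bogus opcode %u\n",
			opcode);
		return -1;
	}
	demo_optab[opcode].free_res(res);
	return 0;
}
```

A guard like this would only turn the crash into a logged error. It would not explain how the result array came to hold such a value in the first place.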
(gdb) bt
#0 0x00007f6145c3559b in raise () from /lib64/libpthread.so.0
#1 0x0000000000455a64 in crash_handler (signo=11, info=0x7f60f1f71870,
ctx=0x7f60f1f71740) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm036.16-0.1.1-Source/MainNFSD/nfs_init.c:225
#2 <signal handler called>
#3 0x0000000000461a4f in nfs4_Compound_FreeOne (res=0x7f5d0c00b750) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm036.16-0.1.1-Source/Protocols/NFS/nfs4_Compound.c:873
#4 0x0000000000461b9c in nfs4_Compound_Free (res=0x7f5ce435bde0) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm036.16-0.1.1-Source/Protocols/NFS/nfs4_Compound.c:912
#5 0x00000000004f374e in nfs_dupreq_free_dupreq (dv=0x7f5ce42ceda0) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm036.16-0.1.1-Source/RPCAL/nfs_dupreq.c:882
#6 0x00000000004f38cd in dupreq_entry_put (dv=0x7f5ce42ceda0) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm036.16-0.1.1-Source/RPCAL/nfs_dupreq.c:913
#7 0x00000000004f5369 in nfs_dupreq_finish (req=0x7f5d000e6b18,
res_nfs=0x7f5e640084b0) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm036.16-0.1.1-Source/RPCAL/nfs_dupreq.c:1270
#8 0x000000000044d5ac in nfs_rpc_execute (reqdata=0x7f5d000e6af0) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm036.16-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1375
#9 0x000000000044dbc6 in worker_run (ctx=0x4275f90) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm036.16-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1593
#10 0x000000000050c1ab in fridgethr_start_routine (arg=0x4275f90) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm036.16-0.1.1-Source/support/fridgethr.c:550
#11 0x00007f6145c2de25 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f61452ecbad in clone () from /lib64/libc.so.6
(gdb) frame 3
#3 0x0000000000461a4f in nfs4_Compound_FreeOne (res=0x7f5d0c00b750) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm036.16-0.1.1-Source/Protocols/NFS/nfs4_Compound.c:873
873 optabv4[opcode].free_res(res);
(gdb) p opcode
$47 = 201845504
(gdb) list
871 opcode = (res->resop != NFS4_OP_ILLEGAL)
872 ? res->resop : 0; /* opcode 0 for illegals */
873 optabv4[opcode].free_res(res);
(gdb) frame 4
#4 0x0000000000461b9c in nfs4_Compound_Free (res=0x7f5ce435bde0) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm036.16-0.1.1-Source/Protocols/NFS/nfs4_Compound.c:912
912 nfs4_Compound_FreeOne(val);
(gdb) list -
905 for (i = 0; i < res->res_compound4.resarray.resarray_len; i++) {
906 nfs_resop4 *val = &res->res_compound4.resarray.resarray_val[i];
907
908 if (val) {
909 /* !val is an error case, but it can occur, so avoid
910 * indirect on NULL
911 */
(gdb) p i <-- when the crash was seen,
res->res_compound4.resarray.resarray_val[1] was being handled
$50 = 1
(gdb) p res->res_compound4.resarray.resarray_val[0].resop <-- the
opcode for the previous operation was 0; this doesn't match the opcode
seen in the req argument. Does this look strange?
$51 = 0
(gdb) frame 8
#8 0x000000000044d5ac in nfs_rpc_execute (reqdata=0x7f5d000e6af0) at
/usr/src/debug/nfs-ganesha-2.5.3-ibm036.16-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1375
1375 dpq_status = nfs_dupreq_finish(&reqdata->r_u.req.svc, res_nfs);
(gdb) p arg_nfs->arg_compound4.argarray.argarray_val[0].argop
$52 = NFS4_OP_PUTFH
(gdb) p arg_nfs->arg_compound4.argarray.argarray_val[1].argop
$53 = NFS4_OP_GETATTR
Thanks,
Madhu Thorat.