We saw a crash in version 2.8.2 running an iozone crash:
Program terminated with signal 11, Segmentation fault.
#0 0x00007ff028dce240 in atomic_sub_uint32_t (var=0x104, sub=1) at
/usr/src/debug/nfs-ganesha-2.8.2/include/abstract_atomic.h:384
384 return __atomic_sub_fetch(var, sub, __ATOMIC_SEQ_CST);
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-12.el7.x86_64
dbus-libs-1.11.16-1.fc26.x86_64 elfutils-libelf-0.158-3.el7.x86_64
elfutils-libs-0.158-3.el7.x86_64 glibc-2.17-55.el7.x86_64 gssproxy-0.3.0-9.el7.x86_64
keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 libacl-2.2.51-12.el7.x86_64
libattr-2.4.46-12.el7.x86_64 libblkid-2.23.2-16.el7.x86_64 libcap-2.22-8.el7.x86_64
libcom_err-1.42.9-10.el7.x86_64 libgcrypt-1.5.3-4.el7.x86_64
libgpg-error-1.12-3.el7.x86_64 libnfsidmap-0.25-9.el7.x86_64 libselinux-2.5-11.el7.x86_64
libuuid-2.23.2-16.el7.x86_64 libwbclient-4.8.3-4.el7.x86_64 pcre-8.32-12.el7.x86_64
samba-client-libs-4.8.3-4.el7.x86_64 samba-winbind-modules-4.8.3-4.el7.x86_64
sssd-client-1.11.2-65.el7.x86_64 systemd-libs-219-42.el7_4.1.x86_64
userspace-rcu-0.7.16-1.el7.x86_64 xz-libs-5.1.2-8alpha.el7.x86_64
zlib-1.2.7-13.el7.x86_64
(gdb) bt
#0 0x00007ff028dce240 in atomic_sub_uint32_t (var=0x104, sub=1) at
/usr/src/debug/nfs-ganesha-2.8.2/include/abstract_atomic.h:384
#1 0x00007ff028dce265 in atomic_dec_uint32_t (var=0x104) at
/usr/src/debug/nfs-ganesha-2.8.2/include/abstract_atomic.h:405
#2 0x00007ff028dd1b65 in dupreq_entry_put (dv=0x0) at
/usr/src/debug/nfs-ganesha-2.8.2/RPCAL/nfs_dupreq.c:912
#3 0x00007ff028dd4630 in nfs_dupreq_rele (req=0x7fec44a269f0, func=0x7ff029159d60
<nfs3_func_desc+672>) at /usr/src/debug/nfs-ganesha-2.8.2/RPCAL/nfs_dupreq.c:1382
#4 0x00007ff028d74bc5 in free_args (reqdata=0x7fec44a269f0) at
/usr/src/debug/nfs-ganesha-2.8.2/MainNFSD/nfs_worker_thread.c:693
#5 0x00007ff028d77783 in nfs_rpc_process_request (reqdata=0x7fec44a269f0) at
/usr/src/debug/nfs-ganesha-2.8.2/MainNFSD/nfs_worker_thread.c:1509
#6 0x00007ff028d77a74 in nfs_rpc_valid_NFS (req=0x7fec44a269f0) at
/usr/src/debug/nfs-ganesha-2.8.2/MainNFSD/nfs_worker_thread.c:1601
#7 0x00007ff028b2632d in svc_vc_decode (req=0x7fec44a269f0) at
/usr/src/debug/nfs-ganesha-2.8.2/libntirpc/src/svc_vc.c:829
#8 0x00007ff028b227bf in svc_request (xprt=0x7fee0c202ad0, xdrs=0x7fec3c987a70) at
/usr/src/debug/nfs-ganesha-2.8.2/libntirpc/src/svc_rqst.c:793
#9 0x00007ff028b2623e in svc_vc_recv (xprt=0x7fee0c202ad0) at
/usr/src/debug/nfs-ganesha-2.8.2/libntirpc/src/svc_vc.c:802
#10 0x00007ff028b22740 in svc_rqst_xprt_task (wpe=0x7fee0c202cf0) at
/usr/src/debug/nfs-ganesha-2.8.2/libntirpc/src/svc_rqst.c:774
#11 0x00007ff028b23048 in svc_rqst_epoll_loop (wpe=0x108d570) at
/usr/src/debug/nfs-ganesha-2.8.2/libntirpc/src/svc_rqst.c:1089
#12 0x00007ff028b2baff in work_pool_thread (arg=0x7fed44052070) at
/usr/src/debug/nfs-ganesha-2.8.2/libntirpc/src/work_pool.c:184
#13 0x00007ff0270c8df3 in start_thread () from /lib64/libpthread.so.0
#14 0x00007ff0267cd3dd in clone () from /lib64/libc.so.6
(gdb) frame 5
#5 0x00007ff028d77783 in nfs_rpc_process_request (reqdata=0x7fec44a269f0) at
/usr/src/debug/nfs-ganesha-2.8.2/MainNFSD/nfs_worker_thread.c:1509
1509 free_args(reqdata);
(gdb) p *reqdata
$2 = {
svc = {
rq_xprt = 0x7fee0c202ad0,
rq_clntname = 0x0,
rq_svcname = 0x0,
rq_xdrs = 0x7fec3c987a70,
rq_u1 = 0x0,
rq_u2 = 0x0,
rq_cksum = 14781944753697519387,
rq_auth = 0x7ff028d45cd0 <svc_auth_none>,
rq_ap1 = 0x0,
rq_ap2 = 0x0,
...
Notice rq_u1 == 0. nfs_rpc_process_request is calling free_args which is calling
nfs_dupreq_rele. There is a comment at the beginning of nfs_dupreq_rele:
* We assert req->rq_u1 now points to the corresponding duplicate request
* cache entry (dv).
But this assumption doesn't seem to be true, and eventually this will be dereferenced
leading to a crash. We're trying to reproduce the problem with NIV_FULL_DEBUG.
Has anyone seen this problem before, or know what would cause rq_u1 to be 0? Should there
be a check for NULL in nfs_dupreq_rele in addition to DUPREQ_NOCACHE?
Bill