Issue with Ganesha not processing UDP NULL proc requests from AMD
by anirban.chalcogen@gmail.com
Hi Daniel or Frank,
We spoke about this issue a little over a week ago on our weekly calls. On our last call, we showed you the core backtrace that was obtained from Ganesha. You suggested that we might have to dig into the core dump to ascertain whether the UDP socket is being added to the set of FDs polled by epoll. However, I'm not sure whether that information can still be recovered from the epoll_wait() stack. Here, for example, is the full backtrace of a thread blocked in epoll_wait().
(gdb) bt full
#0 0x00007f3c46616eb3 in epoll_wait () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007f3c4876f04c in svc_rqst_epoll_loop (sr_rec=0x4ae8c10) at /usr/src/debug/nfs-ganesha-2.7.5-ibm056.02/libntirpc/src/svc_rqst.c:1013
cc = 0x7f3c46d13eb4 <pthread_cond_timedwait@@GLIBC_2.3.2+516>
n = 0x0
ts = {tv_sec = 2750901, tv_nsec = 954488941}
timeout_ms = 29000
expire_ms = -1544065341
n_events = 32572
__func__ = "svc_rqst_epoll_loop"
#2 0x00007f3c4876f199 in svc_rqst_run_task (wpe=0x4ae8c10) at /usr/src/debug/nfs-ganesha-2.7.5-ibm056.02/libntirpc/src/svc_rqst.c:1065
sr_rec = 0x4ae8c10
finished = false
__func__ = "svc_rqst_run_task"
#3 0x00007f3c48777dca in work_pool_thread (arg=0x7f3c300008c0) at /usr/src/debug/nfs-ganesha-2.7.5-ibm056.02/libntirpc/src/work_pool.c:181
wpt = 0x7f3c300008c0
pool = 0x7f3c48993940 <svc_work_pool>
have = 0x0
ts = {tv_sec = 1608803814, tv_nsec = 833467718}
rc = 0
spawn = false
__func__ = "work_pool_thread"
#4 0x00007f3c46d0fea5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#5 0x00007f3c466168dd in clone () from /lib64/libc.so.6
No symbol table info available.
(gdb)
My idea was to trace back from *sr_rec to the SVCXPRT instance, because the xp_type member of the latter holds the type of the transport channel. However, consider how the sr_rec is obtained from the SVCXPRT:
int
svc_rqst_rearm_events(SVCXPRT *xprt)
{
struct rpc_dplx_rec *rec = REC_XPRT(xprt);
struct svc_rqst_rec *sr_rec = (struct svc_rqst_rec *)rec->ev_p;
int code = EINVAL;
Here, ev_p is only a pointer to the sr_rec; the sr_rec is not embedded in the rec structure, so we cannot use a container_of()-style macro to obtain the rec from the sr_rec and trace outwards. Is there another way we could determine the transport channel from the core backtrace?
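To make the container_of() point concrete, here is a minimal sketch in C. The struct and function names (event_rec, rec_embedded, rec_pointer, outer_from_embedded, outer_from_pointer) are hypothetical stand-ins and not the real ntirpc definitions; the only thing taken from the snippet above is that ev_p is a plain pointer. It shows that the macro can recover the outer structure only when the inner one is an embedded member. With a pointer, the outer structure has to be found by searching the known records for one whose ev_p matches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the ntirpc structures. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct event_rec {		/* plays the role of the sr_rec */
	int epoll_fd;
};

struct rec_embedded {		/* event record is a member of the outer struct */
	int xp_type;
	struct event_rec ev;
};

struct rec_pointer {		/* outer struct only points at it, like ev_p */
	int xp_type;
	void *ev_p;
};

/* Embedded case: the member lives inside the outer struct, so
 * subtracting its offset yields the address of the outer struct. */
static struct rec_embedded *outer_from_embedded(struct event_rec *ev)
{
	return container_of(ev, struct rec_embedded, ev);
}

/* Pointer case: the event record's address says nothing about where
 * the outer struct lives, so scan an array of known records for one
 * whose ev_p matches. Returns the first match, or NULL if none. */
static struct rec_pointer *outer_from_pointer(struct rec_pointer *recs,
					      size_t n,
					      const struct event_rec *ev)
{
	assert(recs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (recs[i].ev_p == (const void *)ev)
			return &recs[i];
	}
	return NULL;
}
```

In this sketch several records could point at the same event record, so the search returns only the first match. Applied to a core dump, the same idea would mean walking whatever list or table of transports is reachable from a known global and comparing each ev_p against 0x4ae8c10; which global to start from is something I have not identified.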
Thanks,
Anirban