And, I think I have it. It seems there are rather simple ways to sabotage the UDP link in
Ganesha 2.7 by sending spurious messages. Look at the following:
static enum xprt_stat
svc_dg_rendezvous(SVCXPRT *xprt)
{
	/* ... */
	rlen = recvmsg(newxprt->xp_fd, mesgp, 0);
	if (sp->sa_family == (sa_family_t) 0xffff) {
		svc_dg_xprt_free(su);
		return (XPRT_DIED);
	}
	if (rlen == -1 && errno == EINTR)
		goto again;
	if (rlen == -1 || (rlen < (ssize_t) (4 * sizeof(u_int32_t)))) {
		svc_dg_xprt_free(su);
		return (XPRT_DIED);
	}
	if (unlikely(svc_rqst_rearm_events(xprt))) {
I was able to reproduce the issue simply by sending a UDP message less than 16 bytes
long. Also note that svc_dg_xprt_free() does not actually close the UDP fd, maybe
because it is still registered with an epoll fd and close() returns an error (guessing,
haven't verified yet).
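For anyone who wants to try the reproduction, here is a minimal sketch of a sender. It is not the exact tool I used: send_short_datagram() and MIN_DG_LEN are hypothetical names made up for this example, the payload is just zero bytes, and the target address and port are whatever your server's UDP listener uses (assumed to be IPv4 here).

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Minimum length the quoted check accepts: 4 * sizeof(u_int32_t). */
#define MIN_DG_LEN 16

/*
 * Hypothetical helper: send one datagram of `len` zero bytes
 * (len < MIN_DG_LEN) to an IPv4 address and port chosen by the caller.
 * Returns the number of bytes sent, or -1 on failure.
 */
static ssize_t
send_short_datagram(const char *ipv4, unsigned short port, size_t len)
{
	char payload[MIN_DG_LEN] = { 0 };
	struct sockaddr_in dst;
	ssize_t sent;
	int fd;

	assert(len < MIN_DG_LEN);	/* stay below the 16-byte threshold */

	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(port);
	if (inet_pton(AF_INET, ipv4, &dst.sin_addr) != 1)
		return -1;	/* not a valid dotted-quad address */

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd == -1)
		return -1;

	sent = sendto(fd, payload, len, 0,
		      (struct sockaddr *)&dst, sizeof(dst));
	close(fd);
	return sent;
}
```

Calling send_short_datagram(addr, port, 8) from a small main() should be enough to hit the rlen < 16 branch in the excerpt above.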
IMO, if we get a spurious message, especially given that it's UDP, we should simply
ignore it and move on rather than tear down the whole channel?
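
To make the suggestion concrete, here is a rough sketch of the decision I have in mind, written as a standalone function rather than as a patch. Everything in it (dg_classify(), enum dg_action, MIN_DG_LEN) is a hypothetical name for illustration and not part of ntirpc. It also assumes that receive errors other than EINTR are still reported to the caller; what to do with those is a separate question.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Same threshold as the quoted check: four 32-bit words. */
#define MIN_DG_LEN ((ssize_t)(4 * sizeof(uint32_t)))

/* Hypothetical outcomes for one receive attempt. */
enum dg_action {
	DG_RETRY,	/* interrupted, call recvmsg() again */
	DG_DROP,	/* short datagram: ignore it, rearm, keep the transport */
	DG_ERROR,	/* genuine receive error, left to the caller */
	DG_PROCESS	/* long enough to hand to the decoder */
};

/*
 * Classify the result of a receive call. `rlen` is the return value and
 * `err` is the errno captured right after the call.
 */
static enum dg_action
dg_classify(ssize_t rlen, int err)
{
	assert(rlen >= -1);	/* recvmsg() returns -1 or a byte count */

	if (rlen == -1)
		return (err == EINTR) ? DG_RETRY : DG_ERROR;
	if (rlen < MIN_DG_LEN)
		return DG_DROP;
	return DG_PROCESS;
}
```

In svc_dg_rendezvous(), the DG_DROP case would presumably mean rearming the events and returning a non-fatal status instead of calling svc_dg_xprt_free() and returning XPRT_DIED.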
Thanks,
Anirban
P.S. In 2.5.3, from what I understand, we don't actually return XPRT_DIED, and even for
messages < 16 bytes we always rearm afterwards, which is why the customer only saw the
problem after upgrading to 2.7.