It looks quite possible for the xprt/rpc_dplx_rec hash table to hold entries whose refcount is zero. The code uses some atomic flags to keep multiple threads from freeing the same xd/xprt, but svc_xprt_lookup() could still end up accessing freed memory. It takes a reference (SVC_REF) under the hash lock, but another thread might be trying to free/destroy the xprt at the same time. svc_xprt_lookup() checks for the destroying flag only after releasing the hash lock, and by that time there is no guarantee that the xprt is still valid. The xprt is only valid as long as it is found in the hash table and we hold the hash lock, correct?
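To make the concern concrete, here is a minimal sketch of the ordering I would expect to be safe. It is not the library code: `demo_xprt`, `demo_table`, `demo_lookup()` and the other `demo_*` names are hypothetical, and a single mutex stands in for the hash lock. The point is only that the destroying-flag check and the refcount increment both happen while the lock is held, and that an entry leaves the table under that same lock before it is freed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* All names here are hypothetical; this only models the locking order. */
#define DEMO_FLAG_DESTROYING 0x1u
#define DEMO_TABLE_SIZE 16

struct demo_xprt {
    size_t   fd;      /* index into demo_table */
    uint32_t refcnt;  /* protected by demo_lock */
    uint32_t flags;   /* protected by demo_lock */
};

/* Stand-in for the hash lock and the hash table. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static struct demo_xprt *demo_table[DEMO_TABLE_SIZE];

/* Create an entry; the caller gets the initial reference. */
static struct demo_xprt *demo_insert(size_t fd)
{
    struct demo_xprt *x;

    if (fd >= DEMO_TABLE_SIZE)
        return NULL;
    x = calloc(1, sizeof(*x));
    if (x == NULL)
        return NULL;
    x->fd = fd;
    x->refcnt = 1;

    pthread_mutex_lock(&demo_lock);
    if (demo_table[fd] != NULL) {   /* slot already taken */
        pthread_mutex_unlock(&demo_lock);
        free(x);
        return NULL;
    }
    demo_table[fd] = x;
    pthread_mutex_unlock(&demo_lock);
    return x;
}

/* Flag check and reference are both taken under the lock, so the
 * entry cannot be freed between finding it and referencing it. */
static struct demo_xprt *demo_lookup(size_t fd)
{
    struct demo_xprt *x = NULL;

    if (fd >= DEMO_TABLE_SIZE)
        return NULL;
    pthread_mutex_lock(&demo_lock);
    x = demo_table[fd];
    if (x != NULL && (x->flags & DEMO_FLAG_DESTROYING) == 0)
        x->refcnt++;
    else
        x = NULL;                   /* absent or being destroyed */
    pthread_mutex_unlock(&demo_lock);
    return x;
}

/* Mark the entry so that later lookups refuse it. */
static void demo_mark_destroying(struct demo_xprt *x)
{
    pthread_mutex_lock(&demo_lock);
    x->flags |= DEMO_FLAG_DESTROYING;
    pthread_mutex_unlock(&demo_lock);
}

/* Drop one reference; the last one unhashes under the lock, then frees. */
static void demo_release(struct demo_xprt *x)
{
    int last;

    pthread_mutex_lock(&demo_lock);
    assert(x->refcnt > 0);
    last = (--x->refcnt == 0);
    if (last)
        demo_table[x->fd] = NULL;   /* no lookup can find it after this */
    pthread_mutex_unlock(&demo_lock);

    if (last)
        free(x);
}
```

With this ordering a zero-refcount entry is never visible in the table, and a lookup never returns an entry that is already flagged. If the flag is instead tested after the lock is dropped, the lookup can return a pointer that another thread is already free to release.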
We can reproduce a crash consistently, but at this point we are not sure whether this code is the culprit.