Peng Xie has uploaded this change for review.
Fix NFS client IO hang under heavy workload
The hang occurs in the following case:
1. The NFS client sends IO-1 to the NFS-Ganesha server, where it first
enters the DRC (duplicate request cache) and is marked DUPREQ_START.
2. IO-1 gets stuck and is processed very slowly in the Ganesha server
because of the heavy IO workload. Meanwhile, the connection between the
NFS client and the Ganesha server times out, so the client reconnects
and retries IO-1.
3. The retried IO-1 from step 2 enters the Ganesha DRC, which finds that
a request with the same xid is BEING_PROCESSED. The retry therefore does
nothing and returns immediately, on the assumption that the original
IO-1 from step 1 will reply to the NFS client once it completes.
4. When the original IO-1 from step 1 completes and tries to reply to
the client, it finds the transport XPRT_DESTROYED because of the
client's earlier reconnect, and the reply is dropped.
As a result, the NFS client hangs forever, because the Ganesha server
never replies to the IO.
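The sequence above can be modelled with a small, self-contained C sketch.
This is only an illustration of the failure mode, not the code touched by
this change: every type and function name below (struct transport,
struct drc_entry, drc_start, drc_finish, simulate_retry_hang) is
hypothetical and much simpler than the real DRC in src/RPCAL/nfs_dupreq.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model; NOT the real nfs-ganesha types. */
enum entry_state { ENTRY_FREE, ENTRY_START, ENTRY_COMPLETE };
enum lookup_result { LOOKUP_NEW, LOOKUP_BEING_PROCESSED, LOOKUP_EXISTS };

struct transport {
    bool destroyed;    /* set when the client drops the connection */
    int replies_sent;  /* replies actually delivered on this connection */
};

struct drc_entry {
    uint32_t xid;
    enum entry_state state;
    struct transport *owner;  /* connection the first request arrived on */
};

/* Register a request in a one-slot cache, or report it as a duplicate. */
static enum lookup_result drc_start(struct drc_entry *e, uint32_t xid,
                                    struct transport *xprt)
{
    if (e->state == ENTRY_FREE) {
        e->xid = xid;
        e->state = ENTRY_START;
        e->owner = xprt;
        return LOOKUP_NEW;
    }
    assert(e->xid == xid);  /* the model only tracks a single xid */
    return e->state == ENTRY_START ? LOOKUP_BEING_PROCESSED : LOOKUP_EXISTS;
}

/* A reply on a destroyed transport is silently dropped. */
static bool send_reply(struct transport *xprt)
{
    if (xprt->destroyed)
        return false;
    xprt->replies_sent++;
    return true;
}

/* Complete the request and reply on the connection it arrived on. */
static bool drc_finish(struct drc_entry *e)
{
    e->state = ENTRY_COMPLETE;
    return send_reply(e->owner);
}

/* Replay steps 1-4; returns how many replies reached the client. */
int simulate_retry_hang(void)
{
    struct transport old_conn = { false, 0 };
    struct transport new_conn = { false, 0 };
    struct drc_entry entry = { 0, ENTRY_FREE, NULL };
    enum lookup_result r;

    /* Step 1: IO-1 arrives and is marked as started. */
    r = drc_start(&entry, 42, &old_conn);
    assert(r == LOOKUP_NEW);

    /* Step 2: timeout; the client reconnects, the old transport dies. */
    old_conn.destroyed = true;

    /* Step 3: the retry on the new connection is treated as a duplicate
     * that is still in progress, so nothing is sent to the client. */
    r = drc_start(&entry, 42, &new_conn);
    assert(r == LOOKUP_BEING_PROCESSED);
    (void)r;

    /* Step 4: the original request completes; its reply is dropped. */
    (void)drc_finish(&entry);

    return old_conn.replies_sent + new_conn.replies_sent;
}
```

In this model simulate_retry_hang() returns 0: neither the original
connection nor the new one ever carries a reply, which is the client-side
hang described above.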
Change-Id: Ic917450afc09a0090ce3de42ed464f168d1623a8
Signed-off-by: Peng Xie <peng.hse@xtaotech.com>
---
M src/MainNFSD/nfs_worker_thread.c
M src/RPCAL/nfs_dupreq.c
M src/include/nfs_dupreq.h
3 files changed, 29 insertions(+), 27 deletions(-)
git pull ssh://review.gerrithub.io:29418/ffilz/nfs-ganesha refs/changes/11/494411/1
To view, visit change 494411.