Peter Schwenke has uploaded this change for review.

NLM: Fix NFS rpc.statd-related hang

Fixes Issue #680

This implements Frank Filz's suggestion in comment
https://github.com/nfs-ganesha/nfs-ganesha/issues/680#issuecomment-2664319185

When SM_MON and SM_UNMON RPC calls were made to statd
and the threads were exhausted, nfs-ganesha hung
on the ssc_mutex lock.

ssc_monitored has been changed from a boolean to an enum
{UNMONITORED, ATTEMPTING, MONITORED}

nsm_monitor_noretry() and its unmonitor counterpart first
check ssc_monitored to see if a monitor/unmonitor is already
being attempted. If so, we sleep for 1 sec and retry, up to
3 times.

Then we check if ssc_monitored is already in the state we
want, i.e. UNMONITORED/MONITORED. If so, we bail.

Otherwise, we set ssc_monitored to ATTEMPTING and try
the RPC call.

On RPC success, we flip ssc_monitored to UNMONITORED/MONITORED.
On error, we restore it to the state it was in before the
attempt, i.e. MONITORED/UNMONITORED.
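
For reviewers, the flow described above can be sketched roughly as
follows. This is an illustrative sketch, not the code in the patch:
struct client, change_state(), rpc_fn, rpc_ok(), rpc_fail() and
MAX_TRIES are hypothetical names, and releasing ssc_mutex around the
RPC is an assumption. Only ssc_monitored, ssc_mutex and the three
enum values come from this change description.

```c
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

/* The three states named in the change description. */
enum monitor_state { UNMONITORED, ATTEMPTING, MONITORED };

/* Hypothetical stand-in for the per-client state. */
struct client {
	pthread_mutex_t ssc_mutex;
	enum monitor_state ssc_monitored;
};

#define MAX_TRIES 3 /* "retry up to 3 times" */

/* Hypothetical stand-in for the SM_MON / SM_UNMON RPC to statd. */
typedef bool (*rpc_fn)(struct client *);

static bool rpc_ok(struct client *c) { (void)c; return true; }
static bool rpc_fail(struct client *c) { (void)c; return false; }

/* Move the client to 'want' (MONITORED or UNMONITORED). */
static bool change_state(struct client *c, enum monitor_state want,
			 rpc_fn rpc)
{
	enum monitor_state prev;
	int tries = 0;
	bool ok;

	pthread_mutex_lock(&c->ssc_mutex);

	/* Another thread is mid-attempt: sleep 1 sec and re-check. */
	while (c->ssc_monitored == ATTEMPTING) {
		if (tries++ == MAX_TRIES) {
			pthread_mutex_unlock(&c->ssc_mutex);
			return false;
		}
		pthread_mutex_unlock(&c->ssc_mutex);
		sleep(1);
		pthread_mutex_lock(&c->ssc_mutex);
	}

	/* Already in the wanted state: nothing to do. */
	if (c->ssc_monitored == want) {
		pthread_mutex_unlock(&c->ssc_mutex);
		return true;
	}

	prev = c->ssc_monitored;
	c->ssc_monitored = ATTEMPTING;
	/* Assumption: the lock is not held across the RPC. */
	pthread_mutex_unlock(&c->ssc_mutex);

	ok = rpc(c);

	/* Success flips to the wanted state, failure restores prev. */
	pthread_mutex_lock(&c->ssc_mutex);
	c->ssc_monitored = ok ? want : prev;
	pthread_mutex_unlock(&c->ssc_mutex);

	return ok;
}
```

In this sketch a slow or stuck RPC leaves other callers sleeping and
eventually giving up on ATTEMPTING, rather than blocking on ssc_mutex.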

Change-Id: I0f0814842d806132e6e4441fe0956236e5d914c5
Signed-off-by: Peter Schwenke <pschwenke@ddn.com>
---
M src/Protocols/NLM/nsm.c
M src/SAL/nlm_owner.c
M src/include/sal_data.h
3 files changed, 56 insertions(+), 19 deletions(-)

git pull ssh://review.gerrithub.io:29418/ffilz/nfs-ganesha refs/changes/44/1214144/1

Gerrit-MessageType: newchange
Gerrit-Project: ffilz/nfs-ganesha
Gerrit-Branch: next
Gerrit-Change-Id: I0f0814842d806132e6e4441fe0956236e5d914c5
Gerrit-Change-Number: 1214144
Gerrit-PatchSet: 1
Gerrit-Owner: Peter Schwenke <pschwenke@ddn.com>