I have two questions related to NLM and rpc.statd.
Setup:
Ganesha version: V2.5.0.4
CentOS Linux release 7.5.1804
FSAL – Any
1. By default libntirpc has -DPORTMAP build flag enabled. As I understand this
will use UDP to communicate with rpc.statd. With this Ganesha fails to monitor
the NLM clients and hence lock requests fail.
The call to clnt_tp_ncreate_timed() fails in
clnt_dg_control(CLSET_SVC_ADDR) as addr->len is 16 where as
sizeof(cu->cu_raddr) is 128.
If I disable -DPORTMAP build flag in tirpc, then Ganesha uses “tcp” to connect to
rpc.statd and everything works as expected.
Are we supposed to use “tcp” and disable the -DPORTMAP flag? Why is this flag
enabled by default?
That's odd that a flag named PORTMAP would change from using tcp to udp...
2. Case: rpc.statd service restarts. Ganesha is not restarted.
nsm_connect() is done only once. On failure to connect to rpc.statd, we don’t
retry. This causes the NLM lock request to fail with “No lock” error.
Is it expected that Ganesha will restart when rpc.statd restarts? I added a patch
to re-call nsm_connect() on failure and it works for me now.
Hmm, I need to look into what the implications of rpc.statd restarting are. If it at all
looks like a server reboot then we should do the appropriate thing inside Ganesha. If
it's transparent, then yes, recreating the connection is good.
Frank
Is this correct way to handle this?
The patch is as below.
------------------------------
From: Suhrud Patankar <suhrudpatankar(a)gmail.com>
Date: Thu, 11 Oct 2018 00:35:45 -0700
Subject: [PATCH] NFSv3: NLM - Reconnect with statd on failure
Ganesha creates the statd connection context only once.
In case statd service restarts, we fail to monitor new clients.
On failure, destroy the STATD connection and recreate new one.
---
src/Protocols/NLM/nsm.c | 21 +++++++++++++++++++--
1 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/src/Protocols/NLM/nsm.c b/src/Protocols/NLM/nsm.c
index 09909d9..8bb41d7 100644
--- a/src/Protocols/NLM/nsm.c
+++ b/src/Protocols/NLM/nsm.c
@@ -47,6 +47,8 @@ bool nsm_connect(void)
if (nsm_clnt != NULL)
return true;
+ LogFullDebug(COMPONENT_NLM, "Creating new nsm connection ");
+
nsm_clnt = clnt_ncreate("localhost", SM_PROG, SM_VERS,
"tcp");
if (CLNT_FAILURE(nsm_clnt)) {
@@ -66,7 +68,7 @@ bool nsm_connect(void)
void nsm_disconnect(void)
{
- if (nsm_count == 0 && nsm_clnt != NULL) {
+ if (nsm_clnt != NULL) {
CLNT_DESTROY(nsm_clnt);
nsm_clnt = NULL;
AUTH_DESTROY(nsm_auth);
@@ -83,6 +85,7 @@ bool nsm_monitor(state_nsm_client_t *host)
enum clnt_stat ret;
char client_ip[SOCK_NAME_MAX];
char server_ip[SOCK_NAME_MAX * 2]; /* space for proxy_ip */
+ bool retry = false;
if (host == NULL)
return true;
@@ -120,7 +123,7 @@ bool nsm_monitor(state_nsm_client_t *host)
LogDebug(COMPONENT_NLM, "Monitor %s", host->ssc_nlm_caller_name);
PTHREAD_MUTEX_lock(&nsm_mutex);
-
+again:
/* create a connection to nsm on the localhost */
if (!nsm_connect()) {
LogCrit(COMPONENT_NLM,
@@ -144,6 +147,20 @@ bool nsm_monitor(state_nsm_client_t *host)
}
if (ret != RPC_SUCCESS) {
+ if (! retry) {
+ /* Handle case when statd restarts.
+ * Need to reconnect with statd
+ */
+ LogFullDebug(COMPONENT_NLM,
+ "Try to reconnect with STATD");
+
+ retry = true;
+ nsm_mon.mon_id.my_id.my_name = NULL;
+ clnt_req_release(cc);
+ nsm_disconnect();
+ goto again;
+ }
+
t = rpc_sperror(&cc->cc_error, "failed");
LogCrit(COMPONENT_NLM,
"Monitor %s SM_MON %s",
--
1.7.1
Thanks & Regards,
Suhrud
_______________________________________________
Devel mailing list -- devel(a)lists.nfs-ganesha.org To unsubscribe send an email to
devel-leave(a)lists.nfs-ganesha.org