On Tue, Oct 23, 2018 at 7:55 PM Daniel Gryniewicz <dang(a)redhat.com> wrote:
Discussion inline.
On 10/23/2018 07:58 AM, Suhrud Patankar wrote:
> Hello All,
>
> I have two questions related to NLM and rpc.statd.
>
> Setup:
> Ganesha version: V2.5.0.4
> CentOS Linux release 7.5.1804
> FSAL – Any
>
> 1. By default libntirpc has -DPORTMAP build flag enabled. As I
> understand this will use UDP to communicate with rpc.statd. With this
> Ganesha fails to monitor the NLM clients and hence lock requests fail.
> The call to clnt_tp_ncreate_timed() fails in
> clnt_dg_control(CLSET_SVC_ADDR) as addr->len is 16 where as
> sizeof(cu->cu_raddr) is 128.
> If I disable -DPORTMAP build flag in tirpc, then Ganesha uses “tcp” to
> connect to rpc.statd and everything works as expected.
> Are we supposed to use “tcp” and disable the -DPORTMAP flag? Why is
> this flag enabled by default?
This isn't related to UDP vs. TCP. The issue is that one address buffer
passed in is IPv4, and the other is IPv6, and an IPv6 address cannot fit
into an IPv4 buffer. It's definitely a bug, but I don't see exactly how
this is happening. Can you give a more detailed explanation, with
callpaths to get there?
nsm_connect() calls clnt_ncreate(“localhost”, SM_PROG, SM_VERS, “tcp”)
which calls clnt_ncreate_timed() and then clnt_tp_ncreate_timed()
which calls __rpcb_findaddr_timed() to get the addr
__rpcb_findaddr_timed() if PORTMAP is defined then changes the “tcp” to “udp”
It has comments as “Try UDP only - there are some portmappers out …“.
clnt_tp_ncreate_timed() uses CLNT_CONTROL and tries to set the
CLSET_SVC_ADDR as given by __rpcb_findaddr_timed()
This is where we hit the problem.
CLNT_CONTROL is clnt_dg_control() Which has the check for the addr len.
if (addr->len < sizeof(cu->cu_raddr))
The addr is IPv4 but cu->cu_raddr is IPv6.
We want to copy the addr->buf to cu->cu_raddr but it is failing even
if addr->len is less than sizeof(cu->cu_raddr).
The if() above is buggy.
If I disable PORTMAP then tirpc uses TCP to connect to statd and does
not set the addr->len as per IPV4. So the code goes through correctly.
> >
> > 2. Case: rpc.statd service restarts. Ganesha is not restarted.
> > nsm_connect() is done only once. On failure to connect to rpc.statd,
> > we don’t retry. This causes the NLM lock request to fail with “No
> > lock” error.
> > Is it expected that Ganesha will restart when rpc.statd restarts? I
> > added a patch to re-call nsm_connect() on failure and it works for me
> > now.
> >
> > Is this correct way to handle this?
>
> We should probably re-connect when statd fails, but I'm not sure how
> this patch accomplishes that. All it does is retry if the initial
> connect fails, it doesn't seem to retry later on. Can you explain more
> how this is working?
>
> Daniel
>
> > The patch is as below.
> > ------------------------------
> > From: Suhrud Patankar <suhrudpatankar(a)gmail.com>
> > Date: Thu, 11 Oct 2018 00:35:45 -0700
> > Subject: [PATCH] NFSv3: NLM - Reconnect with statd on failure
> >
> > Ganesha creates the statd connection context only once.
> > In case statd service restarts, we fail to monitor new clients.
> > On failure, destroy the STATD connection and recreate new one.
> > ---
> > src/Protocols/NLM/nsm.c | 21 +++++++++++++++++++--
> >
> > 1 files changed, 19 insertions(+), 2 deletions(-)
> >
> >
> > diff --git a/src/Protocols/NLM/nsm.c b/src/Protocols/NLM/nsm.c
> >
> > index 09909d9..8bb41d7 100644
> > --- a/src/Protocols/NLM/nsm.c
> > +++ b/src/Protocols/NLM/nsm.c
> > @@ -47,6 +47,8 @@ bool nsm_connect(void)
> > if (nsm_clnt != NULL)
> > return true;
> >
> > + LogFullDebug(COMPONENT_NLM, "Creating new nsm connection ");
> > +
> > nsm_clnt = clnt_ncreate("localhost", SM_PROG, SM_VERS,
"tcp");
> >
> > if (CLNT_FAILURE(nsm_clnt)) {
> > @@ -66,7 +68,7 @@ bool nsm_connect(void)
> >
> > void nsm_disconnect(void)
> > {
> > - if (nsm_count == 0 && nsm_clnt != NULL) {
> > + if (nsm_clnt != NULL) {
> > CLNT_DESTROY(nsm_clnt);
> > nsm_clnt = NULL;
> > AUTH_DESTROY(nsm_auth);
> > @@ -83,6 +85,7 @@ bool nsm_monitor(state_nsm_client_t *host)
> > enum clnt_stat ret;
> > char client_ip[SOCK_NAME_MAX];
> > char server_ip[SOCK_NAME_MAX * 2]; /* space for proxy_ip */
> > + bool retry = false;
> >
> > if (host == NULL)
> > return true;
> > @@ -120,7 +123,7 @@ bool nsm_monitor(state_nsm_client_t *host)
> > LogDebug(COMPONENT_NLM, "Monitor %s",
host->ssc_nlm_caller_name);
> >
> > PTHREAD_MUTEX_lock(&nsm_mutex);
> > -
> > +again:
> > /* create a connection to nsm on the localhost */
> > if (!nsm_connect()) {
> > LogCrit(COMPONENT_NLM,
> > @@ -144,6 +147,20 @@ bool nsm_monitor(state_nsm_client_t *host)
> > }
> >
> > if (ret != RPC_SUCCESS) {
> > + if (! retry) {
> > + /* Handle case when statd restarts.
> > + * Need to reconnect with statd
> > + */
> > + LogFullDebug(COMPONENT_NLM,
> > + "Try to reconnect with STATD");
> > +
> > + retry = true;
> > + nsm_mon.mon_id.my_id.my_name = NULL;
> > + clnt_req_release(cc);
> > + nsm_disconnect();
> > + goto again;
> > + }
> > +
> > t = rpc_sperror(&cc->cc_error, "failed");
> > LogCrit(COMPONENT_NLM,
> > "Monitor %s SM_MON %s",
> > --
> > 1.7.1
> >
> > Thanks & Regards,
> > Suhrud
> > _______________________________________________
> > Devel mailing list -- devel(a)lists.nfs-ganesha.org
> > To unsubscribe send an email to devel-leave(a)lists.nfs-ganesha.org
> >
> _______________________________________________
> Devel mailing list -- devel(a)lists.nfs-ganesha.org
> To unsubscribe send an email to devel-leave(a)lists.nfs-ganesha.org