You're right, I'd missed that. So the only issues left are keeping
nsm_count correct, and making sure that statd behaves correctly when we
disconnect and reconnect.
I think I'd prefer the version with a force argument for the error
cases. That way, under normal circumstances, things stay the same.
Daniel
On 10/25/2018 12:00 AM, Malahal Naineni wrote:
nsm_monitor and nsm_unmonitor use a single lock to serialize
everything.
I am not sure what problem would it cause.
On Wed, Oct 24, 2018 at 7:25 PM Daniel Gryniewicz <dang(a)redhat.com
<mailto:dang@redhat.com>> wrote:
Ah, okay, I see now. The problem is, I'm not sure how
force-disconnect interacts with nsm_count. I don't think we can just
disregard it. In fact, looking at it, we can't force delete nsm_clnt,
because another nsm_monitor() call might be using it. Maybe we need a
separate client for each nsm_monitor() call? I don't know if that
works with the lock manager, though.
Frank, any ideas?
Daniel
On Wed, Oct 24, 2018 at 4:22 AM Malahal Naineni <malahal(a)gmail.com
<mailto:malahal@gmail.com>> wrote:
>
> Dan, nsm_disconnect() is disconnecting every time as opposed to
only when nsm_count==0. I think this works because it is connecting
and disconnecting everything, not just on error. We could pass a
flag to nsm_disconnect(force=true) on error and
nsm_disconnect(force=false) with normal case. We need this as a way
to recovery mechanism.
>
> Regards, Malahal.
> PS: I just looked at this nsm code for a customer issue I posted
on IRC yesterday! Thank you Suhrud.
>
> On Tue, Oct 23, 2018 at 7:55 PM Daniel Gryniewicz
<dang(a)redhat.com <mailto:dang@redhat.com>> wrote:
>>
>> Discussion inline.
>>
>> On 10/23/2018 07:58 AM, Suhrud Patankar wrote:
>> > Hello All,
>> >
>> > I have two questions related to NLM and rpc.statd.
>> >
>> > Setup:
>> > Ganesha version: V2.5.0.4
>> > CentOS Linux release 7.5.1804
>> > FSAL – Any
>> >
>> > 1. By default libntirpc has -DPORTMAP build flag enabled. As I
>> > understand this will use UDP to communicate with rpc.statd.
With this
>> > Ganesha fails to monitor the NLM clients and hence lock
requests fail.
>> > The call to clnt_tp_ncreate_timed() fails in
>> > clnt_dg_control(CLSET_SVC_ADDR) as addr->len is 16 where as
>> > sizeof(cu->cu_raddr) is 128.
>> > If I disable -DPORTMAP build flag in tirpc, then Ganesha uses
“tcp” to
>> > connect to rpc.statd and everything works as expected.
>> > Are we supposed to use “tcp” and disable the -DPORTMAP flag?
Why is
>> > this flag enabled by default?
>>
>> This isn't related to UDP vs. TCP. The issue is that one
address buffer
>> passed in is IPv4, and the other is IPv6, and an IPv6 address
cannot fit
>> into an IPv4 buffer. It's definitely a bug, but I don't see
exactly how
>> this is happening. Can you give a more detailed explanation, with
>> callpaths to get there?
>>
>> >
>> > 2. Case: rpc.statd service restarts. Ganesha is not restarted.
>> > nsm_connect() is done only once. On failure to connect to
rpc.statd,
>> > we don’t retry. This causes the NLM lock request to fail with “No
>> > lock” error.
>> > Is it expected that Ganesha will restart when rpc.statd
restarts? I
>> > added a patch to re-call nsm_connect() on failure and it works
for me
>> > now.
>> >
>> > Is this correct way to handle this?
>>
>> We should probably re-connect when statd fails, but I'm not sure how
>> this patch accomplishes that. All it does is retry if the initial
>> connect fails, it doesn't seem to retry later on. Can you
explain more
>> how this is working?
>>
>> Daniel
>>
>> > The patch is as below.
>> > ------------------------------
>> > From: Suhrud Patankar <suhrudpatankar(a)gmail.com
<mailto:suhrudpatankar@gmail.com>>
>> > Date: Thu, 11 Oct 2018 00:35:45 -0700
>> > Subject: [PATCH] NFSv3: NLM - Reconnect with statd on failure
>> >
>> > Ganesha creates the statd connection context only once.
>> > In case statd service restarts, we fail to monitor new clients.
>> > On failure, destroy the STATD connection and recreate new one.
>> > ---
>> > src/Protocols/NLM/nsm.c | 21 +++++++++++++++++++--
>> >
>> > 1 files changed, 19 insertions(+), 2 deletions(-)
>> >
>> >
>> > diff --git a/src/Protocols/NLM/nsm.c b/src/Protocols/NLM/nsm.c
>> >
>> > index 09909d9..8bb41d7 100644
>> > --- a/src/Protocols/NLM/nsm.c
>> > +++ b/src/Protocols/NLM/nsm.c
>> > @@ -47,6 +47,8 @@ bool nsm_connect(void)
>> > if (nsm_clnt != NULL)
>> > return true;
>> >
>> > + LogFullDebug(COMPONENT_NLM, "Creating new nsm
connection ");
>> > +
>> > nsm_clnt = clnt_ncreate("localhost", SM_PROG,
SM_VERS, "tcp");
>> >
>> > if (CLNT_FAILURE(nsm_clnt)) {
>> > @@ -66,7 +68,7 @@ bool nsm_connect(void)
>> >
>> > void nsm_disconnect(void)
>> > {
>> > - if (nsm_count == 0 && nsm_clnt != NULL) {
>> > + if (nsm_clnt != NULL) {
>> > CLNT_DESTROY(nsm_clnt);
>> > nsm_clnt = NULL;
>> > AUTH_DESTROY(nsm_auth);
>> > @@ -83,6 +85,7 @@ bool nsm_monitor(state_nsm_client_t *host)
>> > enum clnt_stat ret;
>> > char client_ip[SOCK_NAME_MAX];
>> > char server_ip[SOCK_NAME_MAX * 2]; /* space for
proxy_ip */
>> > + bool retry = false;
>> >
>> > if (host == NULL)
>> > return true;
>> > @@ -120,7 +123,7 @@ bool nsm_monitor(state_nsm_client_t *host)
>> > LogDebug(COMPONENT_NLM, "Monitor %s",
host->ssc_nlm_caller_name);
>> >
>> > PTHREAD_MUTEX_lock(&nsm_mutex);
>> > -
>> > +again:
>> > /* create a connection to nsm on the localhost */
>> > if (!nsm_connect()) {
>> > LogCrit(COMPONENT_NLM,
>> > @@ -144,6 +147,20 @@ bool nsm_monitor(state_nsm_client_t *host)
>> > }
>> >
>> > if (ret != RPC_SUCCESS) {
>> > + if (! retry) {
>> > + /* Handle case when statd restarts.
>> > + * Need to reconnect with statd
>> > + */
>> > + LogFullDebug(COMPONENT_NLM,
>> > + "Try to reconnect with
STATD");
>> > +
>> > + retry = true;
>> > + nsm_mon.mon_id.my_id.my_name = NULL;
>> > + clnt_req_release(cc);
>> > + nsm_disconnect();
>> > + goto again;
>> > + }
>> > +
>> > t = rpc_sperror(&cc->cc_error,
"failed");
>> > LogCrit(COMPONENT_NLM,
>> > "Monitor %s SM_MON %s",
>> > --
>> > 1.7.1
>> >
>> > Thanks & Regards,
>> > Suhrud
>> > _______________________________________________
>> > Devel mailing list -- devel(a)lists.nfs-ganesha.org
<mailto:devel@lists.nfs-ganesha.org>
>> > To unsubscribe send an email to
devel-leave(a)lists.nfs-ganesha.org
<mailto:devel-leave@lists.nfs-ganesha.org>
>> >
>> _______________________________________________
>> Devel mailing list -- devel(a)lists.nfs-ganesha.org
<mailto:devel@lists.nfs-ganesha.org>
>> To unsubscribe send an email to
devel-leave(a)lists.nfs-ganesha.org
<mailto:devel-leave@lists.nfs-ganesha.org>