Ah, okay, I see now. The problem is, I'm not sure how
force-disconnect
interacts with nsm_count. I don't think we can just disregard it. In fact, looking
at it, we can't force delete nsm_clnt, because another nsm_monitor() call might
be using it. Maybe we need a separate client for each nsm_monitor() call? I
don't know if that works with the lock manager, though.
Frank, any ideas?
Hmm, that should work. How expensive is it to do the connect/disconnect for each
nsm_monitor call?
Man, if we didn't have rpc.statd as a separate process... Ganesha would just write the
records and receive the SM_NOTIFY from clients. I don't know if we have to worry about
other consumers of rpc.statd at all, if so, we just need to receive SM_MON and SM_UNMON
from localhost and make the callback RPC call if a monitored client sends SM_NOTIFY.
Meanwhile our monitoring would all be handled by function calls and no reliance on RPC.
Frank
Daniel
On Wed, Oct 24, 2018 at 4:22 AM Malahal Naineni <malahal(a)gmail.com>
wrote:
>
> Dan, nsm_disconnect() is disconnecting every time as opposed to only when
nsm_count==0. I think this works because it is connecting and disconnecting
everything, not just on error. We could pass a flag to
nsm_disconnect(force=true) on error and nsm_disconnect(force=false) with
normal case. We need this as a way to recovery mechanism.
>
> Regards, Malahal.
> PS: I just looked at this nsm code for a customer issue I posted on IRC
yesterday! Thank you Suhrud.
>
> On Tue, Oct 23, 2018 at 7:55 PM Daniel Gryniewicz <dang(a)redhat.com>
wrote:
>>
>> Discussion inline.
>>
>> On 10/23/2018 07:58 AM, Suhrud Patankar wrote:
>> > Hello All,
>> >
>> > I have two questions related to NLM and rpc.statd.
>> >
>> > Setup:
>> > Ganesha version: V2.5.0.4
>> > CentOS Linux release 7.5.1804
>> > FSAL – Any
>> >
>> > 1. By default libntirpc has -DPORTMAP build flag enabled. As I
>> > understand this will use UDP to communicate with rpc.statd. With
>> > this Ganesha fails to monitor the NLM clients and hence lock requests
fail.
>> > The call to clnt_tp_ncreate_timed() fails in
>> > clnt_dg_control(CLSET_SVC_ADDR) as addr->len is 16 where as
>> > sizeof(cu->cu_raddr) is 128.
>> > If I disable -DPORTMAP build flag in tirpc, then Ganesha uses “tcp”
>> > to connect to rpc.statd and everything works as expected.
>> > Are we supposed to use “tcp” and disable the -DPORTMAP flag? Why is
>> > this flag enabled by default?
>>
>> This isn't related to UDP vs. TCP. The issue is that one address
>> buffer passed in is IPv4, and the other is IPv6, and an IPv6 address
>> cannot fit into an IPv4 buffer. It's definitely a bug, but I don't
>> see exactly how this is happening. Can you give a more detailed
>> explanation, with callpaths to get there?
>>
>> >
>> > 2. Case: rpc.statd service restarts. Ganesha is not restarted.
>> > nsm_connect() is done only once. On failure to connect to
>> > rpc.statd, we don’t retry. This causes the NLM lock request to fail
>> > with “No lock” error.
>> > Is it expected that Ganesha will restart when rpc.statd restarts? I
>> > added a patch to re-call nsm_connect() on failure and it works for
>> > me now.
>> >
>> > Is this correct way to handle this?
>>
>> We should probably re-connect when statd fails, but I'm not sure how
>> this patch accomplishes that. All it does is retry if the initial
>> connect fails, it doesn't seem to retry later on. Can you explain
>> more how this is working?
>>
>> Daniel
>>
>> > The patch is as below.
>> > ------------------------------
>> > From: Suhrud Patankar <suhrudpatankar(a)gmail.com>
>> > Date: Thu, 11 Oct 2018 00:35:45 -0700
>> > Subject: [PATCH] NFSv3: NLM - Reconnect with statd on failure
>> >
>> > Ganesha creates the statd connection context only once.
>> > In case statd service restarts, we fail to monitor new clients.
>> > On failure, destroy the STATD connection and recreate new one.
>> > ---
>> > src/Protocols/NLM/nsm.c | 21 +++++++++++++++++++--
>> >
>> > 1 files changed, 19 insertions(+), 2 deletions(-)
>> >
>> >
>> > diff --git a/src/Protocols/NLM/nsm.c b/src/Protocols/NLM/nsm.c
>> >
>> > index 09909d9..8bb41d7 100644
>> > --- a/src/Protocols/NLM/nsm.c
>> > +++ b/src/Protocols/NLM/nsm.c
>> > @@ -47,6 +47,8 @@ bool nsm_connect(void)
>> > if (nsm_clnt != NULL)
>> > return true;
>> >
>> > + LogFullDebug(COMPONENT_NLM, "Creating new nsm connection
>> > + ");
>> > +
>> > nsm_clnt = clnt_ncreate("localhost", SM_PROG, SM_VERS,
>> > "tcp");
>> >
>> > if (CLNT_FAILURE(nsm_clnt)) { @@ -66,7 +68,7 @@ bool
>> > nsm_connect(void)
>> >
>> > void nsm_disconnect(void)
>> > {
>> > - if (nsm_count == 0 && nsm_clnt != NULL) {
>> > + if (nsm_clnt != NULL) {
>> > CLNT_DESTROY(nsm_clnt);
>> > nsm_clnt = NULL;
>> > AUTH_DESTROY(nsm_auth); @@ -83,6 +85,7 @@ bool
>> > nsm_monitor(state_nsm_client_t *host)
>> > enum clnt_stat ret;
>> > char client_ip[SOCK_NAME_MAX];
>> > char server_ip[SOCK_NAME_MAX * 2]; /* space for proxy_ip
>> > */
>> > + bool retry = false;
>> >
>> > if (host == NULL)
>> > return true;
>> > @@ -120,7 +123,7 @@ bool nsm_monitor(state_nsm_client_t *host)
>> > LogDebug(COMPONENT_NLM, "Monitor %s",
>> > host->ssc_nlm_caller_name);
>> >
>> > PTHREAD_MUTEX_lock(&nsm_mutex);
>> > -
>> > +again:
>> > /* create a connection to nsm on the localhost */
>> > if (!nsm_connect()) {
>> > LogCrit(COMPONENT_NLM, @@ -144,6 +147,20 @@ bool
>> > nsm_monitor(state_nsm_client_t *host)
>> > }
>> >
>> > if (ret != RPC_SUCCESS) {
>> > + if (! retry) {
>> > + /* Handle case when statd restarts.
>> > + * Need to reconnect with statd
>> > + */
>> > + LogFullDebug(COMPONENT_NLM,
>> > + "Try to reconnect with STATD");
>> > +
>> > + retry = true;
>> > + nsm_mon.mon_id.my_id.my_name = NULL;
>> > + clnt_req_release(cc);
>> > + nsm_disconnect();
>> > + goto again;
>> > + }
>> > +
>> > t = rpc_sperror(&cc->cc_error,
"failed");
>> > LogCrit(COMPONENT_NLM,
>> > "Monitor %s SM_MON %s",
>> > --
>> > 1.7.1
>> >
>> > Thanks & Regards,
>> > Suhrud
>> > _______________________________________________
>> > Devel mailing list -- devel(a)lists.nfs-ganesha.org To unsubscribe
>> > send an email to devel-leave(a)lists.nfs-ganesha.org
>> >
>> _______________________________________________
>> Devel mailing list -- devel(a)lists.nfs-ganesha.org To unsubscribe send
>> an email to devel-leave(a)lists.nfs-ganesha.org
_______________________________________________
Devel mailing list -- devel(a)lists.nfs-ganesha.org To unsubscribe send an email to
devel-leave(a)lists.nfs-ganesha.org