It depends on how the statd is configured.
If statd send notifications on restart and Ganesha is not restarted,
then there will be inconsistency.
Ganesha will think the client is monitored but statd will not have the
client monitored as sm-notify will clear it.
If statd is configured not to send notifications on its own, then
statd restart is NOOP, we just need to reconnect.
On Thu, Oct 25, 2018 at 9:31 AM Malahal Naineni <malahal(a)gmail.com> wrote:
>
> nsm_monitor and nsm_unmonitor use a single lock to serialize everything. I am not
sure what problem would it cause.
>
> On Wed, Oct 24, 2018 at 7:25 PM Daniel Gryniewicz <dang(a)redhat.com> wrote:
>>
>> Ah, okay, I see now. The problem is, I'm not sure how
>> force-disconnect interacts with nsm_count. I don't think we can just
>> disregard it. In fact, looking at it, we can't force delete nsm_clnt,
>> because another nsm_monitor() call might be using it. Maybe we need a
>> separate client for each nsm_monitor() call? I don't know if that
>> works with the lock manager, though.
>>
>> Frank, any ideas?
>>
>> Daniel
>> On Wed, Oct 24, 2018 at 4:22 AM Malahal Naineni <malahal(a)gmail.com> wrote:
>> >
>> > Dan, nsm_disconnect() is disconnecting every time as opposed to only when
nsm_count==0. I think this works because it is connecting and disconnecting everything,
not just on error. We could pass a flag to nsm_disconnect(force=true) on error and
nsm_disconnect(force=false) with normal case. We need this as a way to recovery
mechanism.
>> >
>> > Regards, Malahal.
>> > PS: I just looked at this nsm code for a customer issue I posted on IRC
yesterday! Thank you Suhrud.
>> >
>> > On Tue, Oct 23, 2018 at 7:55 PM Daniel Gryniewicz <dang(a)redhat.com>
wrote:
>> >>
>> >> Discussion inline.
>> >>
>> >> On 10/23/2018 07:58 AM, Suhrud Patankar wrote:
>> >> > Hello All,
>> >> >
>> >> > I have two questions related to NLM and rpc.statd.
>> >> >
>> >> > Setup:
>> >> > Ganesha version: V2.5.0.4
>> >> > CentOS Linux release 7.5.1804
>> >> > FSAL – Any
>> >> >
>> >> > 1. By default libntirpc has -DPORTMAP build flag enabled. As I
>> >> > understand this will use UDP to communicate with rpc.statd. With
this
>> >> > Ganesha fails to monitor the NLM clients and hence lock requests
fail.
>> >> > The call to clnt_tp_ncreate_timed() fails in
>> >> > clnt_dg_control(CLSET_SVC_ADDR) as addr->len is 16 where as
>> >> > sizeof(cu->cu_raddr) is 128.
>> >> > If I disable -DPORTMAP build flag in tirpc, then Ganesha uses “tcp”
to
>> >> > connect to rpc.statd and everything works as expected.
>> >> > Are we supposed to use “tcp” and disable the -DPORTMAP flag? Why
is
>> >> > this flag enabled by default?
>> >>
>> >> This isn't related to UDP vs. TCP. The issue is that one address
buffer
>> >> passed in is IPv4, and the other is IPv6, and an IPv6 address cannot
fit
>> >> into an IPv4 buffer. It's definitely a bug, but I don't see
exactly how
>> >> this is happening. Can you give a more detailed explanation, with
>> >> callpaths to get there?
>> >>
>> >> >
>> >> > 2. Case: rpc.statd service restarts. Ganesha is not restarted.
>> >> > nsm_connect() is done only once. On failure to connect to
rpc.statd,
>> >> > we don’t retry. This causes the NLM lock request to fail with “No
>> >> > lock” error.
>> >> > Is it expected that Ganesha will restart when rpc.statd restarts?
I
>> >> > added a patch to re-call nsm_connect() on failure and it works for
me
>> >> > now.
>> >> >
>> >> > Is this correct way to handle this?
>> >>
>> >> We should probably re-connect when statd fails, but I'm not sure
how
>> >> this patch accomplishes that. All it does is retry if the initial
>> >> connect fails, it doesn't seem to retry later on. Can you explain
more
>> >> how this is working?
>> >>
>> >> Daniel
>> >>
>> >> > The patch is as below.
>> >> > ------------------------------
>> >> > From: Suhrud Patankar <suhrudpatankar(a)gmail.com>
>> >> > Date: Thu, 11 Oct 2018 00:35:45 -0700
>> >> > Subject: [PATCH] NFSv3: NLM - Reconnect with statd on failure
>> >> >
>> >> > Ganesha creates the statd connection context only once.
>> >> > In case statd service restarts, we fail to monitor new clients.
>> >> > On failure, destroy the STATD connection and recreate new one.
>> >> > ---
>> >> > src/Protocols/NLM/nsm.c | 21 +++++++++++++++++++--
>> >> >
>> >> > 1 files changed, 19 insertions(+), 2 deletions(-)
>> >> >
>> >> >
>> >> > diff --git a/src/Protocols/NLM/nsm.c b/src/Protocols/NLM/nsm.c
>> >> >
>> >> > index 09909d9..8bb41d7 100644
>> >> > --- a/src/Protocols/NLM/nsm.c
>> >> > +++ b/src/Protocols/NLM/nsm.c
>> >> > @@ -47,6 +47,8 @@ bool nsm_connect(void)
>> >> > if (nsm_clnt != NULL)
>> >> > return true;
>> >> >
>> >> > + LogFullDebug(COMPONENT_NLM, "Creating new nsm
connection ");
>> >> > +
>> >> > nsm_clnt = clnt_ncreate("localhost", SM_PROG,
SM_VERS, "tcp");
>> >> >
>> >> > if (CLNT_FAILURE(nsm_clnt)) {
>> >> > @@ -66,7 +68,7 @@ bool nsm_connect(void)
>> >> >
>> >> > void nsm_disconnect(void)
>> >> > {
>> >> > - if (nsm_count == 0 && nsm_clnt != NULL) {
>> >> > + if (nsm_clnt != NULL) {
>> >> > CLNT_DESTROY(nsm_clnt);
>> >> > nsm_clnt = NULL;
>> >> > AUTH_DESTROY(nsm_auth);
>> >> > @@ -83,6 +85,7 @@ bool nsm_monitor(state_nsm_client_t *host)
>> >> > enum clnt_stat ret;
>> >> > char client_ip[SOCK_NAME_MAX];
>> >> > char server_ip[SOCK_NAME_MAX * 2]; /* space for proxy_ip
*/
>> >> > + bool retry = false;
>> >> >
>> >> > if (host == NULL)
>> >> > return true;
>> >> > @@ -120,7 +123,7 @@ bool nsm_monitor(state_nsm_client_t *host)
>> >> > LogDebug(COMPONENT_NLM, "Monitor %s",
host->ssc_nlm_caller_name);
>> >> >
>> >> > PTHREAD_MUTEX_lock(&nsm_mutex);
>> >> > -
>> >> > +again:
>> >> > /* create a connection to nsm on the localhost */
>> >> > if (!nsm_connect()) {
>> >> > LogCrit(COMPONENT_NLM,
>> >> > @@ -144,6 +147,20 @@ bool nsm_monitor(state_nsm_client_t *host)
>> >> > }
>> >> >
>> >> > if (ret != RPC_SUCCESS) {
>> >> > + if (! retry) {
>> >> > + /* Handle case when statd restarts.
>> >> > + * Need to reconnect with statd
>> >> > + */
>> >> > + LogFullDebug(COMPONENT_NLM,
>> >> > + "Try to reconnect with
STATD");
>> >> > +
>> >> > + retry = true;
>> >> > + nsm_mon.mon_id.my_id.my_name = NULL;
>> >> > + clnt_req_release(cc);
>> >> > + nsm_disconnect();
>> >> > + goto again;
>> >> > + }
>> >> > +
>> >> > t = rpc_sperror(&cc->cc_error,
"failed");
>> >> > LogCrit(COMPONENT_NLM,
>> >> > "Monitor %s SM_MON %s",
>> >> > --
>> >> > 1.7.1
>> >> >
>> >> > Thanks & Regards,
>> >> > Suhrud
>> >> > _______________________________________________
>> >> > Devel mailing list -- devel(a)lists.nfs-ganesha.org
>> >> > To unsubscribe send an email to devel-leave(a)lists.nfs-ganesha.org
>> >> >
>> >> _______________________________________________
>> >> Devel mailing list -- devel(a)lists.nfs-ganesha.org
>> >> To unsubscribe send an email to devel-leave(a)lists.nfs-ganesha.org
>
> _______________________________________________
> Devel mailing list -- devel(a)lists.nfs-ganesha.org
> To unsubscribe send an email to devel-leave(a)lists.nfs-ganesha.org