You can stop the NFS server and restart it after a few minutes; NFS clients should survive this if they are mounted with the 'hard' option (the default). The startup ordering in Ganesha is correct. You need to find out why the client is turning this into EIO. As a quick test, please stop the nfs-ganesha daemon and see whether the client reports EIO.
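
For the quick test above, something like the following could stand in for the LOCKW step of ml_posix_client. It is only a sketch: the function names are made up for illustration, and the offset and length simply mirror the "read 10 10" request from the report. It takes a blocking read lock on a file and returns the errno, so you can see whether the client hands back EIO (5) while the daemon is stopped or just keeps waiting.

```python
import errno
import fcntl
import os


def try_read_lock(path, start=10, length=10):
    """Take a blocking shared (read) lock on `length` bytes at `start`.

    Returns 0 on success, or the errno value if the lock call fails.
    (Hypothetical helper, not part of ml_posix_client.)
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # lockf() without LOCK_NB blocks, like LOCKW in the report.
        fcntl.lockf(fd, fcntl.LOCK_SH, length, start, os.SEEK_SET)
        fcntl.lockf(fd, fcntl.LOCK_UN, length, start, os.SEEK_SET)
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        os.close(fd)


def describe(err):
    """Turn the result of try_read_lock() into a short message."""
    if err == 0:
        return "lock OK"
    if err == errno.EIO:
        return "EIO: client gave up instead of retrying"
    return "errno %d (%s)" % (err, os.strerror(err))
```

Run `print(describe(try_read_lock("/mnt/nfs/foo")))` on the client after stopping the daemon; the mount path here is a placeholder.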

On Fri, Sep 13, 2019 at 8:06 PM Frank Filz <ffilzlnx@mindspring.com> wrote:


> -----Original Message-----
> From: Daniel Gryniewicz [mailto:dang@redhat.com]
> Sent: Friday, September 13, 2019 5:32 AM
> To: devel@lists.nfs-ganesha.org
> Subject: [NFS-Ganesha-Devel] Re: PROGRAM_NOT_AVAILABLE at server boot up
> causing IO error
>
> On 9/13/19 5:57 AM, Suhrud Patankar wrote:
> > Hello All,
> >
> > At server boot, Ganesha registers with rpcbind only after loading all
> > the exports.
> > If export loading takes some time, then there is a window, where
> > clients can get PROGRAM_NOT_AVAILABLE error.
> > What I see is, the client tries to get NLM port number only a few
> > times before converting to IO error.
> >
> > #~/multilock/ml_posix_client
> > OPEN 0 RW create foo
> > 1 OPEN OK 0 3
> > #### Reboot the server after this. ####
> > LOCKW 0 read 10 10
> > 2 LOCKW ERRNO 5 "Input/output error" "Lock failed" bad token "read 10 10"
> >
> > I understand loading the exports fast will reduce this window but the
> > window will still be there.
> > Also FSALs can take longer to create exports if there are high number
> > of exports.
> >
> > Has anyone seen this?  Is it possible to register with rpcbind before
> > loading the exports?
> >
>
> You can put only a few exports (or none) in your config and then load the rest
> via DBUS.

Hmm, I'm not sure how we can handle issues here...

The problem with delaying adding the exports until we're up is that a client that comes in with a request right after a crash (for recovery, or just because it happened to come in at that point) might get a stale file handle error, since the client sent a handle for an export that hasn't been loaded yet.

On the other hand, if we take too long to start up, clients may start to time out, though actually that should be handled by hard mount.
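
To make the hard-mount expectation concrete, here is a deliberately simplified model, not the actual client code: the exception class, function name, and parameters are all hypothetical. A "hard" caller keeps retrying while the program is not registered, whereas a "soft" caller gives up after a fixed number of attempts and surfaces EIO, which is the behaviour the lock request in the report appears to be getting.

```python
import errno
import time


class ProgramNotAvailable(Exception):
    """Hypothetical stand-in for an rpcbind 'program not available' reply."""


def call_with_retry(op, hard=True, retrans=3, delay=1.0):
    """Call op() until it succeeds.

    hard=True  : retry forever (what a hard mount is expected to do).
    hard=False : give up after `retrans` failed attempts and raise EIO.
    """
    attempts = 0
    while True:
        try:
            return op()
        except ProgramNotAvailable:
            attempts += 1
            if not hard and attempts >= retrans:
                raise OSError(errno.EIO,
                              "gave up after %d attempts" % attempts)
            time.sleep(delay)
```

If the NLM port lookup in the client behaves like the hard=False branch even on a hard mount, that would explain the EIO; whether it really does needs to be checked in the client code.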

I think it would help to better understand the exact sequence and timing here (and maybe look at the client code to see exactly what it's doing).

I think ultimately, we need to set up all the exports and everything before we open sockets and do rpcbind. Once we unlock the doors we need to be ready for an immediate storm of customers...
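
As a rough sketch of that ordering (this is not Ganesha code; the function and its callbacks are hypothetical stand-ins), the startup path would look something like this: load every export, then open the listening socket, and only then register the port.

```python
import socket


def start_server(export_ids, load_export, register_with_rpcbind):
    """Bring the server up in the order described above.

    load_export and register_with_rpcbind are caller-supplied callbacks
    (hypothetical names, standing in for the real export loading and
    rpcbind registration).
    """
    # 1. Load every export first, so any file handle a client presents
    #    right after a restart can be resolved.
    exports = {eid: load_export(eid) for eid in export_ids}

    # 2. Only then open the listening socket (port 0 = any free port).
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    port = listener.getsockname()[1]

    # 3. Finally advertise the port, so clients are told about the
    #    service only once it can actually answer them.
    register_with_rpcbind(port)
    return exports, listener
```

The cost of this ordering is exactly the window Suhrud describes: until step 3 runs, a client asking rpcbind for the port gets a "program not available" answer, so the client has to keep retrying through it.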

Frank