-----Original Message-----
From: Daniel Gryniewicz [mailto:dang@redhat.com]
Sent: Friday, September 13, 2019 5:32 AM
To: devel(a)lists.nfs-ganesha.org
Subject: [NFS-Ganesha-Devel] Re: PROGRAM_NOT_AVAILABLE at server boot up causing IO error

On 9/13/19 5:57 AM, Suhrud Patankar wrote:
> Hello All,
>
> At server boot, Ganesha registers with rpcbind only after loading all
> the exports.
> If export loading takes some time, then there is a window, where
> clients can get PROGRAM_NOT_AVAILABLE error.
> What I see is, the client tries to get NLM port number only a few
> times before converting to IO error.
>
> #~/multilock/ml_posix_client
> OPEN 0 RW create foo
> 1 OPEN OK 0 3
> #### Reboot the server after this.#####
> LOCKW 0 read 10 10
> 2 LOCKW ERRNO 5 "Input/output error" "Lock failed" bad token "read 10 10"
>
> I understand loading the exports fast will reduce this window but the
> window will still be there.
> Also FSALs can take longer to create exports if there are high number
> of exports.
>
> Has anyone seen this? Is it possible to register with rpcbind before
> loading the exports?
>
You can put only a few exports (or none) in your config and then load the rest
via DBUS once the server is up.
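
For illustration, here is a rough sketch of adding one export over DBus from a
script. It assumes the ExportMgr AddExport method as I recall it from the
Ganesha DBus documentation (destination org.ganesha.nfsd, object path
/org/ganesha/nfsd/ExportMgr), so check those names against your version. The
config path and export ID in the usage line are made up, and add_export_cmd /
add_export are hypothetical helper names.

```python
import subprocess


def add_export_cmd(config_path, export_id):
    """Build a dbus-send command asking a running Ganesha to add one export.

    Assumption: the daemon exposes org.ganesha.nfsd.exportmgr.AddExport, which
    takes a config file path and an expression selecting the EXPORT block.
    """
    return [
        "dbus-send", "--system", "--print-reply",
        "--dest=org.ganesha.nfsd",
        "/org/ganesha/nfsd/ExportMgr",
        "org.ganesha.nfsd.exportmgr.AddExport",
        f"string:{config_path}",
        f"string:EXPORT(Export_Id={export_id})",
    ]


def add_export(config_path, export_id):
    """Run the command; raises CalledProcessError if dbus-send fails."""
    subprocess.run(add_export_cmd(config_path, export_id), check=True)


if __name__ == "__main__":
    # Hypothetical path and ID; print the command rather than running it.
    print(" ".join(add_export_cmd("/etc/ganesha/exports/e77.conf", 77)))
```

Looping add_export over the per-export config files after startup would keep
the boot-time config small, at the cost of the stale-handle window discussed
in the reply that follows.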
Hmm, I'm not sure how we can handle issues here...
The problem with delaying adding the exports until we're up is that a client that
comes in with a request right after a crash (for recovery, or just because it happened to
come in at that point) might get a stale file handle error, since the client sent a handle
for an export that hasn't been loaded yet.
On the other hand, if we take too long to start up, clients may start to time out, though
a hard mount should actually handle that.
I think it would help to better understand the exact sequence and timing here (and maybe
look at the client code to see exactly what it's doing).
I think ultimately, we need to set up all the exports and everything else before we open
sockets and register with rpcbind. Once we unlock the doors we need to be ready for an
immediate storm of customers...
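
To make the trade-off concrete, here is a toy model of the two startup
orderings. It is only a sketch: the times are made-up seconds since daemon
start, the outcome names just mirror the errors mentioned in this thread, and
it ignores client retries entirely.

```python
def outcome(t, t_registered, t_exports_loaded):
    """Classify a request arriving at time t (seconds since daemon start)."""
    if t < t_registered:
        # Not yet registered with rpcbind: the window Suhrud describes.
        return "PROGRAM_NOT_AVAILABLE"
    if t < t_exports_loaded:
        # Registered, but the handle's export is not loaded yet.
        return "ESTALE"
    return "OK"


def startup_times(order, load_time):
    """Return (t_registered, t_exports_loaded) for a given startup order."""
    if order == "exports_first":
        # Current behaviour: register only after all exports are loaded.
        return load_time, load_time
    if order == "rpcbind_first":
        # Proposed alternative: register at once, load exports afterwards.
        return 0.0, load_time
    raise ValueError(f"unknown order: {order}")


if __name__ == "__main__":
    for order in ("exports_first", "rpcbind_first"):
        t_reg, t_loaded = startup_times(order, load_time=10.0)
        results = [outcome(t, t_reg, t_loaded) for t in (2.0, 12.0)]
        print(order, results)
```

Either way a request that arrives during export loading fails; the ordering
only changes which error the client sees. A stale handle is the worse of the
two, which is why I lean toward exports first.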
Frank