On Fri, 2018-06-08 at 10:22 -0400, J. Bruce Fields wrote:
> On Wed, May 23, 2018 at 08:21:40AM -0400, Jeff Layton wrote:
> > +Lifting the Grace Period
> > +------------------------
> > +Transitioning from recovery to normal operation really consists of two
> > +different steps:
> > +
> > +1. the server decides that it no longer requires a grace period, either
> > +   because the grace period has timed out or because no clients remain
> > +   that would be allowed to reclaim.
> > +
> > +2. the server stops enforcing the grace period and transitions to
> > +   normal operation.
> > +
> > +These concepts are often conflated for singleton servers, but in
> > +a cluster we must consider them independently.
> > +
> > +When a server is finished with its own local recovery period, it should
> > +clear its NEED flag. That server should, however, continue to enforce
> > +the grace period until it is fully lifted.
> > +
> > +If the server's own NEED flag is the last one set, then it can lift the
> > +grace period (by setting R=0). At that point, all servers in the cluster
> > +can end grace period enforcement and communicate that fact to the
> > +others by clearing their ENFORCING flags.
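
To make the flag choreography above concrete, here's a minimal sketch
in C. The grace_db structure, MAX_NODES, and modeling the shared state
as plain booleans are simplifications of mine, not actual ganesha or
knfsd code; in practice each of these updates has to be an atomic
read-modify-write cycle against the shared grace database:

#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 8

/* Toy model of the shared cluster-wide grace state. R == 0 means no
 * grace period is in force; a nonzero R is the epoch from which
 * clients may reclaim. */
struct grace_db {
	uint64_t C;                /* current epoch */
	uint64_t R;                /* recovery epoch; 0 == grace lifted */
	bool need[MAX_NODES];      /* node may still require grace */
	bool enforcing[MAX_NODES]; /* node is blocking non-reclaim activity */
};

/* Step 1: this node no longer requires the grace period. */
static void clear_need(struct grace_db *db, int me)
{
	bool last = true;
	int i;

	db->need[me] = false;
	for (i = 0; i < MAX_NODES; i++)
		if (db->need[i])
			last = false;

	if (last)
		db->R = 0;	/* last one out lifts the grace period */
}

/* Step 2: each node stops enforcing once it sees grace lifted. */
static void maybe_stop_enforcing(struct grace_db *db, int me)
{
	if (db->R == 0)
		db->enforcing[me] = false;
}
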
> I think this also needs to describe the ordering of the recovery
> database switch and the epoch increment in the clustered case.
>
> For "surviving" servers it doesn't matter, since their recovery
> database isn't changing.
>
> For restarting servers, there's a window between clearing NEED and
> clearing ENFORCING when their recovery database can't change.
>
> The epoch mustn't change till everybody's created a new recovery
> database. It must change before anyone grants a new non-reclaim lock,
> because at that point it's no longer safe to use the older recovery
> databases. (That could result in allowing a reclaim from a client
> which conflicts with the new lock.)
>
> So I think servers should 1) stop allowing reclaims, 2) create the new
> recovery database, 3) atomically: clear NEED, check whether they're
> the last to clear NEED, and bump the epoch, and 4) clear ENFORCING. ??
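
If I'm reading that ordering right, it would look something like this
against the toy model above. stop_allowing_reclaims(),
write_new_recovery_db(), and the begin/commit pair are hypothetical
stand-ins for whatever storage and compare-and-swap primitives the
backend provides, and step 3 really does have to be a single atomic
update of the shared state:

/* Hypothetical stand-ins; a real backend would use its own storage
 * and atomic update primitives here. */
static void stop_allowing_reclaims(int me) { (void)me; }
static void write_new_recovery_db(int me, uint64_t epoch)
{ (void)me; (void)epoch; }
static void begin_atomic_update(struct grace_db *db) { (void)db; }
static void commit_atomic_update(struct grace_db *db) { (void)db; }

static void finish_local_recovery(struct grace_db *db, int me)
{
	bool last = true;
	int i;

	stop_allowing_reclaims(me);         /* 1: no more reclaims */
	write_new_recovery_db(me, db->C);   /* 2: DB for the current epoch */

	begin_atomic_update(db);            /* 3: one atomic RMW cycle */
	db->need[me] = false;
	for (i = 0; i < MAX_NODES; i++)
		if (db->need[i])
			last = false;
	if (last)
		db->R = 0;  /* lift grace: from here on, only the new
			     * recovery DBs are safe to recover from */
	commit_atomic_update(db);

	if (db->R == 0)
		db->enforcing[me] = false;  /* 4: resume normal operation */
}
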
> Do we have a race like this in a 2-node cluster?
>
> - server 1 clears NEED
> - server 2 restarts, sets NEED and ENFORCING
> - server 2 sees that 1 still has ENFORCING set, starts accepting
>   reclaims
> - server 1 clears ENFORCING, starts accepting non-reclaims.
After looking over this a bit more, I think there is a potential
problem. We're currently starting a new (local) grace period (and
setting our own enforcing flag), and only generating the recovery DBs
afterward.

If we crash between those two events, then another node could lift the
grace period before this node comes back up and marks its NEED flag.
The easy fix is to just create a new recovery DB for the new epoch prior
to starting the grace period locally. Lightly tested patch here:
https://review.gerrithub.io/#/c/ffilz/nfs-ganesha/+/415232
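
In terms of the toy model earlier in the thread, the reordered startup
path looks roughly like this. The helpers are the same hypothetical
stand-ins as before, the epoch arithmetic when starting a grace period
(old C becomes R, C is bumped) is just my shorthand for the design, and
the case of joining an already-active grace period is elided:

static void node_startup(struct grace_db *db, int me)
{
	uint64_t new_epoch = db->C + 1;

	/* Persist the recovery DB for the new epoch *before* touching
	 * any flags, so the DB exists no matter where we crash below. */
	write_new_recovery_db(me, new_epoch);

	begin_atomic_update(db);
	db->R = db->C;           /* reclaims come from the old epoch */
	db->C = new_epoch;
	db->need[me] = true;
	db->enforcing[me] = true;
	commit_atomic_update(db);
}
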
--
Jeff Layton <jlayton@kernel.org>