Easier said than done. Bear in mind that all of the recovery backend
stuff is entirely for the purpose of dealing with server restarts, so
you really do have to be careful not to leave gaps in particular
scenarios.
Let's say you do decide to synchronously store open and lock records
in a central RADOS-based database and both Server A and Server B are
using it.
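For concreteness, the "record it synchronously in RADOS" half might look
something like this with the librados C API (the pool ioctx, the
"client_db" object, and the key/record format are all made up for
illustration):

#include <rados/librados.h>

/* Sketch only: synchronously record a client's open/lock state under a
 * shared RADOS object's omap.  "client_db" is an illustrative object
 * name; the record format is left entirely undefined here. */
static int record_client_state(rados_ioctx_t ioctx,
                               const char *clientid_key,
                               const char *record, size_t record_len)
{
    rados_write_op_t op = rados_create_write_op();
    const char * const keys[] = { clientid_key };
    const char * const vals[] = { record };
    size_t lens[] = { record_len };
    int ret;

    rados_write_op_omap_set(op, keys, vals, lens, 1);
    /* Synchronous: doesn't return until RADOS has committed the write,
     * so the record survives a crash of the server that wrote it. */
    ret = rados_write_op_operate(op, ioctx, "client_db", NULL, 0);
    rados_release_write_op(op);
    return ret;
}

The painful part is that something like that has to happen in the I/O
path, before you reply to each new open or lock request.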
Server A crashes, and the client decides to reconnect to Server B
using the recorded clientid/session. Server B says "Oh, this session
was previously held by Server A." Now what happens?
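The lookup itself is the easy bit. With the same shared object, Server B
can find out who recorded the session with something like this (again a
sketch, error handling mostly elided):

#include <stdio.h>
#include <rados/librados.h>

/* Ask the shared "client_db" object who last recorded this clientid. */
static void lookup_client_record(rados_ioctx_t ioctx,
                                 const char *clientid_key)
{
    rados_read_op_t op = rados_create_read_op();
    const char * const keys[] = { clientid_key };
    rados_omap_iter_t iter;
    char *key, *val;
    size_t len;
    int prval = 0;

    rados_read_op_omap_get_vals_by_keys(op, keys, 1, &iter, &prval);
    if (rados_read_op_operate(op, ioctx, "client_db", 0) == 0 && prval == 0) {
        if (rados_omap_get_next(iter, &key, &val, &len) == 0 && key != NULL) {
            /* val would name the previous owner, e.g. "server-a" --
             * this is the moment Server B learns that the state behind
             * the session still lives on Server A. */
            printf("clientid %s last held by %.*s\n",
                   clientid_key, (int)len, val);
        }
        rados_omap_get_end(iter);
    }
    rados_release_read_op(op);
}

Knowing who held it doesn't get you very far, though.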
You need a mechanism to transfer the CephFS state (opens, locks, caps,
etc.) from Server A to Server B. Nothing like that exists today, but
we do have some tentative plans to allow cephfs clients to reclaim
state they previously held. In principle, that could be extended to
allow "takeover" in some fashion.
But wait...it gets worse!
Suppose we have a 3-node ganesha cluster and some of Server A's
clients decide to go to Server C instead. Now a simple takeover is not
enough -- you need a way to carve that state up at per-client
granularity.
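If the shared records were keyed per client rather than per server
(again, just one possible layout, not anything ganesha does today),
reassigning a single client at least becomes one guarded omap update
rather than a bulk copy:

#include <string.h>
#include <rados/librados.h>

/* Hypothetical layout: one omap key per client, whose value names the
 * ganesha node currently responsible for it.  Moving one of Server A's
 * clients to Server C is then a guarded update of that single key. */
static int reassign_client(rados_ioctx_t ioctx, const char *client_key,
                           const char *old_node, const char *new_node)
{
    rados_write_op_t op = rados_create_write_op();
    const char * const keys[] = { client_key };
    const char * const vals[] = { new_node };
    size_t lens[] = { strlen(new_node) };
    int cmp_rc = 0, ret;

    /* Guard: fail the whole op unless the record still says old_node
     * owns this client, so two takeovers can't both "win" it. */
    rados_write_op_omap_cmp(op, client_key, LIBRADOS_CMPXATTR_OP_EQ,
                            old_node, strlen(old_node), &cmp_rc);
    rados_write_op_omap_set(op, keys, vals, lens, 1);
    ret = rados_write_op_operate(op, ioctx, "client_db", NULL, 0);
    rados_release_write_op(op);
    return ret ? ret : cmp_rc;
}

That only moves the bookkeeping, of course -- the MDS-side caps and
locks still have to follow it, which is the hard part described above.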
Couple all of this with the basic truism that failures in these sorts
of architectures are often cascading. You need to account for the
possibility that any node could just die at any time, and decide how
you're going to handle that. A lot of the original ganesha recovery
backend work had gaping holes in its "takeover" mechanisms, where a
failure at an inopportune time could leave no clients able to recover
anything.
This is very much a non-trivial problem in my experience, but don't
let me dissuade you if you have considered these scenarios and have
thoughts on how to address them.
--
Jeff Layton <jlayton(a)poochiereds.net>
On Tue, Apr 2, 2019 at 10:46 PM <fanzi2009(a)hotmail.com> wrote:
>
> How about I persist all the session information into the backend (rados,
> for example)? If so, even though the client connects to another server,
> the server can reconstruct the session from the backend. The client
> doesn't need to re-create the session and doesn't need to reclaim.
>
> > On Tue, Apr 2, 2019 at 9:21 AM Daniel Gryniewicz <dang(a)redhat.com> wrote:
> >
> > I mostly agree with everything Dan said here. Clustered environments
> > require extra care, and we don't currently have a way to migrate
> > clients between different ganesha heads that are exporting the same
> > clustered fs. I've done some experimentation with v4 migration here,
> > but there's nothing in stock ganesha today for this.
> >
> > You can, however, avoid putting all of the other cluster nodes into
> > the grace period if you know that none of the other cluster nodes can
> > acquire state that will need to be reclaimed by clients of the node
> > that is restarting. We have some tentative plans to implement this for
> > FSAL_CEPH + the RADOS recovery backends someday, but it requires
> > support in the Ceph MDS and userland client libraries that does not
> > yet exist.