On Wed, May 23, 2018 at 08:21:40AM -0400, Jeff Layton wrote:
From: Jeff Layton <jlayton(a)redhat.com>
Bruce asked for better design documentation, so this is my attempt at
it. Let me know what you think. I'll probably end up squashing this into
one of the code patches but for now I'm sending this separately to see
if it helps clarify things.
Suggestions and feedback are welcome.
Change-Id: I53cc77f66b2407c2083638e5760666639ba1fd57
Signed-off-by: Jeff Layton <jlayton(a)redhat.com>
---
src/doc/man/ganesha-rados-cluster.rst | 227 ++++++++++++++++++++++++++
1 file changed, 227 insertions(+)
create mode 100644 src/doc/man/ganesha-rados-cluster.rst
diff --git a/src/doc/man/ganesha-rados-cluster.rst
b/src/doc/man/ganesha-rados-cluster.rst
new file mode 100644
index 000000000000..1ba2d3c29093
--- /dev/null
+++ b/src/doc/man/ganesha-rados-cluster.rst
@@ -0,0 +1,227 @@
+==============================================================================
+ganesha-rados-cluster-design -- Clustered RADOS Recovery Backend Design
+==============================================================================
+
+.. program:: ganesha-rados-cluster-design
+
+This document aims to explain the theory and design behind the
+rados_cluster recovery backend, which coordinates grace period
+enforcement among multiple, independent NFS servers.
+
+In order to understand the clustered recovery backend, it's first necessary
+to understand how recovery works with a single server:
+
+Singleton Server Recovery
+-------------------------
+NFSv4 is a lease-based protocol. Clients set up a relationship to the
+server and must periodically renew their lease in order to maintain
+their ephemeral state (open files, locks, delegations or layouts).
+
+When a singleton NFS server is restarted, any ephemeral state is lost. When
+the server comes comes back online, NFS clients detect that the server has
+been restarted and will reclaim the ephemeral state that they held at the
+time of their last contact with the server.
+
+Singleton Grace Period
+----------------------
+
+In order to ensure that we don't end up with conflicts, clients are
+barred from acquiring any new state while in the Recovery phase. Only
+reclaim operations are allowed.
+
+This period of time is called the **grace period**. Most NFS servers
+have a grace period that lasts around two lease periods, however
knfsd's is one lease period, who does two?
(Still catching up on the rest. Looks good.)
--b.