On 4/10/19 5:27 AM, fanzi2009(a)hotmail.com wrote:
Hi Daniel,
I agree with your thought, but the implementation of FSAL_CEPH seems to be different from
yours. Here is its design:
https://www.mankier.com/8/ganesha-rados-cluster-design
I also checked the code. When one of the servers enters a grace period, it calls
rados_cluster_read_clids() to read the cluster-wide variables C and R. Only if R==0 does the
whole cluster enter a grace period. If so, the cluster enters grace on reboot, not on
crash. There also seems to be no interface for pacemaker/corosync to get this information and
put the cluster into a grace period.
The FSAL_CEPH design is different. It's based on kubernetes and
containerized ceph, so that if a ganesha instance crashes, kubernetes
immediately starts a new instance of that ganesha, with the same IP
address and everything. This means that we *can* go into grace on
"reboot", since the reboot is a matter if a very few seconds, rather
than a full reboot. If this was running on normal hardware, especially
server hardware with it's many minute reboot time, this would not be
possible. This is why traditional HA systems based on
pacemaker/corosync or ctdb need to put the cluster into grace on crash,
rather than on reboot.
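To make the "only if R==0" rule above concrete, here is a minimal toy sketch of the grace-start decision. The class and method names are illustrative, not the actual Ganesha code; C is the current epoch and R the epoch being recovered (R==0 meaning no grace period is in effect), as in the design doc linked above:

```python
# Toy model of the rados_cluster grace-start decision. Not the real
# Ganesha code; names and structure are illustrative only.

class GraceDB:
    """Stand-in for the cluster-wide grace database kept in RADOS."""

    def __init__(self):
        self.C = 1   # current epoch
        self.R = 0   # epoch being recovered; 0 => no grace in effect

    def start_grace(self):
        """Called when a node (re)starts and wants a grace period."""
        if self.R == 0:
            # No grace period in effect: start one. Clients reclaim
            # state from the previous epoch, and C advances.
            self.R = self.C
            self.C += 1
        # If R != 0, a grace period is already running and the
        # restarting node simply joins it.

db = GraceDB()
db.start_grace()        # first node restarts: cluster enters grace
print(db.C, db.R)       # epochs after entering grace
db.start_grace()        # second node restarts during the same grace
print(db.C, db.R)       # unchanged: it joined the existing grace
```

The point of the R==0 check is exactly what is described above: a node that restarts while a recovery is already in progress must not start a fresh grace period, only participate in the current one.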
Daniel