Hi, comments inline.
On 3/12/20 1:57 AM, Michael Bisig wrote:
Hi all,
I am moving this issue/question from ceph-users to the nfs-ganesha devel list, as requested
by Daniel Gryniewicz. (Thanks for pointing me to that list.)
I might have a configuration issue, or at least a non-optimally working Ganesha cluster,
and I hope you can help me. :) I am not sure whether my problem is by design, a bug, or
just a configuration issue.
Anyway, thanks in advance for your help and time!
Specs:
Ceph v14.2.8
Ganesha v3.0 in a Docker container based on an Ubuntu 18.04 image
Config: Please find my configuration attached. (Other values, such as GracePeriod or
LeaseTime, are left at their defaults.)
Setup:
Two running Ganesha daemons, which I registered in the grace db (with the rados_cluster
recovery backend). The db lives in the cephfsmetadata pool in a separate namespace. I
added the two nodes to the db using:
ganesha-rados-cluster add a
ganesha-rados-cluster add b
(both against the correct pool and namespace in Ceph)
Both daemons can read from and write to the db, which is fine. They also clean up rec-XX
files after a restart (i.e., delete them once they are outdated). I can mount the exported
NFS path through both daemons. So far so good!
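For reference, this is roughly how I check the grace db state, using the ganesha-rados-grace
tool that ships with Ganesha (which I assume is what the add commands above correspond to;
the namespace below is a placeholder for the one in my config):
  # dump the grace db epoch and the per-node flags (N = need grace, E = enforcing)
  ganesha-rados-grace --pool cephfsmetadata --ns <my-namespace> dump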
Problem:
When I turn off one daemon (e.g. b), i.e. stop its container, the shutdown works
smoothly and the db finally shows:
a E
b NE
I assume that all clients connected to b are stale. But I observe that all clients
connected to a are stale as well (or at least most operations hang), meaning I can neither
read from nor write to the mounted filesystem. I can still ls the mountpoint, so it is not
completely broken. This cluster state is never cleaned up; waiting for 5 minutes did not
change the behavior on ganesha a. I would expect that, at least after some period, the
clients connected to daemon a could read/write as usual. The db entries do not change
either.
So, as I said on the other list, new opens/reads/writes/locks will be
blocked on the entire cluster for the duration of the grace period.
This makes testing read/write somewhat problematic. You have to have a
process that has a file already open, and is waiting to read or write,
and then trigger the failure.
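A crude way to set that up, just as a sketch (the mount path is a placeholder for wherever
the export is mounted on your client):
  # open one file on the NFS mount and keep appending to it; the redirection on the
  # loop keeps the file open the whole time, so if grace works as intended the writer
  # should only stall during the grace period and then resume
  while true; do date; sleep 1; done >> /mnt/nfs/grace-test.log
Then stop daemon b and watch whether the loop resumes once the grace period on a is lifted.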
Long-running processes that operate on a single set of files should continue
uninterrupted; this means things like database servers. However, even for long-running
processes, opening new files will fail; this means things like web servers.
The grace *should* be lifted after a grace period (default 90 seconds),
regardless of success or failure. The code looks like it does this, but
something is obviously wrong here.
If daemon b crashes (instead of being shut down), the clients connected to daemon a can
still read/write and are not affected by the crash of b, so this is fine for the crash
case. This is probably because daemon b cannot set the NEED flag in the db.
After a while, the running daemon a shows a heartbeat warning, which is certainly expected
and a very handy message to let you know that something in the cluster is shaky.
This tells me that grace isn't working when there's a crash, or
read/write would be blocked. Obviously Ganesha can't trigger grace in
this case, since it's crashed. Something else needs to monitor the
state of the Ganesha servers, and trigger grace when one of them
crashes. In a traditional HA setup, this is pacemaker/corosync or
ctdb. In a containerized environment, this should probably be
kubernetes. My guess is that nothing's doing this for you, and
therefore you aren't protected from a failure.
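As a very rough sketch of what such a monitor would do (pool and namespace are placeholders,
this assumes the ganesha-rados-grace tool from Ganesha 3.x, and a real setup would use
pacemaker/kubernetes liveness checks rather than pgrep):
  # watch the local ganesha.nfsd on node b; if it dies, request a new grace period
  # with b listed as needing recovery, so the surviving node enforces grace and b's
  # clients can reclaim their state once it comes back
  while pgrep -x ganesha.nfsd >/dev/null; do
      sleep 5
  done
  ganesha-rados-grace --pool cephfsmetadata --ns <my-namespace> start b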
Expectation:
I would expect that a proper shutdown of one daemon does not affect the clients connected
to the still-running ganesha a.
Logs are very clean:
# Situation where I stopped daemon b
11/03/2020 15:46:06 : epoch 5e68d0c3 : a : ganesha.nfsd-1[reaper] nfs_lift_grace_locked
:STATE :EVENT :NFS Server Now NOT IN GRACE
11/03/2020 15:46:31 : epoch 5e68d0c3 : a : ganesha.nfsd-1[reaper] nfs_start_grace :STATE
:EVENT :NFS Server Now IN GRACE, duration 90
--> and here it hangs (no GRACE lift appears, even after waiting 5-10 minutes, which is
not nice in an active-active environment)
Once I start the daemon again, everything works like a charm! And the logs show only ONE
additional line (compared to above):
11/03/2020 15:46:06 : epoch 5e68d0c3 : a : ganesha.nfsd-1[reaper] nfs_lift_grace_locked
:STATE :EVENT :NFS Server Now NOT IN GRACE
11/03/2020 15:46:31 : epoch 5e68d0c3 : a : ganesha.nfsd-1[reaper] nfs_start_grace :STATE
:EVENT :NFS Server Now IN GRACE, duration 90
11/03/2020 15:54:53 : epoch 5e68d0c3 : a : ganesha.nfsd-1[reaper] nfs_lift_grace_locked
:STATE :EVENT :NFS Server Now NOT IN GRACE
I do not have any more informative logs with warnings or errors (using log level
FULL_DEBUG); everything seems to work just fine!
Any explanation would help me understand the situation.
Can you post a full log of the case where grace was never exited?
Ideally from both servers (the one that was restarted, and the one that
stayed up).
Daniel