Hi Jeff,
Thanks for the clarification.
I am using the Linux kernel client, so it is very unlikely that it fails to send a
RECLAIM_COMPLETE. I observed RECLAIM_COMPLETE operations from other clients around the same time.
I have another question, which may not be related to this: shouldn't we have similar
checks for CLAIM_NULL and CLAIM_FH in open4_validate_reclaim? I see that CLAIM_FH only
checks the minor version.
Thanks,
Sriram
On 5/31/19, 4:32 PM, "Jeff Layton" <jlayton(a)redhat.com> wrote:
On Fri, 2019-05-31 at 10:08 +0000, Sriram Patil wrote:
Hi,
Recently I came across an issue where an NFS client's lease expired, so ganesha returned
NFS4ERR_EXPIRED. This resulted in the client creating a new session with EXCHANGE_ID +
CREATE_SESSION.
The client id was immediately confirmed in CREATE_SESSION because the recov directory was
not deleted. I observed that ganesha sets “cid_allow_reclaim = true” in
nfs4_op_create_session->nfs4_chk_clid->nfs4_chk_clid_impl. This flag allows the
client to do reclaims, even though ganesha is not in grace. The CLAIM_PREVIOUS case in
“open4_validate_reclaim” is as follows:
    case CLAIM_PREVIOUS:
        want_grace = true;
        if (!clientid->cid_allow_reclaim ||
            ((data->minorversion > 0) &&
             clientid->cid_cb.v41.cid_reclaim_complete))
            status = NFS4ERR_NO_GRACE;
        break;
cid_allow_reclaim is just a flag saying that the client in question is
present in the recovery DB. The logic above looks correct to me.
Now, there is another flag that marks the completion of reclaim by the
client, “clientid->cid_cb.v41.cid_reclaim_complete”. This flag is set to true as part of
the RECLAIM_COMPLETE operation. Now consider a case where ganesha does not receive
RECLAIM_COMPLETE from the client, and look at the CLAIM_NULL case in
"open4_validate_reclaim":
    case CLAIM_NULL:
        if ((data->minorversion > 0)
            && !clientid->cid_cb.v41.cid_reclaim_complete)
            status = NFS4ERR_GRACE;
        break;
So, the client gets stuck in a loop for OPEN with CLAIM_NULL, because ganesha keeps
returning NFS4ERR_GRACE.
A client that doesn't send a RECLAIM_COMPLETE before attempting to do a
non-reclaim open is broken. RFC5661, page 567:
Whenever a client establishes a new client ID and before it does the
first non-reclaim operation that obtains a lock, it MUST send a
RECLAIM_COMPLETE with rca_one_fs set to FALSE, even if there are no
locks to reclaim. If non-reclaim locking operations are done before
the RECLAIM_COMPLETE, an NFS4ERR_GRACE error will be returned.
So the above behavior is correct, IMO.
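To make the interaction of the two quoted cases easier to follow, here is a minimal, self-contained sketch of the checks. It is a simplified model, not the ganesha source: struct client_model and validate_claim() are hypothetical names, only the two flags discussed in this thread are modelled, and the want_grace handling is left out.

```c
#include <stdbool.h>

/* Only the two claim types quoted in this thread are modelled. */
enum claim_type { CLAIM_NULL, CLAIM_PREVIOUS };

/* NFSv4 status codes used by the quoted checks. */
enum nfs_status {
    NFS4_OK = 0,
    NFS4ERR_GRACE = 10013,
    NFS4ERR_NO_GRACE = 10033
};

/* Hypothetical stand-in for the per-client record. */
struct client_model {
    bool allow_reclaim;    /* models cid_allow_reclaim */
    bool reclaim_complete; /* models cid_cb.v41.cid_reclaim_complete */
};

/* Mirrors the two quoted switch cases. */
enum nfs_status validate_claim(enum claim_type claim, unsigned minorversion,
                               const struct client_model *c)
{
    switch (claim) {
    case CLAIM_PREVIOUS:
        /* Reclaim is refused if the client is not in the recovery DB,
         * or (v4.1+) if it has already sent RECLAIM_COMPLETE. */
        if (!c->allow_reclaim ||
            (minorversion > 0 && c->reclaim_complete))
            return NFS4ERR_NO_GRACE;
        return NFS4_OK;
    case CLAIM_NULL:
        /* v4.1+: a non-reclaim open is refused until RECLAIM_COMPLETE. */
        if (minorversion > 0 && !c->reclaim_complete)
            return NFS4ERR_GRACE;
        return NFS4_OK;
    }
    return NFS4_OK;
}
```

In this model a v4.1 client that never sends RECLAIM_COMPLETE keeps getting NFS4ERR_GRACE for CLAIM_NULL, while CLAIM_PREVIOUS keeps succeeding; once the flag is set, the two results swap.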
I guess allowing clients to reclaim as long as they keep sending
reclaim requests is the point of implementing sticky grace periods. But if
RECLAIM_COMPLETE is lost, we should not be stuck in the grace period forever. Maybe we can
change cid_allow_reclaim to the time at which the last reclaim request was received, and then
allow non-reclaim requests after (cid_allow_reclaim + grace_period), which means ganesha
will wait for a RECLAIM_COMPLETE for a full grace period. We could choose the timeout to
be grace_period/3 or something if that makes more sense.
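The proposal in the paragraph above could be sketched roughly as follows. This only illustrates the idea and is not existing ganesha behavior: struct reclaim_window, non_reclaim_allowed() and the timeout parameter are hypothetical names, and the timeout could be grace_period or grace_period/3 as suggested.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-client state for the proposal. */
struct reclaim_window {
    time_t last_reclaim;   /* time of the last reclaim request; replaces
                            * the boolean cid_allow_reclaim in this sketch */
    bool reclaim_complete; /* set when RECLAIM_COMPLETE is received */
};

/* A non-reclaim request is allowed once RECLAIM_COMPLETE has arrived,
 * or once `timeout` seconds have passed since the last reclaim. */
bool non_reclaim_allowed(const struct reclaim_window *w, time_t now,
                         time_t timeout)
{
    if (w->reclaim_complete)
        return true;
    return now >= w->last_reclaim + timeout;
}
```

With last_reclaim = 100 and timeout = 90, a CLAIM_NULL open at time 150 would still get NFS4ERR_GRACE, while one at time 190 would go through even if RECLAIM_COMPLETE never arrived.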
The point of sticky grace periods was to ensure that we don't end up
with a ToC/ToU race with the grace period. In general, we check whether
we're in the grace period at the start of an operation, but we could end
up lifting it or entering it after that check but before the operation
was complete. With the sticky grace period patches, we ensure that we
remain in whichever state we need until the operation is done.
In general, this should not extend the length of the grace period unless
you have an operation that is taking an extraordinarily long time before
putting its reference. Maybe you have an operation that is stuck and
holding a grace reference?
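As a rough illustration of the reference-holding idea described above, here is a single-threaded model with no locking. The names grace_get(), grace_put() and grace_request_flip() are made up for this sketch and are not the ganesha functions; refusing new references while a change is pending is likewise an assumption of the sketch.

```c
#include <stdbool.h>

/* Hypothetical model of a "sticky" grace state. */
struct grace_model {
    bool in_grace;       /* current state */
    bool change_pending; /* a flip was requested while refs were held */
    unsigned refs;       /* operations relying on the current state */
};

/* Take a reference if the current state is what the operation needs.
 * New references are refused while a change is pending (an assumption
 * of this sketch, so that the pending flip cannot be starved). */
bool grace_get(struct grace_model *g, bool want_grace)
{
    if (g->change_pending || g->in_grace != want_grace)
        return false;
    g->refs++;
    return true;
}

/* Request entering or lifting grace; deferred while references exist. */
void grace_request_flip(struct grace_model *g)
{
    if (g->refs == 0)
        g->in_grace = !g->in_grace;
    else
        g->change_pending = true;
}

/* Drop a reference; the last one out applies any pending flip. */
void grace_put(struct grace_model *g)
{
    if (g->refs == 0)
        return; /* unbalanced put; ignored in this sketch */
    if (--g->refs == 0 && g->change_pending) {
        g->in_grace = !g->in_grace;
        g->change_pending = false;
    }
}
```

In this model, an operation that never calls grace_put() keeps the state pinned indefinitely, which is the "stuck operation holding a grace reference" scenario.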
But this will ensure that SERVER does not fail because
RECLAIM_COMPLETE was not sent.
Meanwhile, I am also trying to figure out why the NFS client did not send RECLAIM_COMPLETE.
That's the real question.
--
Jeff Layton <jlayton(a)redhat.com>