Hi all,
We have observed that on a few clients, e.g. Ubuntu 18.04.3 (4.15.0-55-generic) and RH7.8 (3.10.0-1127.el7.x86_64), a simple command like 'dd' either hangs or returns EIO. This happens only on krb5i and krb5p mounts. It mostly happens with files of 100 MB and larger, but sometimes even a 30 MB file sees failures.
A client such as RH7.6 (3.10.0-957.el7.x86_64) does not seem to hit this issue, so it might be specific to more recent kernels?
We fixed the issue with check-in https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/490802. The idea was to let clients know that Ganesha denied the request, rather than just dropping the request.
This fix did seem to help, and the hangs/errors stopped completely, but for larger file sizes, e.g. 1000 MB, we started seeing "Permission Denied" errors. This is different from the EIO errors seen earlier. The reason could be that we are now sending an "AUTH DENIED" error, which clients translate into this new error.
We added more logging to Ganesha and observed that these particular clients seem to send a lot of requests together. When we process them, the sequence numbers are pretty much out of order, and we drop the requests that fall outside the sequence window, as per RFC 2203 Section 7.2.1. The sequence window that we have is 32.
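
For readers less familiar with the mechanism, here is a minimal sketch of an RPCSEC_GSS-style sequence window check. It is not the Ganesha or libntirpc code; the names (seq_state, seq_window_accept, SEQ_WIN) are made up for illustration, and the window size of 32 is simply the value mentioned above. It only shows why a burst of requests processed far out of order ends up being dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SEQ_WIN 32u /* window size; assumed to match the value quoted above */

/* Hypothetical per-context state, for illustration only. */
struct seq_state {
    uint32_t seq_last; /* highest sequence number accepted so far */
    uint32_t seq_mask; /* bit i set => (seq_last - i) was already seen */
};

/* Returns true if the request may be processed, false if it would be dropped. */
static bool seq_window_accept(struct seq_state *st, uint32_t seq)
{
    assert(st != NULL);

    if (seq > st->seq_last) {
        /* New highest number: slide the window forward. */
        uint32_t shift = seq - st->seq_last;
        st->seq_mask = (shift >= SEQ_WIN) ? 0u : (st->seq_mask << shift);
        st->seq_mask |= 1u; /* mark the new highest as seen */
        st->seq_last = seq;
        return true;
    }

    uint32_t offset = st->seq_last - seq;
    if (offset >= SEQ_WIN)
        return false; /* fell below the window: drop */
    if (st->seq_mask & (1u << offset))
        return false; /* already seen inside the window: drop as a replay */

    st->seq_mask |= 1u << offset; /* first time seen inside the window */
    return true;
}
```

With this kind of check, if request 40 is processed before request 1, request 1 is 39 behind the highest number seen and is rejected, even though the client sent it legitimately.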
Testing these clients with kNFS does not hit the issue. The kNFS sequence window seems to be larger, at 128.
So we tried increasing the sequence window to 128 for Ganesha as well. That does not seem to fix the issue.
We also have the additional 'seqmask' check below, and many of the requests fell into that category as well and were dropped.
"libntirpc/src/svc_auth_gss.c":