Wow, I really hate top posts. Very hard to comprehend your responses.
On 12/5/18 12:28 AM, gaurav gangalwar wrote:
> I waited for more than 14 hours. This is the script I ran:
> for i in {1..300}; do echo $i; mount -t nfs -o vers=3,tcp 10.15.196.92:/export /mnt1;
> umount /mnt1; done
> It ran out of fds at the 169th iteration:
> 169
> mount.nfs: Connection timed out
> umount: /mnt1: not mounted
I really don't understand. There are no constants in the current V2.8-dev
code base
that add up to more than 14 hours, and no constant that would yield 169.
In any case, this particular denial of service attack should be easy to fix by
adding SVC_DESTROY to the Ganesha umount code path(s). However, because it seems
rather unlikely this could happen in production, it's not currently a high priority.
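Roughly what I have in mind is below -- an untested sketch, not a patch. SVC_DESTROY()
and rq_xprt are the real ntirpc names; the wrapper and exactly where the UMNT handler
would call it are only illustration:

/* Untested sketch.  The idea: when the client sends MOUNTPROC3_UMNT,
 * mark its transport for destruction instead of leaving the fd to the
 * idle scan.
 */
#include <rpc/svc.h>	/* ntirpc: SVCXPRT, struct svc_req, SVC_DESTROY() */

static void umnt_destroy_xprt(struct svc_req *req)
{
	SVCXPRT *xprt = req->rq_xprt;

	/* Flags the transport destroyed; my understanding is that
	 * ntirpc's reference counting closes the fd once in-flight
	 * work (our reply included) has drained -- that ordering is
	 * the part a real patch has to get right.
	 */
	SVC_DESTROY(xprt);
}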
Do you have a production environment or customer report where this occurred?
Would you like help writing the Ganesha patch?
Are you with Nutanix? Or IBM?
> After this I was not able to mount or do any operation. There was no
> active mount on my client, so there was no activity from the client side. Does that
> mean it will never clean up?
No, as I wrote,
# If you stop doing anything and simply wait, that will be 8.24 hours
# (1023 * 29 seconds).
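(For the record, 1023 * 29 = 29,667 seconds, which works out to about 8.24 hours.)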
I've discussed this with DanG, and that ntirpc design logic from before my time
seems faulty. It does cleanup more frequently on a busy system, but is very
slow on an idle system (the most likely time to have an fd become idle).
DanG suggested that a better approach would be to trigger cleanup based upon a
high-water-mark. The best approach would be to order the list of fds by latest
recv, and clean out the idle ones as soon as they expire -- instead of running
through the list of (tens of) thousands of fds. That shouldn't be too hard,
but we must be careful that we don't add new locking in the fast path.
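To make that concrete, here is the shape of it -- not ntirpc code, every name below is
made up, and the locking question is only waved at in a comment:

/* Sketch only.  Keep transports on a list ordered by last receive time,
 * so the reaper looks at the head instead of walking every fd.
 */
#include <stddef.h>
#include <time.h>

#define IDLE_TIMEOUT	300	/* assumed idle limit, seconds */
#define HIGH_WATER	1024	/* assumed fd high-water mark */

struct idle_fd {
	int fd;
	time_t last_recv;
	struct idle_fd *prev, *next;	/* oldest at head, newest at tail */
};

static struct idle_fd *idle_head, *idle_tail;
static unsigned int idle_count;

static void list_unlink(struct idle_fd *e)
{
	if (e->prev)
		e->prev->next = e->next;
	else
		idle_head = e->next;
	if (e->next)
		e->next->prev = e->prev;
	else
		idle_tail = e->prev;
	e->prev = e->next = NULL;
}

static void list_append(struct idle_fd *e)
{
	e->prev = idle_tail;
	e->next = NULL;
	if (idle_tail)
		idle_tail->next = e;
	else
		idle_head = e;
	idle_tail = e;
}

/* New connection: O(1) append at the newest end. */
static void idle_add(struct idle_fd *e)
{
	e->last_recv = time(NULL);
	list_append(e);
	idle_count++;
}

/* Every successful recv: O(1) move to the newest end.  This is the fast
 * path -- in real code it must not add a new global lock, or at least
 * the critical section has to stay tiny.
 */
static void idle_touch(struct idle_fd *e)
{
	list_unlink(e);
	e->last_recv = time(NULL);
	list_append(e);
}

/* Reap from the oldest end, stopping at the first live entry, so the
 * cost is proportional to the number of expired fds rather than the
 * (tens of) thousands of open ones.  Run from a timer, and also when
 * idle_count climbs past HIGH_WATER (DanG's trigger).
 */
static void idle_reap(void (*destroy)(int fd))
{
	time_t now = time(NULL);

	while (idle_head != NULL &&
	       now - idle_head->last_recv >= IDLE_TIMEOUT) {
		struct idle_fd *e = idle_head;

		list_unlink(e);
		idle_count--;
		destroy(e->fd);
	}
}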
But this is an own-time project for me, and my primary interest is RDMA and
performance. So it may be some time before I've a chance to look at it. Maybe
over the holiday....