I waited for more than 14 hours. This is the script I ran.
for i in {1..300}; do echo $i; mount -t nfs -o vers=3,tcp 10.15.196.92:/export /mnt1; umount /mnt1; done
It ran out of fds at the 169th iteration.
169
mount.nfs: Connection timed out
umount: /mnt1: not mounted

After this I was not able to mount or do any other operation. There was no active mount on my client, so there was no activity from the client side. Does that mean it will never clean up?

Regards,
Gaurav


On Tue, Dec 4, 2018 at 7:44 PM William Allen Simpson <william.allen.simpson@gmail.com> wrote:
On 12/4/18 12:27 AM, gaurav gangalwar wrote:
> Thanks for raising it on Ganesha list.
> Just want to add, once fds are depleted, they are not getting cleaned up even after hours.
> I needed to restart Ganesha process to recover from this state.

How many hours did you wait?

Currently, the cleanup is triggered after 1023 epoll wakeups.

If you stop doing anything and simply wait, that will be 8.24 hours
(1023 * 29 seconds).

Before 2.6, the wakeup interval was 120 seconds, so that was 34 hours.

After all, there's no good reason to clean up when there is no activity.

Also, cleanup means the fd has had no recv activity within __svc_params->idle_timeout.
Ganesha default is nfs_param.core_param.rpc.idle_timeout_s = 300 seconds.

If you're using a standard client that sends periodic idle traffic, it will never be cleaned up.
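
Roughly, the combination of those two conditions looks like the sketch below.
This is only an illustration, not the actual ntirpc code; the struct, field,
and function names are invented.

/* Illustration only -- not the ntirpc code; names are invented. */
#include <time.h>
#include <unistd.h>

#define CLEANUP_EVERY_N_WAKEUPS 1023  /* cleanup trigger: every 1023 epoll wakeups */
#define IDLE_TIMEOUT_S           300  /* Ganesha default idle_timeout_s */

struct xprt {
    int fd;
    time_t last_recv;        /* refreshed on every recv from the client */
    struct xprt *next;
};

static void release_xprt(struct xprt *x)
{
    close(x->fd);            /* placeholder for dropping the transport */
}

/* Called once per epoll wakeup (about every 29 seconds when idle,
 * 120 seconds before 2.6). */
static void maybe_cleanup(struct xprt *xprts)
{
    static unsigned wakeups;

    if (++wakeups < CLEANUP_EVERY_N_WAKEUPS)
        return;
    wakeups = 0;

    time_t now = time(NULL);
    for (struct xprt *x = xprts; x != NULL; x = x->next) {
        /* "cleanup" only targets fds with no recv activity within
         * idle_timeout; a client that keeps sending is left alone. */
        if (now - x->last_recv >= IDLE_TIMEOUT_S)
            release_xprt(x);
    }
}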


> I am not sure if idle cleanup code is able to get rid of extra ref and release xprt.

There is no extra ref.  We had one too few refs.  That's what DanG fixed.


> I tried with this fix in the cleanup code, and it's working for me.
> https://paste.fedoraproject.org/paste/lapz1NrOlBxS342S4Q79aw
>
As I tried to explain where s/he reported it, that isn't a fix.  It releases
some other reference still remaining, which in turn will probably lead to
referencing freed memory.

All this does is force a close(fd), which allows more connections to reuse
the old fd, masking the symptoms.
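
To make that concrete, here is a purely schematic sketch (not ntirpc and not
the pasted patch; all names are invented): force-closing the fd in the
cleanup path makes fd numbers reusable again, but whatever still legitimately
holds a reference to the transport keeps using it, and that can now point at
an fd owned by a different connection, or at memory that has since been freed.

/* Schematic only -- invented names, not ntirpc or the pasted patch. */
#include <stdlib.h>
#include <unistd.h>

struct xprt {
    int fd;
    int refcnt;              /* destroy runs only when every holder releases */
};

static void xprt_destroy(struct xprt *x)
{
    close(x->fd);
    free(x);
}

static void xprt_release(struct xprt *x)
{
    if (--x->refcnt == 0)
        xprt_destroy(x);
}

/* What forcing the close in the cleanup path amounts to: */
static void forced_cleanup(struct xprt *x)
{
    close(x->fd);  /* fd number can be handed out to a new connection again */
    /* ...but whoever still legitimately holds a reference keeps using x
     * and x->fd, which may now belong to a different connection; and if
     * that remaining reference is released here as well, the struct is
     * freed while other code still points at it.  Either way the fd
     * exhaustion is masked rather than fixed. */
}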

The real underlying security issue is that a malicious adversary can run a
resource exhaustion attack.  That's what needs to be fixed.