Hi Bill,

Madhu did find an issue with ganesha 2.5 (the version we are using) where it does NOT do an unref in a certain readdir code path. We will patch our version soon.

Regards, Malahal.

On Thu, Sep 19, 2019 at 9:29 PM Daniel Gryniewicz <dang@redhat.com> wrote:
So, it's not impossible, based on the workload, but it may also be a bug.

For global FDs (all NFSv3 and stateless NFSv4), we obviously cannot know
when the client closes the FD, and opening/closing all the time causes a
large performance hit.  So, we cache open FDs.

All handles in MDCACHE live on the LRU.  This LRU is divided into 2
levels.  Level 1 holds the more active handles, and they can have open
FDs.  Various operations can demote a handle to level 2 of the LRU.  As
part of this transition, the global FD on that handle is closed.  Handles
that are actively in use (have a refcount taken on them) are not eligible
for this transition, as the FD may still be in use.
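To illustrate the demotion step, here is a minimal sketch (not Ganesha's actual code; the Handle fields and demote() helper are invented for illustration) of the rule described above: an unreferenced level-1 handle moves to level 2 and its cached global FD is closed, while a handle with a refcount stays in level 1.

```python
# Sketch of two-level LRU demotion as described above.
# Hypothetical names: Handle, demote(); real Ganesha is C and differs.

class Handle:
    def __init__(self, name):
        self.name = name
        self.refcount = 0      # active users of this handle
        self.global_fd = None  # cached open FD (None = closed)

def demote(level1, level2):
    """Move eligible handles from level 1 to level 2, closing their FD."""
    remaining = []
    for h in level1:
        if h.refcount > 0:           # actively in use: not eligible
            remaining.append(h)
            continue
        h.global_fd = None           # real code would close(fd) here
        level2.append(h)
    level1[:] = remaining

# Hypothetical usage: "a" is idle, "b" is in use.
a, b = Handle("a"), Handle("b")
a.global_fd, b.global_fd = 3, 4
b.refcount = 1
l1, l2 = [a, b], []
demote(l1, l2)
# a is demoted with its FD closed; b stays in level 1 with its FD open
```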

We have a background thread that runs, and periodically does this
demotion, closing the FDs.  This thread runs more often when the number
of open FDs is above FD_HwMark_Percent of the available number of FDs,
and runs constantly when the open FD count is above FD_Limit_Percent of
the available number of FDs.
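For reference, those two thresholds are tunables in the Ganesha config file.  A sketch of what that might look like (the block name and values shown are illustrative; check the documentation for your version):

```
CACHEINODE {
    # Close thread runs more often above this % of available FDs
    FD_HwMark_Percent = 90;
    # Close thread runs constantly above this % of available FDs
    FD_Limit_Percent = 98;
}
```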

So, a heavily used server could definitely have large numbers of FDs
open.  However, there have also, in the past, been bugs that would
either keep the FDs from being closed, or would break the accounting (so
they were closed, but Ganesha still thought they were open).  You didn't
say which version of Ganesha you're using, so I can't tell whether one
of those bugs applies.

Daniel

On 9/19/19 10:19 AM, Billich  Heinrich Rainer (ID SD) wrote:
> Hello,
>
> Is it usual to see 200’000-400’000 open files for a single ganesha
> process? Or does this indicate that something is wrong?
>
> We have some issues with ganesha (on spectrum scale protocol nodes)
>   reporting NFS3ERR_IO in the log. I noticed that the affected nodes
> have a large number of open files, 200’000-400’000 open files per daemon
> (and 500 threads and about 250 client connections). Other nodes have
> 1’000 – 10’000 open files by ganesha only and don’t show the issue.
>
> If someone could explain how ganesha decides which files to keep open
> and which to close, that would help, too. As NFSv3 is stateless, the
> client doesn’t open/close a file, so it’s up to the server to decide
> when to close it? We do have a few NFSv4 clients, too.
>
> Are there certain access patterns that can trigger such a large number
> of open files? Maybe traversing and reading a large number of small files?
>
> Thank you,
>
> Heiner
>
> I did count the open files  by counting the entries in /proc/<pid of
> ganesha>/fd/ . With several 100k entries I failed to do a ‘ls -ls’ to
> list all the symbolic links, hence I can’t relate the open files to
> different exports easily.
>
> --
>
> =======================
>
> Heinrich Billich
>
> ETH Zürich
>
> Informatikdienste
>
> Tel.: +41 44 632 72 56
>
> heinrich.billich@id.ethz.ch
>
> ========================
>
>
> _______________________________________________
> Support mailing list -- support@lists.nfs-ganesha.org
> To unsubscribe send an email to support-leave@lists.nfs-ganesha.org
>