# cat /proc/14459/limits
Limit                     Soft Limit           Hard Limit           Units
Max open files            4096                 4096                 files

I suspect a leak in lru_run_lane, but I might be wrong here:

static inline size_t lru_run_lane(size_t lane, uint64_t *const totalclosed)
{
...
        /* check refcnt in range */
        if (unlikely(refcnt > 2)) {
            /* This unref is ok to be done without a valid op_ctx
             * because we always map a new entry to an export before
             * we could possibly release references in
             * mdcache_new_entry.
             */
            QUNLOCK(qlane);
            mdcache_lru_unref(entry);  /* >>>>>> we don't have a fsal_close for this mdcache_lru_unref */
            goto next_lru;
        }
...
        /* Make sure any FSAL global file descriptor is closed. */
        status = fsal_close(&entry->obj_handle);

        if (not_support_ex) {
            /* Release the content lock. */
            PTHREAD_RWLOCK_unlock(&entry->content_lock);
        }

        if (FSAL_IS_ERROR(status)) {
            LogCrit(COMPONENT_CACHE_INODE_LRU,
                "Error closing file in LRU thread.");
        } else {
            ++(*totalclosed);
            ++closed;
        }

        mdcache_lru_unref(entry);
}

On Mon, Jun 18, 2018 at 8:56 AM Daniel Gryniewicz <dang@redhat.com> wrote:
We do that.  If open_fd_count > fds_hard_limit, we return EDELAY in
mdcache_open2() and fsal_reopen_obj().

Daniel

On 06/18/2018 11:20 AM, Malahal Naineni wrote:
> The actual number of fds open is 554, at least that is what the kernel
> thinks. If you have open_fd_count as 4055, something is wrong in the
> accounting of open files. What is the maximum number of files your
> Ganesha daemon can open? (cat /proc/<PID>/limits should tell you.) As
> far as I remember, the accounting value "open_fd_count" is only used
> to close files aggressively. Can you track the code path where Ganesha
> is sending the DELAY error?
>
> On Mon, Jun 18, 2018 at 8:03 PM bharat singh <bharat064015@gmail.com> wrote:
>
>     This is a V3 mount only.
>     There are a bunch of socket and anonymous fds opened, but that is
>     only 554. In its current state my setup has 4055 fds open and it
>     won't make any progress for days, even without any new I/O coming
>     in. I have a coredump; please let me know what info you need out
>     of it to debug this.
>
>     # ls -l  /proc/2576/fd | wc -l
>     554
>
>     On Mon, Jun 18, 2018 at 7:12 AM Malahal Naineni <malahal@gmail.com> wrote:
>
>         Try to find the open files by doing "ls -l /proc/<PID>/fd".
>         Are you using NFSv4 or V3? If this is all V3, then it is
>         clearly a bug. With NFSv4, some clients may have opened files
>         but never closed them for some reason, or we ignored a
>         client's CLOSE request.
>
>         On Mon, Jun 18, 2018 at 7:30 PM bharat singh <bharat064015@gmail.com> wrote:
>
>             I already have this patch
>             c2b448b1a079ed66446060a695e4dd06d1c3d1c2 Fix closing global
>             file descriptors
>
>
>
>             On Mon, Jun 18, 2018 at 5:41 AM Daniel Gryniewicz <dang@redhat.com> wrote:
>
>                 Try this one:
>
>                 5c2efa8f077fafa82023f5aec5e2c474c5ed2fdf Fix closing
>                 global file descriptors
>
>                 Daniel
>
>
>                 On 06/15/2018 03:08 PM, bharat singh wrote:
>                  > I have been testing Ganesha 2.5.4 code with default
>                 mdcache settings. It
>                  > starts showing issues after prolonged I/O runs.
>                  > Once it exhausts all the allowed fds, it kind of gets
>                  > stuck returning ERR_FSAL_DELAY for every client op.
>                  >
>                  > A snapshot of the mdcache
>                  >
>                  > open_fd_count = 4055
>                  > lru_state = {
>                  >    entries_hiwat = 100000,
>                  >    entries_used = 323,
>                  >    chunks_hiwat = 100000,
>                  >    chunks_used = 9,
>                  >    fds_system_imposed = 4096,
>                  >    fds_hard_limit = 4055,
>                  >    fds_hiwat = 3686,
>                  >    fds_lowat = 2048,
>                  >    futility = 109,
>                  >    per_lane_work = 50,
>                  >    biggest_window = 1638,
>                  >    prev_fd_count = 4055,
>                  >    prev_time = 1529013538,
>                  >    fd_state = 3
>                  > }
>                  >
>                  > [cache_lru] lru_run :INODE LRU :INFO :After work,
>                 open_fd_count:4055
>                  > entries used count:327 fdrate:0 threadwait=9
>                  > [cache_lru] lru_run :INODE LRU :INFO :lru entries:
>                 327 open_fd_count:4055
>                  > [cache_lru] lru_run :INODE LRU :INFO :lru entries:
>                 327 open_fd_count:4055
>                  > [cache_lru] lru_run :INODE LRU :INFO :After work,
>                 open_fd_count:4055
>                  > entries used count:327 fdrate:0 threadwait=90
>                  >
>                  > I have killed the NFS clients, so no new I/O is being
>                  > received. But even after a couple of hours I don't see
>                  > lru_run making any progress, so open_fd_count remains
>                  > at 4055 and even a single file open won't be served.
>                  > So basically the server is in a stuck state.
>                  >
>                  > I have these changes patched over 2.5.4 code
>                  > e2156ad3feac841487ba89969769bf765457ea6e Replace
>                 cache_fds parameter and
>                  > handling with better logic
>                  > 667083fe395ddbb4aa14b7bbe7e15ffca87e3b0b MDCACHE -
>                 Change and lower
>                  > futility message
>                  > 37732e61985d919e6ca84dfa7b4a84163080abae Move
>                 open_fd_count from MDCACHE
>                  > to FSALs (https://review.gerrithub.io/#/c/391267/)
>                  >
>                  > Any suggestions how to resolve this ?
>                  >
>                  >
>                  >
>                  >
>                  > _______________________________________________
>                  > Devel mailing list -- devel@lists.nfs-ganesha.org
>                  > To unsubscribe send an email to
>                 devel-leave@lists.nfs-ganesha.org
>                  >
>
>
>
>
>
>
>
>
>
>
>
>


--
-Bharat