This sounds like maybe an accounting issue? I would generally expect
the number of open files to match open_fd_count (assuming you're using
FSAL_VFS, of course...). Looking through the code, it *looks* like
everything is correct, accounting-wise.
Daniel
On 06/18/2018 10:32 AM, bharat singh wrote:
This is a V3 mount only.
There are a bunch of socket and anonymous fds open, but that's only 554
in total. In its current state my setup reports 4055 fds opened, and it
won't make any progress for days, even without any new I/O coming in. I
have a coredump; please let me know what info you need out of it to
debug this.
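For reference, this is roughly how I pulled the LRU state out of the
core (a sketch only; the binary and core paths are mine, and the symbol
names are assumed from the snapshot further down -- with the 37732e6
patch the fd counter may live in the FSAL module rather than a global):

# hypothetical gdb session against the coredump
$ gdb /usr/bin/ganesha.nfsd /var/crash/core.2576
(gdb) print lru_state        # the struct quoted below
(gdb) print open_fd_count    # assumed global; may be per-FSAL on this branch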
# ls -l /proc/2576/fd | wc -l
554
On Mon, Jun 18, 2018 at 7:12 AM Malahal Naineni <malahal@gmail.com> wrote:
Try to find the open files by doing "ls -l /proc/<PID>/fd". Are you
using NFSv4 or V3? If this is all V3, then it's clearly a bug. NFSv4
may imply that some clients opened files but never closed them for
some reason, or that we ignored a client's CLOSE request.
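To count only the fds that point at real files, something like this
should work (a rough sketch; the symlink target patterns for sockets,
pipes and anonymous inodes are the usual Linux ones):

# exclude sockets, pipes and anonymous inodes from the fd count
$ ls -l /proc/<PID>/fd | grep -vE 'socket:|pipe:|anon_inode:' | wc -l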
On Mon, Jun 18, 2018 at 7:30 PM bharat singh <bharat064015@gmail.com> wrote:
I already have this patch:
c2b448b1a079ed66446060a695e4dd06d1c3d1c2 Fix closing global file descriptors
On Mon, Jun 18, 2018 at 5:41 AM Daniel Gryniewicz <dang@redhat.com> wrote:
Try this one:
5c2efa8f077fafa82023f5aec5e2c474c5ed2fdf Fix closing global file descriptors
Daniel
On 06/15/2018 03:08 PM, bharat singh wrote:
> I have been testing Ganesha 2.5.4 code with default mdcache settings.
> It starts showing issues after prolonged I/O runs. Once it exhausts
> all the allowed fds, it kind of gets stuck returning ERR_FSAL_DELAY
> for every client op.
>
> A snapshot of the mdcache
>
> open_fd_count = 4055
> lru_state = {
> entries_hiwat = 100000,
> entries_used = 323,
> chunks_hiwat = 100000,
> chunks_used = 9,
> fds_system_imposed = 4096,
> fds_hard_limit = 4055,
> fds_hiwat = 3686,
> fds_lowat = 2048,
> futility = 109,
> per_lane_work = 50,
> biggest_window = 1638,
> prev_fd_count = 4055,
> prev_time = 1529013538,
> fd_state = 3
> }
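>
> (These watermarks look like the stock percentages of
> fds_system_imposed -- assuming the default 99/90/50 split:)
>
> fds_hard_limit = 4096 * 99% = 4055   <- open_fd_count is pinned here
> fds_hiwat      = 4096 * 90% = 3686
> fds_lowat      = 4096 * 50% = 2048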
>
> [cache_lru] lru_run :INODE LRU :INFO :After work, open_fd_count:4055
> entries used count:327 fdrate:0 threadwait=9
> [cache_lru] lru_run :INODE LRU :INFO :lru entries: 327 open_fd_count:4055
> [cache_lru] lru_run :INODE LRU :INFO :lru entries: 327 open_fd_count:4055
> [cache_lru] lru_run :INODE LRU :INFO :After work, open_fd_count:4055
> entries used count:327 fdrate:0 threadwait=90
>
> I have killed the NFS clients, so no new I/O is being received. But
> even after a couple of hours I don't see lru_run making any progress,
> so open_fd_count stays at 4055 and even a single file open won't be
> served. The server is basically stuck.
>
> I have these changes patched over the 2.5.4 code:
> e2156ad3feac841487ba89969769bf765457ea6e Replace cache_fds parameter
> and handling with better logic
> 667083fe395ddbb4aa14b7bbe7e15ffca87e3b0b MDCACHE - Change and lower
> futility message
> 37732e61985d919e6ca84dfa7b4a84163080abae Move open_fd_count from
> MDCACHE to FSALs (https://review.gerrithub.io/#/c/391267/)
>
> Any suggestions on how to resolve this?
--
-Bharat
--
-Bharat
_______________________________________________
Devel mailing list -- devel@lists.nfs-ganesha.org
To unsubscribe send an email to devel-leave@lists.nfs-ganesha.org