Try to find the open files by doing "ls -l /proc/<PID>/fd". Are you
using NFSv4 or V3? If this is all V3, then it is clearly a bug. With NFSv4,
it may mean that some clients opened the files but never closed them for
some reason, or that we ignored a client's CLOSE request.
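
If the listing is too long to read by eye, something like the following might help group the descriptors by what they point to. This is only a rough sketch, assuming a Linux host where /proc/<PID>/fd is readable by the user running it; the function names list_open_fds and summarize are made up for illustration and are not part of Ganesha.

```python
import os
import sys
from collections import Counter


def list_open_fds(pid):
    """Return {fd_number: target} for every descriptor under /proc/<pid>/fd."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


def summarize(targets, top=10):
    """Count descriptors per target and return the most common targets."""
    return Counter(targets.values()).most_common(top)


if __name__ == "__main__":
    # Pass the server's PID as the first argument; defaults to this process.
    has_pid = len(sys.argv) > 1 and sys.argv[1].isdigit()
    pid = int(sys.argv[1]) if has_pid else os.getpid()
    fds = list_open_fds(pid)
    print(f"{len(fds)} open descriptors for pid {pid}")
    for target, count in summarize(fds):
        print(f"{count:6d}  {target}")
```

Run it with the ganesha.nfsd PID as the argument. If the total is close to open_fd_count and most targets are regular files under the export, that would point at files that were opened and never closed rather than at sockets or pipes.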
On Mon, Jun 18, 2018 at 7:30 PM bharat singh <bharat064015(a)gmail.com> wrote:
I already have this patch
c2b448b1a079ed66446060a695e4dd06d1c3d1c2 Fix closing global file
descriptors
On Mon, Jun 18, 2018 at 5:41 AM Daniel Gryniewicz <dang(a)redhat.com> wrote:
> Try this one:
>
> 5c2efa8f077fafa82023f5aec5e2c474c5ed2fdf Fix closing global file
> descriptors
>
> Daniel
>
>
> On 06/15/2018 03:08 PM, bharat singh wrote:
> > I have been testing Ganesha 2.5.4 code with default mdcache settings. It
> > starts showing issues after prolonged I/O runs.
> > Once it exhausts all the allowed fds, it kind of gets stuck,
> > returning ERR_FSAL_DELAY for every client op.
> >
> > A snapshot of the mdcache
> >
> > open_fd_count = 4055
> > lru_state = {
> > entries_hiwat = 100000,
> > entries_used = 323,
> > chunks_hiwat = 100000,
> > chunks_used = 9,
> > fds_system_imposed = 4096,
> > fds_hard_limit = 4055,
> > fds_hiwat = 3686,
> > fds_lowat = 2048,
> > futility = 109,
> > per_lane_work = 50,
> > biggest_window = 1638,
> > prev_fd_count = 4055,
> > prev_time = 1529013538,
> > fd_state = 3
> > }
> >
> > [cache_lru] lru_run :INODE LRU :INFO :After work, open_fd_count:4055 entries used count:327 fdrate:0 threadwait=9
> > [cache_lru] lru_run :INODE LRU :INFO :lru entries: 327 open_fd_count:4055
> > [cache_lru] lru_run :INODE LRU :INFO :lru entries: 327 open_fd_count:4055
> > [cache_lru] lru_run :INODE LRU :INFO :After work, open_fd_count:4055 entries used count:327 fdrate:0 threadwait=90
> >
> > I have killed the NFS clients, so no new I/O is being received. But even
> > after a couple of hours I don't see lru_run making any progress, so
> > open_fd_count remains at 4055 and even a single file open won't be
> > served. So basically the server is stuck.
> >
> > I have these changes patched over the 2.5.4 code:
> > e2156ad3feac841487ba89969769bf765457ea6e Replace cache_fds parameter and
> > handling with better logic
> > 667083fe395ddbb4aa14b7bbe7e15ffca87e3b0b MDCACHE - Change and lower
> > futility message
> > 37732e61985d919e6ca84dfa7b4a84163080abae Move open_fd_count from MDCACHE
> > to FSALs (https://review.gerrithub.io/#/c/391267/)
> >
> > Any suggestions on how to resolve this?
> >
> >
> >
> >
> > _______________________________________________
> > Devel mailing list -- devel(a)lists.nfs-ganesha.org
> > To unsubscribe send an email to devel-leave(a)lists.nfs-ganesha.org
> >
>
--
-Bharat