Try to find the open files with "ls -l /proc/<PID>/fd". Are you using NFSv4 or NFSv3? If this is all NFSv3, then it is clearly a bug. With NFSv4, it may be that some clients opened the files but never closed them for some reason, or that we ignored a client's CLOSE request.
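
If the listing is long, a small script can group the descriptors so you can see how many are regular files versus sockets or pipes. This is only a rough sketch, assuming Linux procfs and that it runs as root or as the user running ganesha. The names list_open_fds and summarize are made up for illustration and are not part of Ganesha:

```python
import os
from collections import Counter


def list_open_fds(pid):
    """Return {fd_number: link_target} for every entry in /proc/<pid>/fd."""
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return result


def summarize(fds):
    """Count descriptors by kind: 'path' for files, else 'socket', 'pipe', etc."""
    kinds = Counter()
    for target in fds.values():
        if target.startswith("/"):
            kinds["path"] += 1
        else:
            # Non-file targets look like "socket:[12345]" or "pipe:[678]".
            kinds[target.split(":", 1)[0]] += 1
    return kinds


if __name__ == "__main__":
    import sys

    # Pass the ganesha PID as the first argument; defaults to this process.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    fds = list_open_fds(pid)
    print(f"{len(fds)} open descriptors: {dict(summarize(fds))}")
    for fd, target in sorted(fds.items()):
        print(fd, target)
```

Comparing the number of "path" entries against open_fd_count, and checking which exported files they point at, should show whether the descriptors are really still open in the process or only the counter is off.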

On Mon, Jun 18, 2018 at 7:30 PM bharat singh <bharat064015@gmail.com> wrote:
I already have this patch
c2b448b1a079ed66446060a695e4dd06d1c3d1c2 Fix closing global file descriptors



On Mon, Jun 18, 2018 at 5:41 AM Daniel Gryniewicz <dang@redhat.com> wrote:
Try this one:

5c2efa8f077fafa82023f5aec5e2c474c5ed2fdf Fix closing global file descriptors

Daniel


On 06/15/2018 03:08 PM, bharat singh wrote:
> I have been testing Ganesha 2.5.4 code with default mdcache settings. It
> starts showing issues after prolonged I/O runs.
> Once it exhausts all the allowed fds, it kind of gets stuck,
> returning ERR_FSAL_DELAY for every client op.
>
> A snapshot of the mdcache
>
> open_fd_count = 4055
> lru_state = {
>    entries_hiwat = 100000,
>    entries_used = 323,
>    chunks_hiwat = 100000,
>    chunks_used = 9,
>    fds_system_imposed = 4096,
>    fds_hard_limit = 4055,
>    fds_hiwat = 3686,
>    fds_lowat = 2048,
>    futility = 109,
>    per_lane_work = 50,
>    biggest_window = 1638,
>    prev_fd_count = 4055,
>    prev_time = 1529013538,
>    fd_state = 3
> }
>
> [cache_lru] lru_run :INODE LRU :INFO :After work, open_fd_count:4055 
> entries used count:327 fdrate:0 threadwait=9
> [cache_lru] lru_run :INODE LRU :INFO :lru entries: 327 open_fd_count:4055
> [cache_lru] lru_run :INODE LRU :INFO :lru entries: 327open_fd_count:4055
> [cache_lru] lru_run :INODE LRU :INFO :After work, open_fd_count:4055 
> entries used count:327 fdrate:0 threadwait=90
>
> I have killed the NFS clients, so no new I/O is being received. But even
> after a couple of hours I don't see lru_run making any progress, so
> open_fd_count remains at 4055 and even a single file open won't be
> served. So basically the server is in a stuck state.
>
> I have these changes patched over 2.5.4 code
> e2156ad3feac841487ba89969769bf765457ea6e Replace cache_fds parameter and
> handling with better logic
> 667083fe395ddbb4aa14b7bbe7e15ffca87e3b0b MDCACHE - Change and lower
> futility message
> 37732e61985d919e6ca84dfa7b4a84163080abae Move open_fd_count from MDCACHE
> to FSALs (https://review.gerrithub.io/#/c/391267/)
>
> Any suggestions on how to resolve this?
>
>
>
>
> _______________________________________________
> Devel mailing list -- devel@lists.nfs-ganesha.org
> To unsubscribe send an email to devel-leave@lists.nfs-ganesha.org
>


--
-Bharat

