I'm not sure what's happening here. Your LRU_Run_Interval is 60, which
leaves a window for a workload that starts at the wrong time and opens
many FDs per second; but once we pass the HWMark (which is 50% for you,
so there is plenty of headroom left when it triggers), we wake up the
LRU thread, and it should run.
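Roughly, the idea behind that wakeup is the following (a simplified
sketch only, not the actual mdcache_lru.c code; all names here are
illustrative stand-ins):

#include <stdatomic.h>
#include <stdio.h>

/* Illustrative stand-ins for the mdcache LRU state; these are NOT
 * the real NFS-Ganesha symbols. */
static atomic_size_t open_fd_count;
static const size_t fds_hiwat = 32768;  /* 50% of a 65536 FD limit */

static void wake_lru_thread(void)
{
        /* In Ganesha this would signal the background LRU thread;
         * here it is just a stub. */
        printf("waking LRU thread early\n");
}

/* Called whenever an FD is opened: once we cross the high water mark,
 * wake the LRU thread immediately instead of waiting out the
 * LRU_Run_Interval timer. */
static void fd_opened(void)
{
        if (atomic_fetch_add(&open_fd_count, 1) + 1 > fds_hiwat)
                wake_lru_thread();
}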
You could change a few of the log messages in lru_run() and/or
lru_run_lane() to LogEvent(), so that they show up in the log without
needing to turn on FULL_DEBUG (which would likely slow things down
enough to affect the test). That should tell you what the LRU is doing
during that time; there's an example of the change after the commit
list below. Several fixes went in post-2.8 that you might be interested
in, some of which may apply here:
aa07c9b9886b5c7f5fce32138e58f71345a3696e
MDCACHE: fix lru_run thread don't get scheduling when thread_delay
adjusted to zero
a49352d63b407010ac0d5817a8772787c7691b73
MDCACHE - Put ref on root
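As an example of the LogEvent() change suggested above (LogEvent() and
COMPONENT_CACHE_INODE_LRU are the real macro and component; the message
text itself is illustrative, not a verbatim string from the source):

/* In lru_run_lane(), promote a per-lane progress message from
 * LogDebug() to LogEvent() so it appears at the default log level: */
LogEvent(COMPONENT_CACHE_INODE_LRU,
         "lru_run_lane: lane %zd done, closed %zd FDs",
         lane, closed);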
It may just be best to update to 2.8.3 and try again.
Daniel
On 3/4/20 5:08 AM, des(a)vmware.com wrote:
We are using V2.8.2, but the last time we pulled was 7 months ago.
Frank, the fix of yours I mentioned above did fix the Filebench hang in the 'fileserver' workload.
But now we are seeing a new issue in the 'videoserver' workload.
We are seeing a flood of these errors in the ganesha log:
2020-03-03T21:08:53Z : epoch 5e5bfd3c : fsvm23 : ganesha.nfsd-33[::ffff:172.30.0.121] [svc_3546] 288 :vdfs_filehandle_open :FSAL :vdfs_open failed: could not get attributes: Input/output error (5)
2020-03-03T21:08:53Z : epoch 5e5bfd3c : fsvm23 : ganesha.nfsd-33[::ffff:172.30.0.121] [svc_3546] 84 :posix2fsal_error :FSAL :Mapping 5 to ERR_FSAL_IO, rlim_cur=65536 rlim_max=65536
There are roughly 200k lines of these errors within a few minutes. I think all operations are
failing with this error.
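For what it's worth, the rlim_cur/rlim_max values in the
posix2fsal_error message look like the process FD limit; here is a
minimal standalone check of that limit using POSIX getrlimit(2)
(plain C, not Ganesha code):

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
        struct rlimit rl;

        /* RLIMIT_NOFILE is the per-process open-FD limit, the same
         * soft/hard 65536 values shown in the log above. */
        if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
                perror("getrlimit");
                return 1;
        }
        printf("rlim_cur=%llu rlim_max=%llu\n",
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);
        return 0;
}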
So does this mean Ganesha's LRU threads could not close the FDs fast enough, and we are
therefore hitting the underlying filesystem's resource limits?
ulimit -n on our server is 65536, and ganesha.conf is configured with:

CACHEINODE {
        LRU_Run_Interval = 60;
        FD_Limit_Percent = 75;
        FD_HWMark_Percent = 50;
        FD_LWMark_Percent = 20;
        Entries_HWMark = 65536;
        Reaper_Work_Per_Lane = 100;
}
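Working the math against ulimit -n = 65536 (assuming the percentages
apply to that value; the annotations are my understanding of what each
threshold controls):

FD_LWMark_Percent:  65536 * 20% = ~13107  (LRU can relax below this)
FD_HWMark_Percent:  65536 * 50% = 32768   (LRU thread is woken above this)
FD_Limit_Percent:   65536 * 75% = 49152   (Ganesha's own FD ceiling)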
So we should have hit the high water mark before the underlying filesystem's limit and closed
the FDs, right?