I suspect this commit fixes that:
136df4f26 MDCACHE - Release refs on dirents when chunk not consumed
We're planning another 2.7 release because UDP is broken, so that fix will
be included. In the meantime, it's already in 2.8.0.1, so you might want to
consider switching to 2.8, as 2.7 will reach end-of-life very soon.
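
For illustration, the idea behind that commit is roughly as follows (a toy,
self-contained sketch, not the actual mdcache code; every type and helper
name in it is made up): when a readdir chunk is dropped without having been
fully consumed, the refs its dirents still hold on their cache entries need
to be released, otherwise those entries stay pinned and the LRU can never
reclaim them.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-ins for the mdcache types; every name here is made up. */
struct toy_entry {
	int refcnt;
};

static void toy_entry_unref(struct toy_entry *e)
{
	if (--e->refcnt == 0)
		free(e);	/* entry becomes reclaimable once all refs are gone */
}

struct toy_dirent {
	struct toy_entry *entry;	/* ref taken when the chunk was populated */
	bool consumed;			/* already handed back to the client? */
};

struct toy_chunk {
	struct toy_dirent *dirents;
	size_t count;
};

/* The gist of the fix: when a chunk is discarded without being fully
 * consumed, drop the refs still held by the unconsumed dirents so the
 * LRU reaper can reclaim those entries instead of pinning them forever. */
static void toy_chunk_release_unconsumed(struct toy_chunk *chunk)
{
	size_t i;

	for (i = 0; i < chunk->count; i++) {
		if (!chunk->dirents[i].consumed && chunk->dirents[i].entry) {
			toy_entry_unref(chunk->dirents[i].entry);
			chunk->dirents[i].entry = NULL;
		}
	}
}

int main(void)
{
	struct toy_entry *e = calloc(1, sizeof(*e));
	struct toy_dirent d = { .entry = e, .consumed = false };
	struct toy_chunk c = { .dirents = &d, .count = 1 };

	e->refcnt = 1;				/* the ref held by the dirent */
	toy_chunk_release_unconsumed(&c);	/* without this, the entry would leak */
	printf("unconsumed dirent ref released\n");
	return 0;
}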
Daniel
On 6/12/19 8:20 PM, Rungta, Vandana wrote:
A readdir of a directory tree with millions of files causes the number of
cache entries used to grow beyond the Hi Watermark, and it never comes back
down, even after the readdir is stopped and the share is unmounted.
NFS-Ganesha V2.7.4
File share with 5 million files: 2000 directories with 2500 files in each
Hi Watermark configured at 500,000 (set roughly as sketched below)
Linux client mounting the share with NFS version 4.1
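
For reference, the watermark would be set in ganesha.conf roughly as below;
the block and parameter names (CACHEINODE, Entries_HWMark) are my assumption
for 2.7 and may need adjusting to match the actual configuration.

# Sketch only: an entries high watermark of 500,000 in ganesha.conf.
# Block/parameter names are assumed, not copied from the real config.
CACHEINODE
{
	Entries_HWMark = 500000;
}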
Start a readdir
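
A minimal client-side loop along these lines is enough to drive such a
readdir (this is an assumed sketch, not the actual test harness; the mount
path is made up):

#include <dirent.h>
#include <stdio.h>

/* Assumed reproduction sketch: walk one of the directories on the NFS
 * mount and count the entries readdir returns. */
int main(void)
{
	const char *path = "/mnt/share/dir0000";	/* made-up mount path */
	struct dirent *de;
	unsigned long n = 0;
	DIR *d = opendir(path);

	if (d == NULL) {
		perror("opendir");
		return 1;
	}
	while ((de = readdir(d)) != NULL)
		n++;	/* each entry returned shows up as an mdcache entry server-side */
	closedir(d);
	printf("%s: %lu entries\n", path, n);
	return 0;
}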
(gdb) print lru_state
$1 = {entries_hiwat = 500000, entries_used = 1989506, chunks_hiwat = 100000,
  chunks_used = 797, fds_system_imposed = 400000, fds_hard_limit = 396000,
  fds_hiwat = 360000, fds_lowat = 200000, futility = 0, per_lane_work = 50,
  biggest_window = 160000, prev_fd_count = 0, prev_time = 1560376713,
  fd_state = 0}
(gdb)
After some time:
(gdb) print lru_state
$1 = {entries_hiwat = 500000, entries_used = 2219506, chunks_hiwat = 100000,
  chunks_used = 889, fds_system_imposed = 400000, fds_hard_limit = 396000,
  fds_hiwat = 360000, fds_lowat = 200000, futility = 0, per_lane_work = 50,
  biggest_window = 160000, prev_fd_count = 0, prev_time = 1560377613,
  fd_state = 0}
(gdb)
entries_used continues to grow (matching the number of entries readdir has returned):
(gdb) print lru_state
$1 = {entries_hiwat = 500000, entries_used = 2312006, chunks_hiwat = 100000,
  chunks_used = 926, fds_system_imposed = 400000, fds_hard_limit = 396000,
  fds_hiwat = 360000, fds_lowat = 200000, futility = 0, per_lane_work = 50,
  biggest_window = 160000, prev_fd_count = 0, prev_time = 1560377973,
  fd_state = 0}
(gdb)
Test (readdir) stopped and share unmounted:
(gdb) print lru_state
$1 = {entries_hiwat = 500000, entries_used = 2332006, chunks_hiwat = 100000,
  chunks_used = 934, fds_system_imposed = 400000, fds_hard_limit = 396000,
  fds_hiwat = 360000, fds_lowat = 200000, futility = 0, per_lane_work = 50,
  biggest_window = 160000, prev_fd_count = 0, prev_time = 1560378063,
  fd_state = 0}
(gdb)
Two hours after the test was stopped, entries_used is unchanged:
(gdb) print lru_state
$1 = {entries_hiwat = 500000, entries_used = 2332006, chunks_hiwat = 100000,
  chunks_used = 934, fds_system_imposed = 400000, fds_hard_limit = 396000,
  fds_hiwat = 360000, fds_lowat = 200000, futility = 0, per_lane_work = 50,
  biggest_window = 160000, prev_fd_count = 0, prev_time = 1560382845,
  fd_state = 0}
(gdb)
“top” shows continuously increasing memory usage by ganesha
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3516 root      20   0 1908m 1.7g 7648 S 19.0 11.1   4:02.14 ganesha.nfsd
 3516 root      20   0 3337m 3.1g 7648 S  2.0 20.3   7:25.74 ganesha.nfsd
 3516 root      20   0 3710m 3.4g 7652 S 12.0 22.7   8:19.03 ganesha.nfsd
The issue is reproducible, and I am happy to provide any additional debug
info or re-run with debug flags turned on.
Thanks,
Vandana