On 1/28/19 7:58 PM, Kropelin, Adam wrote:
(gdb) print lru_state
$8 = {entries_hiwat = 500000, entries_used = 2437134, chunks_hiwat = 100000, chunks_used
= 2002, fds_system_imposed = 400000, fds_hard_limit = 396000, fds_hiwat = 360000,
fds_lowat = 200000, futility = 0, per_lane_work = 50, biggest_window = 160000,
prev_fd_count = 160, prev_time = 1548692973, fd_state = 0}
It looks like you're running with the default value for per_lane_work, and
with entries_used (2437134) sitting well above entries_hiwat (500000), the
reaper is clearly falling behind. I'd suggest increasing
Reaper_Work_Per_Lane in your config. We also had this kind of issue caused
by ~permanently referenced entries in the LRU (due to delegations).
For reference, we are using:
Entries_HWMark = 2500000;
Reaper_Work_Per_Lane = 20000;
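If it helps, these go in the MDCACHE block of ganesha.conf, something like
the sketch below (block and parameter names assume a recent Ganesha
release; I believe older ones used the CACHEINODE block instead):

MDCACHE {
    # Allow many more cached entries before the reaper kicks in
    Entries_HWMark = 2500000;
    # Reclaim far more entries per lane on each reaper pass
    Reaper_Work_Per_Lane = 20000;
}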
(gdb) print *(struct mdcache_lru__ *)0xb84809e0
$8 = {q = {next = 0xaa559720, prev = 0x7e12a0 <LRU+3200>}, qid = LRU_ENTRY_CLEANUP,
refcnt = 3, flags = 3,
lane = 14, cf = 0}
(gdb) print *(struct mdcache_lru__ *)0xaa559720
$9 = {q = {next = 0xd9754930, prev = 0xb84809e0}, qid = LRU_ENTRY_CLEANUP, refcnt = 3,
flags = 3, lane = 14, cf = 0}
(gdb) print *(struct mdcache_lru__ *)0xd9754930
$10 = {q = {next = 0xa94e7f40, prev = 0xaa559720}, qid = LRU_ENTRY_CLEANUP, refcnt = 2,
flags = 3, lane = 14, cf = 0}
(gdb) print *(struct mdcache_lru__ *)0xa94e7f40
$11 = {q = {next = 0x185aa7d0, prev = 0xd9754930}, qid = LRU_ENTRY_CLEANUP, refcnt = 3,
flags = 3, lane = 14, cf = 0}
(gdb) print *(struct mdcache_lru__ *)0x185aa7d0
$12 = {q = {next = 0xcb912a30, prev = 0xa94e7f40}, qid = LRU_ENTRY_CLEANUP, refcnt = 1,
flags = 3, lane = 14, cf = 0}
(gdb) print *(struct mdcache_lru__ *)0xcb912a30
$13 = {q = {next = 0xc7727370, prev = 0x185aa7d0}, qid = LRU_ENTRY_CLEANUP, refcnt = 2,
flags = 3, lane = 14, cf = 0}
I see that there are some referenced entries in the CLEANUP queue, and I
don't think that's normal behavior; you may want to investigate this.
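If you want to see how far the referenced entries extend, you can walk the
queue from gdb with a loop like this (just a sketch; the start address and
struct name are taken from your dump above, the cap of 100 is arbitrary,
and the walk will eventually wrap through the lane's list head, so watch
for the <LRU+...> address):

(gdb) set $p = (struct mdcache_lru__ *)0xb84809e0
(gdb) set $i = 0
(gdb) while $i < 100
 >printf "%p qid=%d refcnt=%d flags=%d\n", $p, $p->qid, $p->refcnt, $p->flags
 >set $p = (struct mdcache_lru__ *)$p->q.next
 >set $i = $i + 1
 >end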
--
Fatih ACAR
Gandi
fatih.acar(a)gandi.net