The problem with trying a fixed number of entries is that it doesn't
help, unless that number is large enough to be essentially unlimited.
The design of the reap algorithm is that it's O(1) in the number of
entries (O(n) in the number of queues), so that we can invoke it in the
fast path. Searching a small number of entries on each queue gives us no
better guarantee of finding something than searching one does, especially
with large readdir workloads. I think Frank is right: we need to do
something else.
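To make the constant-time argument concrete, here is a toy sketch (invented names and fields, not the real lru_reap_impl or mdcache_lru_t) of a reap that probes only the LRU head of each lane. The cost scales with the number of lanes, never with queue length; probing K heads per lane instead of 1 would only multiply the constant, and still finds nothing if the first K entries on every lane happen to be pinned.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 7

/* Toy stand-ins for mdcache's per-lane LRU queues; field names are
 * simplified for illustration, not the real mdcache_lru_t. */
struct toy_entry {
	struct toy_entry *next;	/* toward MRU */
	int refcnt;		/* 1 == claimable in this sketch */
};

struct toy_lane {
	struct toy_entry *lru;	/* head = least recently used */
};

static struct toy_lane lanes[LANES];

/* Constant-time reap: look only at the LRU head of each lane.
 * Cost is O(LANES), independent of how many entries each queue
 * holds.  If a lane's head is in use, give up on that lane rather
 * than walking deeper into the queue. */
static struct toy_entry *toy_reap(void)
{
	for (size_t lane = 0; lane < LANES; lane++) {
		struct toy_entry *e = lanes[lane].lru;

		if (e != NULL && e->refcnt == 1) {
			lanes[lane].lru = e->next;	/* dequeue for reuse */
			return e;
		}
	}
	return NULL;	/* every lane's head was in use; caller allocates */
}
```

This is exactly the failure mode Pradeep describes below: one unclaimable entry at the head of a lane hides an arbitrarily long tail of claimable ones.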
The LRU currently serves two purposes:
1. A list of entries in LRU order, so that we can reap and reuse them.
2. A divided list that facilitates closing open global FDs.
It seems clear to me that 1 is more important than 2, especially since
NFSv4 will rarely use global FDs, and some of our most important FSALs
don't use system FDs at all.
What do we think is a good design for handling these two cases?
Daniel
On 6/23/22 16:44, Frank Filz wrote:
Hmm, what's the path to taking a ref without moving to MRU of L2
or LRU of L1?
There are a number of issues coming up with sticky entries blowing out the cache size
that really need to be resolved.
I wonder if we should remove any entry with refcount > 1 from the LRU, and note where
we should place it when the refcount is reduced to 1. That would take out the pinned
entries as well as the temporarily in-use entries. The trick would be the refcount bump
for the actual LRU processing.
Frank
> -----Original Message-----
> From: Pradeep Thomas [mailto:pradeepthomas@gmail.com]
> Sent: Thursday, June 23, 2022 1:36 PM
> To: devel(a)lists.nfs-ganesha.org
> Subject: [NFS-Ganesha-Devel] Unclaimable MDCache entries at the LRU end of
> L2 queue.
>
> Hello,
>
> I'm hitting a scenario where the entry at the LRU end of the L2 queue becomes
> active, but we don't move it to L1 - likely because the entry becomes active in
> the context of a readdir. The cache keeps growing to the point where the kernel
> invokes the OOM killer to terminate the ganesha process.
>
> When we reap entries (lru_reap_impl), could we look beyond the LRU end - perhaps
> try a fixed number of entries? Another option is to also garbage collect the L2
> queue and free claimable entries beyond the LRU end of the queue (through
> mdcache_lru_release_entries()). Any other thoughts?
> In the instance below, MDCache is supposed to be capped at 100K entries, but it
> grows to > 5 million entries (~17 * 310K).
>
> sudo gdb -q -p $(pidof ganesha.nfsd) -batch \
>   -ex 'p LRU[0].L1' -ex 'p LRU[0].L2' -ex 'p LRU[1].L1' -ex 'p LRU[1].L2' \
>   -ex 'p LRU[2].L1' -ex 'p LRU[2].L2' -ex 'p LRU[3].L1' -ex 'p LRU[3].L2' \
>   -ex 'p LRU[4].L1' -ex 'p LRU[4].L2' -ex 'p LRU[5].L1' -ex 'p LRU[5].L2' \
>   -ex 'p LRU[6].L1' -ex 'p LRU[6].L2'
>
> $1 = {q = {next = 0x7fe16a6adc30, prev = 0x7fe066775d30}, id = LRU_ENTRY_L1,
> size = 37}
> $2 = {q = {next = 0x7fe0cd6d1130, prev = 0x7fdd595e2030}, id = LRU_ENTRY_L2,
> size = 310609}
> $3 = {q = {next = 0x7fe222cc7930, prev = 0x7fe0e8afaf30}, id = LRU_ENTRY_L1,
> size = 37}
> $4 = {q = {next = 0x7fdfa2022d30, prev = 0x7fe01c386b30}, id = LRU_ENTRY_L2,
> size = 310459}
> $5 = {q = {next = 0x7fdfdd8acb30, prev = 0x7fe233849b30}, id = LRU_ENTRY_L1,
> size = 31}
> $6 = {q = {next = 0x7fdf014e7e30, prev = 0x7fdd90fd7430}, id = LRU_ENTRY_L2,
> size = 310297}
> $7 = {q = {next = 0x7fde79a4f030, prev = 0x7fe233a4aa30}, id = LRU_ENTRY_L1,
> size = 32}
> $8 = {q = {next = 0x7fe061388430, prev = 0x7fdd24b5cf30}, id = LRU_ENTRY_L2,
> size = 310659}
> $9 = {q = {next = 0x7fe1e96ce430, prev = 0x7fe0b3b4b130}, id = LRU_ENTRY_L1,
> size = 34}
> $10 = {q = {next = 0x7fe00d84ff30, prev = 0x7fdd685b1530}, id =
> LRU_ENTRY_L2, size = 310635}
> $11 = {q = {next = 0x7fdf9df4fb30, prev = 0x7fe2414aaa30}, id = LRU_ENTRY_L1,
> size = 33}
> $12 = {q = {next = 0x7fe165e82d30, prev = 0x7fdf1d2b8a30}, id =
> LRU_ENTRY_L2, size = 310566}
> $13 = {q = {next = 0x7fe159e55a30, prev = 0x7fde3f973d30}, id =
> LRU_ENTRY_L1, size = 41}
> $14 = {q = {next = 0x7fdf4fbb9030, prev = 0x7fdea8ca0730}, id =
> LRU_ENTRY_L2, size = 310460}
>
> The first entry has a refcount of 2, but the next entries are actually claimable.
>
> sudo gdb -q -p $(pidof ganesha.nfsd) -batch -ex 'p *(mdcache_lru_t
> *)LRU[0].L2.q.next'
> $1 = {q = {next = 0x7fe0fbff0c30, prev = 0x7fe250de2960 <LRU+32>}, qid =
> LRU_ENTRY_L2, refcnt = 2, flags = 0, lane = 0, cf = 0}
>
> sudo gdb -q -p $(pidof ganesha.nfsd) -batch -ex 'p *(mdcache_lru_t
> *)0x7fe0fbff0c30'
> $1 = {q = {next = 0x7fe0c2c5a130, prev = 0x7fe0cd6d1130}, qid =
> LRU_ENTRY_L2, refcnt = 1, flags = 0, lane = 0, cf = 0}
>
> sudo gdb -q -p $(pidof ganesha.nfsd) -batch -ex 'p *(mdcache_lru_t
> *)0x7fe0c2c5a130'
> $1 = {q = {next = 0x7fe06dfeac30, prev = 0x7fe142936430}, qid =
> LRU_ENTRY_L2, refcnt = 1, flags = 0, lane = 0, cf = 0}
> _______________________________________________
> Devel mailing list -- devel(a)lists.nfs-ganesha.org
> To unsubscribe send an email to devel-leave(a)lists.nfs-ganesha.org