Thank you, Vandana, for the compliments!
#1. We don't seem to have a limit on the number of dirents. We do have a
limit on the number of chunks, 100K by default, and each chunk by default
holds 128 entries, so the cap works out to about 12,800K, i.e. roughly 12.8
million dirents. You could stop your workload and then access or create
500K files (not directories). The files should be cached and any
directories should be purged, freeing all their dirents. Of course, if this
is a memory leak, some dirents will remain, indicating a real bug. Ideally,
we want a dbus command to free all objects, so we could confirm whether
something is leaking in this area. We cache too many structures without a
way to purge them, which makes memory leaks harder to debug.
#2. If you export both /a/b/c and /a/b, all files in /a/b/c appear in two
exports, so they might allocate multiple export structures for the same
object. I'm not sure when this happens, though; I do recall that an object
can have a list of associated exports.
Regards, Malahal.
PS: mmleak.so is a generic tool and works on any daemon, but it dumps files
that need disk space and post-processing. Having ganesha itself maintain a
record of all allocations would be better (though it wouldn't find memory
leaks in libraries used by ganesha), and dumping them on demand is
something we should have. It would make memory leaks easier to debug.
On Fri, Jul 26, 2019 at 3:47 AM Rungta, Vandana <vrungta(a)amazon.com> wrote:
Daniel,
I've noticed a gradual creep upward in the memory used by ganesha during
long-running load tests, so I started my tests with mmleak.so preloaded
(https://github.com/malahal/mmleak).
(malahal – compliments on mmleak; it's a very nice memory debugging tool.)
After about 15 hours of tests I shrank/processed the information dumped by
mmleak.
The following are the current counts of the addresses where memory was
allocated *without a corresponding free*. The first one is reasonable
because I have Entries_HWMark = 500000.
All locations are from nfs-ganesha V2.7.6:
Count      Address   Allocation line
500,000    0x52aad0  /src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1788
1,700,985  0x539186  /src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_hash.h:209
1,700,985  0x53b2b3  /src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:392
8,992,812  0x541186  /src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:2214
9,015,224  0x538c82  /src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_int.h:651
1. Does having 9 million mdcache_dir_entry_t allocations from
mdc_readdir_chunk_object (mdcache_helpers.c:2214) and the corresponding
mdcache_key_dup calls (mdcache_int.h) seem high?
2. There are 1.7 million entry_export_map (mdcache_helpers.c:392) and
cih_hash_key (mdcache_hash.h:209) allocations that, as far as I can tell,
come from mdcache_alloc_entry line 735. Does it seem odd that with 500,000
mdcache entries we have 1.7 million export maps and hash keys?
Thoughts?
Thanks,
Vandana