Thank you, Vandana, for the compliments!

#1. We don't seem to have a limit on the number of dirents themselves; we have a limit on chunks, which seems to default to 100K. Each chunk can hold 128 entries by default, so that is about 12,800K, i.e. 12.8 million dirents (see the config sketch below). You can stop your workload and then access or create 500K files (not directories); the files should be cached and any directories should be purged, freeing all their dirents. Of course, if this is a memory leak, some dirents will still remain, indicating a real bug. Ideally, we want a dbus command to free all objects; that way we could confirm whether something is leaking in this area. We cache too many structures without a way to purge them, which makes memory leaks harder to debug.
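For reference, here is a rough sketch of the cache tunables behind that arithmetic. I am going from memory here: the block was called CACHEINODE in 2.7 and MDCACHE in later releases, and both the parameter names and defaults should be checked against the config_samples for your version.

    CACHEINODE {
        # Cached inode high-water mark (you have this set to 500000)
        Entries_HWMark = 500000;

        # Dirent chunk high-water mark: 100000 chunks at 128 dirents
        # each works out to ~12.8 million dirents worst case
        Chunks_HWMark = 100000;

        # Dirents per chunk
        Dir_Chunk = 128;
    }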

#2. If you export both /a/b and /a/b/c, all files in /a/b/c appear in two exports, so they might allocate multiple export map structures for the same object. I am not sure when this happens, though. I do recall that an object can have a list of associated exports; see the sketch below.
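To illustrate, here is a simplified sketch of that mapping as I remember it from mdcache_int.h; treat the exact type and field names as approximations, not the real definitions:

    /* One of these is allocated per (entry, export) pair, so an
     * object reachable through two overlapping exports (like /a/b
     * and /a/b/c) carries two maps pointing at the same entry. */
    struct glist_head {
        struct glist_head *next;
        struct glist_head *prev;
    };

    struct mdcache_entry;            /* the cached object */
    struct mdcache_fsal_export;      /* one export definition */

    struct entry_export_map {
        struct mdcache_entry *entry;
        struct mdcache_fsal_export *export;
        struct glist_head export_per_entry;  /* on the entry's export list */
        struct glist_head entry_per_export;  /* on the export's entry list */
    };

If that is what you are hitting, overlapping exports would multiply the entry_export_map count relative to the number of cached entries, which could be part of why you see 1.7 million maps against 500K entries.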

Regards, Malahal.
PS: mmleak.so is a generic tool that works with any daemon, but it dumps files, which need disk space and post-processing. Having code in ganesha itself maintain all allocations would be better (though it would not find memory leaks in the libraries ganesha uses), and dumping them on demand is something we should have. It would make memory leaks easier to debug.
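Something along these lines; this is a rough sketch of the idea only, nothing below exists in ganesha today, and the names (TRACKED_MALLOC, dump_live_allocations) are made up:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stddef.h>
    #include <stdbool.h>
    #include <stdatomic.h>

    struct alloc_site {
        const char *file;
        int line;
        atomic_long live;            /* allocations minus frees */
        struct alloc_site *next;     /* chain of sites seen so far */
        atomic_bool registered;
    };

    static struct alloc_site *all_sites;

    union alloc_hdr {                /* prepended to each allocation */
        struct alloc_site *site;
        max_align_t align;           /* keeps the user pointer aligned */
    };

    static void *tracked_malloc(size_t sz, struct alloc_site *site)
    {
        union alloc_hdr *h;

        /* First allocation from this site: link it into the global
         * list.  A real version needs a lock or CAS loop here. */
        if (!atomic_exchange(&site->registered, true)) {
            site->next = all_sites;
            all_sites = site;
        }
        h = malloc(sizeof(*h) + sz);
        if (h == NULL)
            return NULL;
        h->site = site;
        atomic_fetch_add(&site->live, 1);
        return h + 1;
    }

    static void tracked_free(void *p)
    {
        union alloc_hdr *h;

        if (p == NULL)
            return;
        h = (union alloc_hdr *)p - 1;
        atomic_fetch_sub(&h->site->live, 1);
        free(h);
    }

    /* What a dbus handler would call: print live counts per site. */
    static void dump_live_allocations(void)
    {
        struct alloc_site *s;

        for (s = all_sites; s != NULL; s = s->next)
            printf("%10ld live  %s:%d\n",
                   atomic_load(&s->live), s->file, s->line);
    }

    /* One static site per call site (GCC/clang statement expression). */
    #define TRACKED_MALLOC(sz) ({                                    \
        static struct alloc_site _site = { __FILE__, __LINE__ };     \
        tracked_malloc((sz), &_site);                                \
    })

A dbus command could then call dump_live_allocations() and give us the same per-callsite live counts mmleak produces, with no file dumps, at the cost of a small header and one atomic operation per allocation and free.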

On Fri, Jul 26, 2019 at 3:47 AM Rungta, Vandana <vrungta@amazon.com> wrote:

Daniel,

I noticed a gradual creep-up in the memory used by ganesha during long-running load tests, so I started my tests with mmleak.so preloaded (https://github.com/malahal/mmleak).

(Malahal: compliments on mmleak. Very nice memory debugging tool.)

After about 15 hours of tests, I shrank and processed the information dumped by mmleak.

The following are the current counts for the addresses where memory was allocated without a corresponding free. The first one is reasonable because I have Entries_HWMark = 500000.

All locations are from NFS-Ganesha V2.7.6.

Count          Location    Allocation line
  500,000      0x52aad0    /src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1788

1,700,985      0x539186    /src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_hash.h:209
1,700,985      0x53b2b3    /src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:392

8,992,812      0x541186    /src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:2214
9,015,224      0x538c82    /src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_int.h:651

  1. Does having 9 million mdcache_dir_entry_t allocated from mdc_readdir_chunk_object (mdcache_helpers.c:2214) and the corresponding mdcache_key_dup allocations (mdcache_int.h:651) seem high?
  2. There are 1.7 million entry_export_map (mdcache_helpers.c:392) and cih_hash_key allocations (mdcache_hash.h:209), which, as far as I can tell, come from mdcache_alloc_entry (line 735). Does it seem odd that with 500,000 mdcache entries we have 1.7 million export maps and hash keys?

Thoughts?

Thanks,

Vandana