What Ganesha version are you using?
Could you open a github issue for this?
Thanks
Frank
From: Deepak Arumugam Sankara Subramanian [mailto:deepakarumugam.s@nutanix.com]
Sent: Saturday, June 24, 2023 1:25 PM
To: devel@lists.nfs-ganesha.org; Frank Filz <ffilzlnx@mindspring.com>
Subject: [NFS-Ganesha-Devel] MDCACHE LRU reaper thread issues
ISSUE
We recently hit an issue where the number of Ganesha mdcache entries grew to 6-9 million while the mdcache high-water mark was set to 450,000.
ROOT CAUSE
Upon further debugging we found that readdir was not playing well with garbage collection. The major issue comes from the fact that we have a temporary pointer from the dirent to the inode mdcache entry.
Today, when readdir fills up a chunk, it puts a temporary reference from the dirent to the inode cache entry. This happens inside mdc_readdir_chunk_object:
    new_dir_entry->entry = new_entry;
Then we clear it out inside mdcache_readdir_chunked:
    if (has_write && dirent->entry) {
        /* If we get here, we have the write lock, have an
         * entry, and took a ref on it above. The dirent also
         * has a ref on the entry. Drop that ref now. This can
         * only be done under the write lock. If we don't have
         * the write lock, then this was not the readdir that
         * took the ref, and another readdir will drop the ref,
         * or it will be dropped when the dirent is cleaned up.
         */
        mdcache_lru_unref(dirent->entry, LRU_FLAG_NONE);
        dirent->entry = NULL;
    }
But this means each readdir can hold a refcount on 128 entries in the inode cache at a time.
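To make the pinning effect concrete, here is a minimal toy model in C. It is only a sketch: every name in it (toy_entry, toy_dirent, toy_fill_chunk, toy_release_chunk, toy_reap) is hypothetical and none of it is Ganesha code. It assumes a refcount of 1 means "only the cache itself holds the entry", and that a reaper pass may reclaim only such entries. The chunk size of 128 is the figure quoted above; the cache size of 256 is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CHUNK_SIZE 128 /* dirents per chunk, the figure quoted above */
#define TOY_CACHE_SIZE 256 /* arbitrary size for this model */

/* Hypothetical stand-in for an inode cache entry. */
struct toy_entry {
	int refcnt; /* 1 means only the cache's own reference remains */
	int cached; /* non-zero while the entry is still in the cache */
};

/* Hypothetical stand-in for a dirent inside a chunk. */
struct toy_dirent {
	struct toy_entry *entry; /* temporary pointer that owns one ref */
};

static struct toy_entry toy_cache[TOY_CACHE_SIZE];
static struct toy_dirent toy_chunk[TOY_CHUNK_SIZE];

/* Put every entry in the cache with only the cache's own reference. */
static void toy_cache_init(void)
{
	for (size_t i = 0; i < TOY_CACHE_SIZE; i++) {
		toy_cache[i].refcnt = 1;
		toy_cache[i].cached = 1;
	}
	for (size_t i = 0; i < TOY_CHUNK_SIZE; i++)
		toy_chunk[i].entry = NULL;
}

/* Populate one chunk: each dirent takes a ref on its entry. */
static void toy_fill_chunk(void)
{
	for (size_t i = 0; i < TOY_CHUNK_SIZE; i++) {
		toy_cache[i].refcnt++;
		toy_chunk[i].entry = &toy_cache[i];
	}
}

/* Drop the dirent refs, as the readdir holding the write lock would. */
static void toy_release_chunk(void)
{
	for (size_t i = 0; i < TOY_CHUNK_SIZE; i++) {
		if (toy_chunk[i].entry == NULL)
			continue;
		assert(toy_chunk[i].entry->refcnt > 1);
		toy_chunk[i].entry->refcnt--;
		toy_chunk[i].entry = NULL;
	}
}

/* One reaper pass: reclaim only entries nobody else references. */
static size_t toy_reap(void)
{
	size_t reclaimed = 0;

	for (size_t i = 0; i < TOY_CACHE_SIZE; i++) {
		if (toy_cache[i].cached && toy_cache[i].refcnt == 1) {
			toy_cache[i].cached = 0;
			toy_cache[i].refcnt = 0;
			reclaimed++;
		}
	}
	return reclaimed;
}
```

In this model, calling toy_cache_init(), toy_fill_chunk() and then toy_reap() reclaims only the entries that no dirent points at. The 128 pinned entries become reclaimable only after toy_release_chunk() has run.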
This leads to a scenario where the following happens
POSSIBLE RESOLUTIONS/QUESTIONS
Inside mdc_readdir_chunk_object we call mdcache_new_entry with LRU_FLAG_NONE:
    status = mdcache_new_entry(export, sub_handle, attrs_in, false, NULL,
                               false, &new_entry, NULL, LRU_FLAG_NONE);
If it's an inode cache hit, it follows this path:
    status = mdcache_find_keyed_reason(&key, entry, flags);
    fsal_status_t status;
    /* Initial Ref on entry */
    mdcache_lru_ref(*entry, flags);
Since flags is not LRU_REQ_INITIAL, this entry doesn't get adjusted to the MRU of L2 or LRU of L1.
s.. mdcache_lru_insert(nentry, flags);
If flags is LRU_FLAG_NONE, mdcache_lru_insert inserts the entry into the MRU of L2.
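The cache-hit behaviour described above (a ref taken without LRU_REQ_INITIAL leaves the entry where it is in the queue) can be sketched with a toy model. This is a deliberate simplification and not Ganesha code: it collapses the L1/L2 queues into a single queue, moves a repositioned entry to the MRU end of that one queue, and all names (toy_lru, toy_lru_ref, TOY_REQ_INITIAL, TOY_FLAG_NONE) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_FLAG_NONE   0x0u
#define TOY_REQ_INITIAL 0x1u /* hypothetical analogue of LRU_REQ_INITIAL */
#define TOY_QUEUE_LEN   4

/* Hypothetical single LRU queue; the real cache has more than one. */
struct toy_lru {
	int order[TOY_QUEUE_LEN];  /* entry ids; index 0 is the LRU end */
	int refcnt[TOY_QUEUE_LEN]; /* refcount, indexed by entry id */
};

/* Start with ids 0..N-1 in order, each holding one reference. */
static void toy_lru_init(struct toy_lru *q)
{
	for (int i = 0; i < TOY_QUEUE_LEN; i++) {
		q->order[i] = i;
		q->refcnt[i] = 1;
	}
}

/* Return the queue position of an entry id, or -1 if absent. */
static int toy_lru_pos(const struct toy_lru *q, int id)
{
	for (int i = 0; i < TOY_QUEUE_LEN; i++) {
		if (q->order[i] == id)
			return i;
	}
	return -1;
}

/* Take a ref; reposition the entry only for an initial reference. */
static void toy_lru_ref(struct toy_lru *q, int id, unsigned int flags)
{
	int pos = toy_lru_pos(q, id);

	assert(pos >= 0);
	q->refcnt[id]++;

	if (!(flags & TOY_REQ_INITIAL))
		return; /* ref taken, queue position unchanged */

	/* Shift the later entries down and place id at the MRU end. */
	for (int i = pos; i < TOY_QUEUE_LEN - 1; i++)
		q->order[i] = q->order[i + 1];
	q->order[TOY_QUEUE_LEN - 1] = id;
}
```

In this model, toy_lru_ref(&q, 0, TOY_FLAG_NONE) bumps the refcount of entry 0 but leaves it at the LRU end, while the same call with TOY_REQ_INITIAL moves it to the MRU end. An entry that is only ever touched through the first kind of call never looks recently used to the queue.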
We would really appreciate your comments on the questions/resolutions,
Thanks,
Deepak