I have been dumping several TB of data over the last few days, and my copy operations
have now been hanging for several hours. My ganesha.log is flooded with the following
message:
[cache_lru] lru_run :INODE LRU :CRIT :Futility count exceeded. The LRU thread is unable
to make progress in reclaiming FDs, will try harder.
I turned on full debug logging and see the following. It looks like the LRU thread is not
closing any FDs.
24/10/2019 13:20:50 : epoch 5da91d15 : paplbur04.telecom.tcnz.net : ganesha.nfsd-101379[cache_lru] lru_run :INODE LRU :F_DBG :LRU awakes.
24/10/2019 13:20:50 : epoch 5da91d15 : paplbur04.telecom.tcnz.net : ganesha.nfsd-101379[cache_lru] lru_run :INODE LRU :F_DBG :lru entries: 51444
24/10/2019 13:20:50 : epoch 5da91d15 : paplbur04.telecom.tcnz.net : ganesha.nfsd-101379[cache_lru] lru_run :INODE LRU :DEBUG :Open FDs over high water mark, reapring aggressively.
24/10/2019 13:20:50 : epoch 5da91d15 : paplbur04.telecom.tcnz.net : ganesha.nfsd-101379[chunk_lru] chunk_lru_run :INODE LRU :F_DBG :LRU awakes, lru chunks used: 337
24/10/2019 13:20:50 : epoch 5da91d15 : paplbur04.telecom.tcnz.net : ganesha.nfsd-101379[cache_lru] lru_run :INODE LRU :DEBUG :Reaping up to 50 entries from lane 0
.
.
.
.
24/10/2019 13:20:50 : epoch 5da91d15 : paplbur04.telecom.tcnz.net : ganesha.nfsd-101379[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually processed 50 entries on lane 0 closing 0 descriptors
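To check whether every pass looks like this across a larger log, a rough helper along these lines could pull the "closing N descriptors" count out of each lru_run_lane line. This is only a sketch: `parse_closed_count` and `count_zero_close_passes` are names I made up, and it assumes the message text matches the excerpt above.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the "closing N descriptors" count from one log line.
 * Returns 1 and stores N in *closed on success, 0 if the line
 * does not contain that field.
 */
int parse_closed_count(const char *line, int *closed)
{
	const char *p = strstr(line, "closing ");

	if (p == NULL)
		return 0;
	return sscanf(p, "closing %d descriptors", closed) == 1;
}

/* Count how many of the given lines report a pass that closed nothing. */
size_t count_zero_close_passes(const char **lines, size_t n)
{
	size_t zero = 0;

	for (size_t i = 0; i < n; i++) {
		int closed;

		if (parse_closed_count(lines[i], &closed) && closed == 0)
			zero++;
	}
	return zero;
}
```

If the zero-close count equals the total number of lru_run_lane lines, the reaper is never releasing a descriptor, rather than just falling behind.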
I checked the source code, and it looks like the FD is not closed because of the block
below. What could cause the ref-count to go above 2? This is my own FSAL. Could the FSAL
be failing to decrease the ref-count? I don't seem to have this problem with NFSv4.
FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c
/* check refcnt in range */
if (unlikely(refcnt > 2)) {
	/* This unref is ok to be done without a valid op_ctx
	 * because we always map a new entry to an export before
	 * we could possibly release references in
	 * mdcache_new_entry.
	 */
	QUNLOCK(qlane);
	mdcache_lru_unref(entry);
	goto next_lru;
}
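To make the question concrete, here is a toy model of that check as I read it. It is not Ganesha code: `toy_entry`, `toy_reap_lane` and `toy_fill_lane` are made-up names, and I am assuming that an idle entry normally holds one reference and that the reaper takes one of its own before testing `refcnt > 2`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cache entry; not an MDCACHE structure. */
struct toy_entry {
	int refcnt;    /* references currently held on the entry */
	bool fd_open;  /* whether the entry still has an open descriptor */
};

/*
 * Model one reaper pass over a lane: take a temporary reference on
 * each entry, skip the entry if anyone else still holds one
 * (refcnt > 2 after the reaper's own reference), otherwise close its
 * descriptor. Returns the number of descriptors closed.
 */
size_t toy_reap_lane(struct toy_entry *lane, size_t n)
{
	size_t closed = 0;

	for (size_t i = 0; i < n; i++) {
		struct toy_entry *e = &lane[i];

		e->refcnt++;                /* reaper's temporary reference */
		if (e->refcnt > 2) {
			e->refcnt--;        /* entry looks busy: drop ref, skip */
			continue;
		}
		if (e->fd_open) {
			e->fd_open = false; /* stands in for the real close */
			closed++;
		}
		e->refcnt--;                /* done with this entry */
	}
	return closed;
}

/* Give every entry in the lane an open FD and the same reference count. */
void toy_fill_lane(struct toy_entry *lane, size_t n, int refs)
{
	for (size_t i = 0; i < n; i++) {
		lane[i].refcnt = refs;
		lane[i].fd_open = true;
	}
}
```

In this model, a lane of 50 entries filled with one reference each has all 50 descriptors closed, while the same lane filled with two references each (one extra reference that was never released) has none closed. That second case has the same shape as the "processed 50 entries on lane 0 closing 0 descriptors" line in my log, which is why I suspect a missing unref somewhere in my FSAL's path.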
I can attach the full log file in a follow-up email if required. I don't have full-debug
logs from when it was working fine.