Hello Daniel/Frank,
While debugging a crash in Ganesha 2.7.1, I see a potential race between the two paths
below:
Thread 1 (waiting for the qlock to insert to LRU)
nfs4_mds_putfh -> mdcache_create_handle -> mdcache_locate_host ->
mdcache_new_entry -> mdcache_lru_insert -> lru_insert_entry
Thread 2 (unlinking the same object). Since the object is already in mdcache at this point,
I believe other threads can find it.
fsal_remove -> mdcache_unlink -> _mdc_unreachable -> _mdcache_kill_entry ->
mdcache_lru_cleanup_push
The second thread will find the entry's lru in a state like this:
$5 = {q = {next = 0x0, prev = 0x0}, qid = LRU_ENTRY_NONE, refcnt = 2, flags = 0,
  lane = 12, cf = 0}
So the code below will end up crashing:
    if (!(lru->qid == LRU_ENTRY_CLEANUP)) {
        struct lru_q *q;
        /* out with the old queue */
        q = lru_queue_of(entry); <<-- q will be NULL because qid == LRU_ENTRY_NONE
Should Thread 2 simply skip the entry when q is NULL and let Thread 1's operation free the
entry later?
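To make the question concrete, here is a minimal sketch of the kind of guard I mean. It is a
standalone model, not the actual mdcache code: every sk_ type and function is a hypothetical
stand-in (sk_queue_of for lru_queue_of, sk_cleanup_push for mdcache_lru_cleanup_push), and
it leaves out the qlock handling entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the LRU types; NOT the real definitions. */
enum sk_qid { SK_ENTRY_NONE = 0, SK_ENTRY_L1, SK_ENTRY_CLEANUP };

struct sk_lru_q {
	size_t size;		/* number of entries on this queue */
};

struct sk_lru {
	enum sk_qid qid;	/* which queue the entry is on, if any */
	int refcnt;
};

struct sk_lru_q sk_l1;		/* models a regular LRU queue */
struct sk_lru_q sk_cleanup;	/* models the cleanup queue */

/* Models lru_queue_of(): returns NULL when the entry is not on any queue yet. */
struct sk_lru_q *sk_queue_of(const struct sk_lru *lru)
{
	switch (lru->qid) {
	case SK_ENTRY_L1:
		return &sk_l1;
	case SK_ENTRY_CLEANUP:
		return &sk_cleanup;
	default:
		return NULL;
	}
}

/*
 * Models the cleanup push with the proposed NULL guard.
 * Returns true if the entry was moved to the cleanup queue, false if it was
 * left alone (already on the cleanup queue, or not queued at all yet).
 * Assumption: the caller would hold the appropriate lock in real code.
 */
bool sk_cleanup_push(struct sk_lru *lru)
{
	struct sk_lru_q *q;

	assert(lru != NULL);

	if (lru->qid == SK_ENTRY_CLEANUP)
		return false;

	/* out with the old queue */
	q = sk_queue_of(lru);
	if (q == NULL)
		return false;	/* proposed guard: insert has not happened yet */

	q->size--;

	/* in with the cleanup queue */
	lru->qid = SK_ENTRY_CLEANUP;
	sk_cleanup.size++;
	return true;
}
```

With this guard, Thread 2 returns without touching the entry while qid is still
LRU_ENTRY_NONE. Whether that is actually safe depends on Thread 1 noticing, after its insert,
that the entry has been killed in the meantime, which the sketch does not model.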
Also, please let me know if there are any recent fixes in this area.
Thanks,
Pradeep