[NFS-Ganesha-Devel] Re: Hitting a crash in mdcache_lru_cleanup_push.

Friday, 17 April 2020

2.7.6 is now almost a year old, and includes a lot of MDCACHE fixes,
including LRU-related fixes, since 2.7.1.  In addition, 2.8 and 3.x both
have further fixes.  I'd estimate at least a dozen in total.

Daniel

On 4/16/20 8:59 PM, Pradeep Thomas wrote:
...
 Hello Daniel/Frank,

 While debugging a crash from 2.7.1 Ganesha, I see a potential race between the two paths
below:

 Thread 1 (waiting for the qlock to insert to LRU)
 nfs4_mds_putfh -> mdcache_create_handle -> mdcache_locate_host ->
mdcache_new_entry -> mdcache_lru_insert -> lru_insert_entry

 Thread 2 (unlinking the same object) - since the object is already in mdcache at this
point, I believe other threads can look it up.
 fsal_remove -> mdcache_unlink -> _mdc_unreachable -> _mdcache_kill_entry ->
mdcache_lru_cleanup_push

 The second thread will find the lru in a state like this:
 $5 = {q = {next = 0x0, prev = 0x0}, qid = LRU_ENTRY_NONE, refcnt = 2, flags = 0, lane = 12, cf = 0}

 So, the below code will end up crashing:
          if (!(lru->qid == LRU_ENTRY_CLEANUP)) {
                  struct lru_q *q;

                  /* out with the old queue */
                  q = lru_queue_of(entry); /* <<-- q will be NULL because qid == LRU_ENTRY_NONE */

 Should Thread 2 simply skip the queue move when q is NULL and let Thread 1's operation
free the entry later?
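
 To make the question concrete, here is a minimal standalone sketch of the guard being
proposed. It is a simplified model, not the actual Ganesha code: the types are cut-down
stand-ins, model_queue_of() and model_cleanup_push() are hypothetical names, and locking
(the qlock) is left out entirely. It only shows that a lookup returning NULL for
LRU_ENTRY_NONE can be skipped instead of dereferenced. Whether skipping is actually safe
depends on Thread 1 noticing the kill later, which this sketch does not address.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the mdcache LRU types (assumption, not the real layout). */
enum lru_q_id {
	LRU_ENTRY_NONE = 0,
	LRU_ENTRY_L1,
	LRU_ENTRY_L2,
	LRU_ENTRY_CLEANUP
};

struct lru_q {
	int size;	/* number of entries currently on this queue */
};

struct lru_entry {
	enum lru_q_id qid;
	int refcnt;
};

static struct lru_q q_l1, q_l2, q_cleanup;

/* Model of the queue lookup: an entry with qid NONE is on no queue. */
static struct lru_q *model_queue_of(struct lru_entry *lru)
{
	switch (lru->qid) {
	case LRU_ENTRY_L1:
		return &q_l1;
	case LRU_ENTRY_L2:
		return &q_l2;
	case LRU_ENTRY_CLEANUP:
		return &q_cleanup;
	default:
		return NULL;
	}
}

/*
 * Hypothetical guarded cleanup push.
 * Returns 1 if the entry was moved to the cleanup queue, 0 if it was skipped.
 */
static int model_cleanup_push(struct lru_entry *lru)
{
	struct lru_q *q;

	assert(lru != NULL);

	if (lru->qid == LRU_ENTRY_CLEANUP)
		return 0;	/* already on the cleanup queue */

	q = model_queue_of(lru);
	if (q == NULL)
		return 0;	/* not inserted yet: skip instead of dereferencing NULL */

	/* out with the old queue, in with the cleanup queue */
	q->size--;
	lru->qid = LRU_ENTRY_CLEANUP;
	q_cleanup.size++;
	return 1;
}
```

 With an entry in the state shown in the dump above (qid = LRU_ENTRY_NONE, refcnt = 2),
model_cleanup_push() returns 0 and leaves the entry untouched, rather than crashing on
the NULL queue pointer.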

 Also, please let me know if there are any recent fixes in this area.

 Thanks,
 Pradeep
