[NFS-Ganesha-Devel] Re: Hitting a crash in mdcache_lru_cleanup_push.

Thursday, 16 April 2020

Someone (from Amazon?) reported this issue but didn't follow up with a
patch, so we ended up carrying a patch in our own code base. See
commit ce4c049b0871eaae78325bb30a1ef1df3a14e77c in our repo
(https://github.com/ganltc/nfs-ganesha.git, branch ibm2.7).

Regards, Malahal.

On Fri, Apr 17, 2020 at 6:29 AM Pradeep Thomas <pradeepthomas(a)gmail.com>
wrote:

 Hello Daniel/Frank,

 While debugging a crash in Ganesha 2.7.1, I see a potential race between
 the two paths below:

 Thread 1 (waiting for the qlock to insert into the LRU):
 nfs4_mds_putfh -> mdcache_create_handle -> mdcache_locate_host ->
 mdcache_new_entry -> mdcache_lru_insert -> lru_insert_entry

 Thread 2 (unlinking the same object). Since the object is already in
 mdcache at this point, I believe other threads can get it:
 fsal_remove -> mdcache_unlink -> _mdc_unreachable -> _mdcache_kill_entry
 -> mdcache_lru_cleanup_push

 The second thread will find the lru in a state something like this:
 $5 = {q = {next = 0x0, prev = 0x0}, qid = LRU_ENTRY_NONE, refcnt = 2,
 flags = 0, lane = 12, cf = 0}

 So the code below will end up crashing:
         if (!(lru->qid == LRU_ENTRY_CLEANUP)) {
                 struct lru_q *q;

                 /* out with the old queue */
                 q = lru_queue_of(entry);
                 /* ^ q will be NULL because qid == LRU_ENTRY_NONE */

 Should Thread 2 just skip the entry if q is NULL and let Thread 1's
 operation free the entry later?
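
 The "skip if q is NULL" idea can be sketched in isolation as follows. This
 is only a minimal illustration under stated assumptions, not the Ganesha
 code and not necessarily what any existing patch does: the types, the
 queue globals, and the functions sketch_queue_of() and
 sketch_cleanup_push() are hypothetical stand-ins, and the qlock handling
 and reference counting of the real code are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the mdcache LRU types (assumed, not the real ones). */
enum lru_q_id {
	LRU_ENTRY_NONE,		/* entry created but not yet on any queue */
	LRU_ENTRY_L1,
	LRU_ENTRY_L2,
	LRU_ENTRY_CLEANUP
};

struct lru_q {
	size_t size;		/* number of entries on this queue */
};

struct lru_entry {
	enum lru_q_id qid;
	int refcnt;
};

/* One lane's worth of queues, for illustration only. */
struct lru_q q_l1, q_l2, q_cleanup;

/* Return the queue the entry is on, or NULL if it is not queued yet. */
struct lru_q *sketch_queue_of(const struct lru_entry *lru)
{
	switch (lru->qid) {
	case LRU_ENTRY_L1:
		return &q_l1;
	case LRU_ENTRY_L2:
		return &q_l2;
	case LRU_ENTRY_CLEANUP:
		return &q_cleanup;
	default:
		return NULL;	/* LRU_ENTRY_NONE */
	}
}

/*
 * Move an entry to the cleanup queue.
 * Returns true if the entry was moved, false if it was left alone
 * (already in cleanup, or not yet inserted by the creating thread).
 * The real code would hold the lane's qlock around all of this.
 */
bool sketch_cleanup_push(struct lru_entry *lru)
{
	struct lru_q *q;

	assert(lru != NULL);

	if (lru->qid == LRU_ENTRY_CLEANUP)
		return false;

	q = sketch_queue_of(lru);
	if (q == NULL)
		return false;	/* not queued yet: do not dereference q */

	/* out with the old queue, in with the cleanup queue */
	q->size--;
	lru->qid = LRU_ENTRY_CLEANUP;
	q_cleanup.size++;
	return true;
}
```

 With an entry in the state shown above (qid = LRU_ENTRY_NONE),
 sketch_cleanup_push() returns false instead of dereferencing a NULL queue.
 Whether leaving the entry for Thread 1 is actually safe depends on the
 real refcount and locking rules, which this sketch does not model.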

 Also, please let me know if there are any recent fixes in this area.

 Thanks,
 Pradeep
