Hi Daniel,
I already have the fix you mention below
(https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/441566). I believe this
race is different. What I meant is that something like this may be needed;
otherwise the chunk cleanup will end up using the wrong parent pointer:
https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/450825
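
The general idea, roughly, is to drop the chunk's refcount while the qlane
lock is still held, so the chunk's parent cannot be reaped and reused in
between. A minimal sketch of that idea only (the stand-in types and the
*_locked helper below are illustrative, not the actual content of the
change above):

    #include <pthread.h>

    /* Stand-in types; the real ones live in mdcache_int.h / mdcache_lru.c. */
    struct lru_q_lane { pthread_mutex_t mtx; };
    struct dir_chunk  { int refcnt; struct lru_q_lane *qlane; };

    /*
     * Hypothetical variant of mdcache_lru_unref_chunk() for callers that
     * already hold the qlane lock (similar to how content_lock is passed
     * around).
     */
    static void mdcache_lru_unref_chunk_locked(struct dir_chunk *chunk)
    {
        if (--chunk->refcnt == 0) {
            /* clean the chunk's dirents while the parent is still valid */
        }
    }

    static void chunk_lru_reap_sketch(struct dir_chunk *chunk)
    {
        pthread_mutex_lock(&chunk->qlane->mtx);
        /*
         * Decrement before unlocking: another thread cannot see this
         * transient ref, skip the chunk cleanup, and recycle the parent
         * entry while the chunk still points at it.
         */
        mdcache_lru_unref_chunk_locked(chunk);
        pthread_mutex_unlock(&chunk->qlane->mtx);
    }
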
Thanks,
Pradeep
On Wed, Apr 10, 2019 at 11:31 AM Daniel Gryniewicz <dang(a)redhat.com> wrote:
This is probably 389f1f09da19c4f42046474af3869815b6bd5959, fixed in
2.7.2.
Daniel
On 4/10/19 1:55 PM, Pradeep wrote:
> Hello,
>
> I'm hitting a hang where rmv_detached_dirent() is stuck on the spinlock
> forever.
>
> #0 0x00007feb01a9a4e5 in pthread_spin_lock () from /lib64/libpthread.so.0
> #1 0x00000000005519e7 in rmv_detached_dirent (parent=0x7feaeb8dd600, dirent=0x7fea9fbd8700)
>    at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_int.h:420
> #2 0x00000000005522e8 in mdcache_avl_remove (parent=0x7feaeb8dd600, dirent=0x7fea9fbd8700)
>    at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_avl.c:256
> #3 0x0000000000547752 in mdcache_clean_dirent_chunk (chunk=0x7fea9f86f970)
>    at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:454
> #4 0x00000000005386a2 in lru_clean_chunk (chunk=0x7fea9f86f970)
>    at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:2078
> #5 0x000000000053881b in mdcache_lru_unref_chunk (chunk=0x7fea9f86f970)
>    at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:2097
> #6 0x000000000053698c in chunk_lru_run_lane (lane=14)
>    at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1509
> #7 0x0000000000536d26 in chunk_lru_run (ctx=0x7feafe00f580)
>    at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1563
> #8 0x0000000000508ad9 in fridgethr_start_routine (arg=0x7feafe00f580)
>    at /usr/src/debug/nfs-ganesha-2.7.1/support/fridgethr.c:550
> #9 0x00007feb01a95e25 in start_thread () from /lib64/libpthread.so.0
> #10 0x00007feb0139dbad in clone () from /lib64/libc.so.6
>
> 420 pthread_spin_lock(&parent->fsobj.fsdir.spin);
> (gdb) print parent->fsobj.fsdir.spin
> $1 = -1
> (gdb) print parent->obj_handle.type
> $2 = REGULAR_FILE
>
> It looks like while this thread was in this path, the parent got reused
> for a different object in the filesystem. From looking at the code, this
> seems possible:
>
> Let's say thread1 goes through the reuse path:
>
>   mdcache_lru_get() ->
>     mdcache_lru_clean() ->
>       mdc_clean_entry() ->
>         mdcache_dirent_invalidate_all() ->
>           mdcache_lru_unref_chunk()
>
> mdcache_lru_unref_chunk() calls lru_clean_chunk() only if the refcount
> drops to zero. Let's say another thread (thread2 below) has already
> incremented it from the background thread, so thread1 skips the cleanup,
> returns, and the entry then gets reused (see the sketch after the thread2
> path below).
>
> Now the background thread (thread2) comes along to clean chunks and
> increments the refcount:
>
>   chunk_lru_run() ->
>     chunk_lru_run_lane() ->
>       mdcache_lru_unref_chunk()
>
> In chunk_lru_run_lane() we unlock the qlane lock and only then decrement
> the refcount. When this thread unlocks, thread1 grabs the qlane lock and
> skips the chunk because the refcount is > 0. By the time thread2 reaches
> mdcache_lru_unref_chunk(), the parent has already been reused by thread1
> for a different object. mdcache_lru_unref_chunk() now makes progress
> because the refcount drops to zero, but the parent is invalid, so it gets
> stuck in rmv_detached_dirent().
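>
> To make the window concrete, here is a simplified sketch of the two paths;
> the types and function bodies are stand-ins for illustration, not the
> actual 2.7.1 code:
>
>     #include <pthread.h>
>
>     /* Stand-in types for illustration only; not the real mdcache structs. */
>     struct mdcache_entry;
>     struct lru_q_lane { pthread_mutex_t mtx; };
>     struct dir_chunk  {
>         int refcnt;
>         struct lru_q_lane *qlane;
>         struct mdcache_entry *parent;
>     };
>
>     /* thread1: reuse path, via mdcache_dirent_invalidate_all() */
>     static void thread1_unref(struct dir_chunk *chunk)
>     {
>         pthread_mutex_lock(&chunk->qlane->mtx);
>         if (--chunk->refcnt > 0) {
>             /* thread2's transient ref is still held, so the cleanup is
>              * skipped; the caller then recycles the parent entry for a
>              * different object while chunk->parent still points at it. */
>             pthread_mutex_unlock(&chunk->qlane->mtx);
>             return;
>         }
>         /* ... clean the chunk's dirents via chunk->parent ... */
>         pthread_mutex_unlock(&chunk->qlane->mtx);
>     }
>
>     /* thread2: background reaper, via chunk_lru_run_lane() */
>     static void thread2_reap(struct dir_chunk *chunk)
>     {
>         pthread_mutex_lock(&chunk->qlane->mtx);
>         chunk->refcnt++;                          /* ref taken for the reap */
>         pthread_mutex_unlock(&chunk->qlane->mtx); /* window: thread1 runs
>                                                      here and skips cleanup */
>
>         pthread_mutex_lock(&chunk->qlane->mtx);
>         if (--chunk->refcnt == 0) {
>             /* refcnt finally hits zero here, so this thread does the
>              * cleanup, but chunk->parent was already recycled above. */
>         }
>         pthread_mutex_unlock(&chunk->qlane->mtx);
>     }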
>
> I think we should hold the qlane lock in chunk_lru_run_lane() until the
> refcount is decremented, to make sure that reaping of the parent is not
> possible. Since mdcache_lru_unref_chunk() already takes the lock later,
> we could probably pass a flag to indicate whether the qlane lock is
> already held (similar to content_lock). Please let me know if there is a
> better approach to refactor this.
>
> Thanks,
> Pradeep
_______________________________________________
Devel mailing list -- devel(a)lists.nfs-ganesha.org
To unsubscribe send an email to devel-leave(a)lists.nfs-ganesha.org