From: Daniel Gryniewicz [mailto:dang@redhat.com]
Sent: Thursday, April 11, 2019 5:28 AM
Okay, I misunderstood the situation. That fix won't work, because
mdcache_lru_unref_chunk() takes the qlane lock, and it also takes the content
lock, which must be taken before the qlane lock.
The problem here is that the parent has a pointer to the chunk and a ref on it,
and the chunk has a pointer to the parent but no ref on it.
This means that the parent refcount can go to zero while the chunk still
holds a pointer to it. I think we need to force-remove all the chunks when we
recycle the parent, and null out the parent pointer in the chunk as well. We'll have to audit
all the uses of the parent pointer and make sure the call path has a ref to the
parent, or that it can handle a null parent pointer.
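Something like this, as a very rough standalone sketch (the types and helpers
here are simplified stand-ins, not the actual mdcache structures):

/* Simplified stand-in types -- not the real mdcache structures. */
#include <stddef.h>
#include <stdio.h>

struct dir_chunk_sketch;

struct parent_sketch {
    struct dir_chunk_sketch *chunks[4]; /* chunks owned by this directory */
    int num_chunks;
};

struct dir_chunk_sketch {
    struct parent_sketch *parent;       /* back-pointer, holds no ref */
    int refcnt;
};

/* On recycle, detach every chunk and clear its back-pointer so a late
 * unref cannot chase a parent that has been reused for another object. */
static void recycle_parent(struct parent_sketch *parent)
{
    int i;

    for (i = 0; i < parent->num_chunks; i++) {
        struct dir_chunk_sketch *chunk = parent->chunks[i];

        if (chunk != NULL) {
            chunk->parent = NULL;       /* chunk may outlive the parent */
            parent->chunks[i] = NULL;
        }
    }
    parent->num_chunks = 0;
}

/* Every path that still reaches the chunk must tolerate a NULL parent. */
static void clean_chunk(struct dir_chunk_sketch *chunk)
{
    if (chunk->parent != NULL)
        printf("detaching dirents from a live parent\n");
    else
        printf("parent already recycled, nothing to detach\n");
}

int main(void)
{
    struct dir_chunk_sketch chunk = { NULL, 1 };
    struct parent_sketch parent = { { &chunk }, 1 };

    chunk.parent = &parent;
    recycle_parent(&parent);    /* the entry is being recycled/reused */
    clean_chunk(&chunk);        /* late cleanup sees a NULL parent */
    return 0;
}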
Thoughts? Ideas?
This sounds good. It looks like lru_reap_chunk_impl and mdcache_lru_unref_chunk need to
be able to handle a NULL parent pointer.

There are other functions that use chunk->parent but are passed the parent. Those
functions should perhaps assert that parent == chunk->parent (or entry == chunk->parent
in the case of avl_dirent_set_deleted). mdcache_clean_dirent_chunk could be passed the
parent to facilitate this check (both places it's called already have the parent). Note
that that function already sets chunk->parent = NULL.
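Roughly the shape of the check I have in mind, as a standalone sketch with
placeholder types (not the real signatures):

#include <assert.h>
#include <stddef.h>

/* Placeholder types -- not the real mdcache_entry_t / dir chunk. */
struct entry_sketch { int unused; };

struct chunk_sketch {
    struct entry_sketch *parent;        /* back-pointer, no ref held */
};

/* Callers that already have the parent pass it in explicitly; the assert
 * catches a chunk whose back-pointer disagrees.  A NULL back-pointer is
 * fine, since the clean path already clears it. */
static void clean_dirent_chunk(struct entry_sketch *parent,
                               struct chunk_sketch *chunk)
{
    assert(chunk->parent == NULL || chunk->parent == parent);

    /* ... remove dirents using the passed-in parent, not the back-pointer ... */

    chunk->parent = NULL;
}

int main(void)
{
    struct entry_sketch dir = { 0 };
    struct chunk_sketch chunk = { &dir };

    clean_dirent_chunk(&dir, &chunk);
    return 0;
}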
Frank
On 4/10/19 5:40 PM, Pradeep wrote:
> Hi Daniel,
>
> I already have that fix
> (https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/441566) you mention
> below. I believe this race is different. What I meant is that something
> like this may be needed; otherwise the chunk cleanup will end up using
> the wrong parent pointer:
>
> https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/450825
>
> Thanks,
> Pradeep
>
> On Wed, Apr 10, 2019 at 11:31 AM Daniel Gryniewicz <dang@redhat.com> wrote:
>
> This is probably 389f1f09da19c4f42046474af3869815b6bd5959, fixed in
> 2.7.2.
>
> Daniel
>
> On 4/10/19 1:55 PM, Pradeep wrote:
> > Hello,
> >
> > I'm hitting a hang where rmv_detached_dirent() is stuck on the spinlock
> > forever.
> >
> > #0  0x00007feb01a9a4e5 in pthread_spin_lock () from /lib64/libpthread.so.0
> > #1  0x00000000005519e7 in rmv_detached_dirent (parent=0x7feaeb8dd600, dirent=0x7fea9fbd8700)
> >     at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_int.h:420
> > #2  0x00000000005522e8 in mdcache_avl_remove (parent=0x7feaeb8dd600, dirent=0x7fea9fbd8700)
> >     at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_avl.c:256
> > #3  0x0000000000547752 in mdcache_clean_dirent_chunk (chunk=0x7fea9f86f970)
> >     at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:454
> > #4  0x00000000005386a2 in lru_clean_chunk (chunk=0x7fea9f86f970)
> >     at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:2078
> > #5  0x000000000053881b in mdcache_lru_unref_chunk (chunk=0x7fea9f86f970)
> >     at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:2097
> > #6  0x000000000053698c in chunk_lru_run_lane (lane=14)
> >     at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1509
> > #7  0x0000000000536d26 in chunk_lru_run (ctx=0x7feafe00f580)
> >     at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1563
> > #8  0x0000000000508ad9 in fridgethr_start_routine (arg=0x7feafe00f580)
> >     at /usr/src/debug/nfs-ganesha-2.7.1/support/fridgethr.c:550
> > #9  0x00007feb01a95e25 in start_thread () from /lib64/libpthread.so.0
> > #10 0x00007feb0139dbad in clone () from /lib64/libc.so.6
> >
> > 420 pthread_spin_lock(&parent->fsobj.fsdir.spin);
> > (gdb) print parent->fsobj.fsdir.spin
> > $1 = -1
> > (gdb) print parent->obj_handle.type
> > $2 = REGULAR_FILE
> >
> > Looks like while this thread is in this path, the parent got reused for
> > a different object in the filesystem. From looking at the code, this
> > seems possible:
> >
> > Let's say thread1 goes through the reuse path:
> >
> >   * mdcache_lru_get() ->
> >     o mdcache_lru_clean() ->
> >       + mdc_clean_entry() ->
> >         # mdcache_dirent_invalidate_all()
> >           * mdcache_lru_unref_chunk()
> >             * This will call lru_clean_chunk() only if refcnt is zero.
> >               Let's say another thread (thread2 below) incremented it
> >               from the background thread, so thread1 returns and the
> >               entry now gets reused.
> >
> >
> > Now the background thread (thread2) comes along to clean chunks and
> > increments refcnt:
> >
> >   * chunk_lru_run()
> >     o chunk_lru_run_lane()
> >       + mdcache_lru_unref_chunk()
> >       + Here we unlock the qlane lock and then decrement refcnt. When
> >         this thread unlocks, thread1 grabs the qlane lock and skips the
> >         chunk because refcnt is > 0. By the time we reach
> >         mdcache_lru_unref_chunk(), the parent has already been reused by
> >         thread1 for a different object. Now mdcache_lru_unref_chunk()
> >         makes progress because refcnt became zero, but the parent is
> >         invalid, so it gets stuck in rmv_detached_dirent() (see the
> >         sketch below).
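> >
> > A tiny standalone model of that interleaving, replayed sequentially with
> > placeholder types (not the real code, just the ordering):
> >
> > #include <stdio.h>
> >
> > struct parent_model { int recycled; };
> > struct chunk_model  { struct parent_model *parent; int refcnt; };
> >
> > int main(void)
> > {
> >     struct parent_model dir = { 0 };
> >     struct chunk_model c = { &dir, 0 };
> >
> >     /* thread2: chunk_lru_run_lane() takes the qlane lock, picks the
> >      * chunk, takes a ref, then drops the qlane lock before the unref. */
> >     c.refcnt++;                             /* refcnt: 0 -> 1 */
> >
> >     /* thread1: sees refcnt > 0 under the qlane lock, skips cleaning the
> >      * chunk, and the parent entry gets recycled for another object. */
> >     dir.recycled = 1;
> >
> >     /* thread2: mdcache_lru_unref_chunk() now drops the ref to 0 and
> >      * cleans the chunk, but chunk->parent points at a recycled entry,
> >      * so rmv_detached_dirent() spins on garbage. */
> >     if (--c.refcnt == 0 && c.parent->recycled)
> >         printf("cleaning chunk through a stale parent pointer\n");
> >
> >     return 0;
> > }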
> >
> > I think we should hold the qlane lock in chunk_lru_run_lane() until the
> > refcnt is decremented, to make sure that reaping of the parent is not
> > possible. Since mdcache_lru_unref_chunk() already takes the lock later,
> > we could probably pass a flag to indicate whether the qlane lock is
> > already held (similar to content_lock), as in the sketch below. Please
> > let me know if there is a better approach to refactor this.
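> >
> > Roughly what I have in mind, as a standalone sketch with placeholder
> > names (a plain mutex and a bare int stand in for the real qlane lock and
> > atomic refcount):
> >
> > #include <pthread.h>
> > #include <stdbool.h>
> > #include <stdio.h>
> >
> > static pthread_mutex_t qlane_lock = PTHREAD_MUTEX_INITIALIZER;
> >
> > struct chunk_model { int refcnt; };
> >
> > /* If the caller already holds the qlane lock (as chunk_lru_run_lane()
> >  * would under this proposal), don't retake it; the refcnt then drops
> >  * to zero and the chunk is cleaned without the lock ever being released
> >  * in between. */
> > static void unref_chunk(struct chunk_model *chunk, bool qlane_locked)
> > {
> >     if (!qlane_locked)
> >         pthread_mutex_lock(&qlane_lock);
> >
> >     if (--chunk->refcnt == 0)
> >         printf("cleaning chunk under the qlane lock\n");
> >
> >     if (!qlane_locked)
> >         pthread_mutex_unlock(&qlane_lock);
> > }
> >
> > int main(void)
> > {
> >     struct chunk_model chunk = { 1 };
> >
> >     /* chunk_lru_run_lane() path: lock is held across the whole unref */
> >     pthread_mutex_lock(&qlane_lock);
> >     unref_chunk(&chunk, true);
> >     pthread_mutex_unlock(&qlane_lock);
> >
> >     /* other callers: let unref_chunk() take the lock itself */
> >     chunk.refcnt = 1;
> >     unref_chunk(&chunk, false);
> >     return 0;
> > }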
> >
> > Thanks,
> > Pradeep
> >
> >
> >