On 4/12/19 6:05 AM, Pradeep wrote:
On Thu, Apr 11, 2019 at 9:36 AM Soumya Koduri <skoduri@redhat.com> wrote:
Hi Pradeep,
Could you please share some details about your workload and in what
scenarios you have hit this race? Are there any cache tuning parameters
modified?
Hi Soumya,
I don't have a reproducer for this yet. The customer environment where
this is hit uses NFS for storing logs from hundreds of clients, with a few
clients doing cleanup of old logs (find + rm using mtime) in parallel.
This is with default ganesha parameters. From lru_state, I noticed we were
over the default limit of 100K entries in MDCACHE (which is why the reap
code path is triggered). This is with version 2.7.1 on a POSIX filesystem.
Okay. Thanks for sharing the details.
-Soumya
> Thanks,
> Pradeep
>
> Thanks,
> Soumya
>
> On 4/10/19 11:25 PM, Pradeep wrote:
> > Hello,
> >
> > I'm hitting a hang where rmv_detached_dirent() is stuck on the spinlock
> > forever.
> >
> > #0  0x00007feb01a9a4e5 in pthread_spin_lock () from /lib64/libpthread.so.0
> > #1  0x00000000005519e7 in rmv_detached_dirent (parent=0x7feaeb8dd600, dirent=0x7fea9fbd8700)
> >     at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_int.h:420
> > #2  0x00000000005522e8 in mdcache_avl_remove (parent=0x7feaeb8dd600, dirent=0x7fea9fbd8700)
> >     at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_avl.c:256
> > #3  0x0000000000547752 in mdcache_clean_dirent_chunk (chunk=0x7fea9f86f970)
> >     at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:454
> > #4  0x00000000005386a2 in lru_clean_chunk (chunk=0x7fea9f86f970)
> >     at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:2078
> > #5  0x000000000053881b in mdcache_lru_unref_chunk (chunk=0x7fea9f86f970)
> >     at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:2097
> > #6  0x000000000053698c in chunk_lru_run_lane (lane=14)
> >     at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1509
> > #7  0x0000000000536d26 in chunk_lru_run (ctx=0x7feafe00f580)
> >     at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1563
> > #8  0x0000000000508ad9 in fridgethr_start_routine (arg=0x7feafe00f580)
> >     at /usr/src/debug/nfs-ganesha-2.7.1/support/fridgethr.c:550
> > #9  0x00007feb01a95e25 in start_thread () from /lib64/libpthread.so.0
> > #10 0x00007feb0139dbad in clone () from /lib64/libc.so.6
> >
> > 420 pthread_spin_lock(&parent->fsobj.fsdir.spin);
> > (gdb) print parent->fsobj.fsdir.spin
> > $1 = -1
> > (gdb) print parent->obj_handle.type
> > $2 = REGULAR_FILE
> >
> > It looks like, while this thread was in that path, the parent got reused
> > for a different object in the filesystem. From looking at the code, this
> > seems possible:
> >
> > Let's say thread1 goes through the reuse path:
> >
> >   * mdcache_lru_get() ->
> >     o mdcache_lru_clean() ->
> >       + mdc_clean_entry() ->
> >         # mdcache_dirent_invalidate_all() ->
> >           * mdcache_lru_unref_chunk()
> >             * This will call lru_clean_chunk() only if the refcnt drops
> >               to zero. Let's say another thread (thread2 below) has
> >               incremented it from the background thread, so thread1
> >               returns and the entry now gets reused (see the sketch just
> >               after this list).
> >
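> > In other words, the shape of the logic on thread1's side is roughly the
> > following. This is only a sketch of what I described above, not the
> > actual 2.7.1 code; chunk_unref_sketch and the field/helper names are
> > approximations.
> >
> >     /* thread1 dropping its reference on the chunk while cleaning the
> >      * entry; the chunk is only cleaned when the refcount reaches zero. */
> >     static void chunk_unref_sketch(struct dir_chunk *chunk)
> >     {
> >             int32_t refcnt = atomic_dec_int32_t(&chunk->chunk_lru.refcnt);
> >
> >             if (refcnt != 0)
> >                     return; /* thread2 still holds a ref, so nothing is
> >                              * cleaned here; the parent entry goes back
> >                              * to the LRU and can be reaped and reused
> >                              * right away. */
> >
> >             lru_clean_chunk(chunk); /* not reached in this interleaving */
> >     }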
> >
> > Now the background thread (thread2) comes along to clean chunks and
> > increments the refcnt:
> >
> >   * chunk_lru_run()
> >     o chunk_lru_run_lane()
> >       + mdcache_lru_unref_chunk()
> >       + Here we unlock the qlane lock and only then decrement the
> >         refcnt. When this thread unlocks, thread1 grabs the qlane lock
> >         and skips the chunk because refcnt is > 0. By the time we reach
> >         mdcache_lru_unref_chunk(), the parent has already been reused by
> >         thread1 for a different object. Now mdcache_lru_unref_chunk()
> >         makes progress because the refcnt became zero, but the parent is
> >         invalid, so it gets stuck in rmv_detached_dirent() (the
> >         interleaving is sketched just below).
> >
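> > Putting the two together, the window seems to be at the point where
> > chunk_lru_run_lane() has already dropped the qlane lock but has not yet
> > dropped its chunk reference. Again this is only a sketch of the
> > interleaving; qlane/mtx are placeholders for whatever the real lane lock
> > is.
> >
> >     /* thread2: end of processing one chunk in chunk_lru_run_lane() */
> >     pthread_mutex_unlock(&qlane->mtx);  /* qlane lock dropped first */
> >
> >     /*
> >      * <-- window: thread1 takes the qlane lock, sees refcnt > 0, skips
> >      * cleaning the chunk, and recycles the parent entry for a different
> >      * object (a REGULAR_FILE in the core above).
> >      */
> >
> >     mdcache_lru_unref_chunk(chunk);     /* refcnt now drops to zero, so
> >                                          * this proceeds into
> >                                          * lru_clean_chunk() ->
> >                                          * mdcache_clean_dirent_chunk() ->
> >                                          * rmv_detached_dirent(), which
> >                                          * spins forever on the recycled
> >                                          * parent's fsobj.fsdir.spin. */
> >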
> > I think we should hold the qlane lock in chunk_lru_run_lane() until the
> > refcnt is decremented, to make sure that reaping of the parent is not
> > possible in that window. Since mdcache_lru_unref_chunk() already takes
> > the lock later on, we could probably pass a flag to indicate whether the
> > qlane lock is already held or not (similar to content_lock). Please let
> > me know if there is a better approach to refactor this.
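> >
> > Something along these lines is what I have in mind. This is a rough
> > sketch only, not a patch; the flag name, the way the lane is passed in,
> > and the lock/helper calls are made up or simplified, and the existing
> > locking inside lru_clean_chunk()/mdcache_lru_unref_chunk() would need to
> > be adjusted to match.
> >
> >     /* Hypothetical: let the caller say it already holds the qlane lock,
> >      * so the refcount can be dropped without reopening the
> >      * unlock-then-decrement window in chunk_lru_run_lane(). */
> >     #define LRU_UNREF_QLOCKED 0x01  /* made-up flag name */
> >
> >     static void chunk_unref_flagged_sketch(struct lru_q_lane *qlane,
> >                                            struct dir_chunk *chunk,
> >                                            uint32_t flags)
> >     {
> >             bool locked = (flags & LRU_UNREF_QLOCKED) != 0;
> >
> >             if (!locked)
> >                     pthread_mutex_lock(&qlane->mtx);
> >
> >             if (atomic_dec_int32_t(&chunk->chunk_lru.refcnt) == 0)
> >                     lru_clean_chunk(chunk); /* parent cannot be reaped
> >                                              * while the lane is held */
> >
> >             if (!locked)
> >                     pthread_mutex_unlock(&qlane->mtx);
> >     }
> >
> > chunk_lru_run_lane() would then keep the qlane lock held across the
> > decrement and call this with LRU_UNREF_QLOCKED, the "already held" flag
> > being the same idea as the content_lock handling mentioned above.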
> >
> > Thanks,
> > Pradeep
> >
> >
> >