On Thu, Apr 11, 2019 at 9:36 AM Soumya Koduri <skoduri@redhat.com> wrote:
Hi Pradeep,

Could you please share some details about your workload and the
scenarios in which you have hit this race? Have any cache tuning
parameters been modified?

Hi Soumya,

I don't have a reproducer for this yet. The customer environment where this was hit uses NFS to store logs from hundreds of clients, with a few clients cleaning up old logs in parallel (find + rm based on mtime). This is with the default Ganesha parameters. From lru_state, I noticed we were over the default limit of 100K entries in MDCACHE (which is why the reap code path was triggered). This is with version 2.7.1 on a POSIX filesystem.
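
For reference, the limit I'm referring to is the MDCACHE entry high-water mark, which we have left at its default. A minimal ganesha.conf sketch of the relevant block (assuming Entries_HWMark is the parameter that controls this; 100000 should be its default):

MDCACHE {
        # High-water mark for cached metadata entries; once the cache grows
        # past this, the reaper / chunk LRU starts reclaiming.
        Entries_HWMark = 100000;
}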

Thanks,
Pradeep
 
Thanks,
Soumya

On 4/10/19 11:25 PM, Pradeep wrote:
> Hello,
>
> I'm hitting a hang where rmv_detached_dirent() is stuck on the spinlock
> forever.
>
> #0  0x00007feb01a9a4e5 in pthread_spin_lock () from /lib64/libpthread.so.0
> #1  0x00000000005519e7 in rmv_detached_dirent (parent=0x7feaeb8dd600,
> dirent=0x7fea9fbd8700)
>      at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_int.h:420
> #2  0x00000000005522e8 in mdcache_avl_remove (parent=0x7feaeb8dd600,
> dirent=0x7fea9fbd8700)
>      at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_avl.c:256
> #3  0x0000000000547752 in mdcache_clean_dirent_chunk (chunk=0x7fea9f86f970)
>      at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:454
> #4  0x00000000005386a2 in lru_clean_chunk (chunk=0x7fea9f86f970)
>      at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:2078
> #5  0x000000000053881b in mdcache_lru_unref_chunk (chunk=0x7fea9f86f970)
>      at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:2097
> #6  0x000000000053698c in chunk_lru_run_lane (lane=14)
>      at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1509
> #7  0x0000000000536d26 in chunk_lru_run (ctx=0x7feafe00f580)
>      at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1563
> #8  0x0000000000508ad9 in fridgethr_start_routine (arg=0x7feafe00f580)
>      at /usr/src/debug/nfs-ganesha-2.7.1/support/fridgethr.c:550
> #9  0x00007feb01a95e25 in start_thread () from /lib64/libpthread.so.0
> #10 0x00007feb0139dbad in clone () from /lib64/libc.so.6
>
> 420             pthread_spin_lock(&parent->fsobj.fsdir.spin);
> (gdb) print parent->fsobj.fsdir.spin
> $1 = -1
> (gdb) print parent->obj_handle.type
> $2 = REGULAR_FILE
>
> It looks like, while this thread was in that path, the parent got reused
> for a different object in the filesystem. From reading the code, this
> seems possible:
>
> Let's say thread1 goes through the reuse path:
>
>   * mdcache_lru_get() ->
>       o mdcache_lru_clean() ->
>           + mdc_clean_entry() ->
>               # mdcache_dirent_invalidate_all()
>                   * mdcache_lru_unref_chunk()
>                   * This calls lru_clean_chunk() only if the refcnt drops
>                     to zero. Suppose another thread (thread2 below, the
>                     background chunk LRU thread) has just incremented it.
>                     Then thread1 returns without cleaning the chunk, and
>                     the entry now gets reused for a different object.
>
>
> Now the background thread (thread2) comes up to clean chunks and
> increments refcnt:
>
>   * chunk_lru_run()
>       o chunk_lru_run_lane()
>           + mdcache_lru_unref_chunk()
>           + Here we drop the qlane lock and only then decrement the refcnt.
>             As soon as this thread unlocks, thread1 can grab the qlane lock
>             and skip the chunk because refcnt is still > 0. By the time we
>             get into mdcache_lru_unref_chunk(), the parent has been reused
>             by thread1 for a different object. mdcache_lru_unref_chunk() now
>             makes progress because the refcnt drops to zero, but the parent
>             is invalid, so the cleanup gets stuck in rmv_detached_dirent().
>             The interleaving is sketched below.
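>
> Roughly, the ordering looks like this (simplified toy code, not the
> actual mdcache_lru.c; struct chunk, qlane_lock and lru_run_lane() here
> are just stand-ins to show the window):
>
> #include <pthread.h>
> #include <stdint.h>
>
> static pthread_mutex_t qlane_lock = PTHREAD_MUTEX_INITIALIZER;
>
> struct chunk {
>         int32_t refcnt;
>         void *parent;   /* parent directory entry the chunk belongs to */
> };
>
> /* background chunk LRU thread (thread2 above) */
> static void lru_run_lane(struct chunk *chunk)
> {
>         pthread_mutex_lock(&qlane_lock);
>         chunk->refcnt++;        /* pin the chunk while working on it */
>         /* ... decide this chunk should be reaped ... */
>         pthread_mutex_unlock(&qlane_lock);
>
>         /*
>          * Window: thread1 (mdcache_lru_get -> mdcache_lru_clean) can take
>          * the qlane lock here, see refcnt > 0, skip cleaning this chunk,
>          * and recycle chunk->parent for a completely different object.
>          */
>
>         pthread_mutex_lock(&qlane_lock);   /* what the unref does today */
>         if (--chunk->refcnt == 0) {
>                 /* cleanup walks chunk->parent, which is now stale, and
>                  * hangs on the reinitialized dirent spinlock in
>                  * rmv_detached_dirent() */
>         }
>         pthread_mutex_unlock(&qlane_lock);
> }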
>
> I think we should hold the qlane lock in chunk_lru_run_lane() until the
> refcnt is decremented, so that the parent cannot be reaped in that window.
> Since mdcache_lru_unref_chunk() already takes the lock itself, we could
> probably pass a flag indicating whether the qlane lock is already held
> (similar to what is done for the content_lock); a rough sketch of that
> shape is below. Please let me know if there is a better approach to
> refactor this.
>
> Thanks,
> Pradeep
>
>
>