On Thu, Apr 11, 2019 at 9:36 AM Soumya Koduri <skoduri@redhat.com> wrote:
Hi Pradeep,

Could you please share some details about your workload and the
scenarios in which you have hit this race? Have any cache tuning
parameters been modified?

Hi Soumya,

I don't have a reproducer for this yet. The customer environment where this was hit uses NFS to store logs from hundreds of clients, with a few clients cleaning up old logs in parallel (find + rm based on mtime). This is with the default Ganesha parameters. From lru_state, I noticed we were over the default limit of 100K entries in MDCACHE (which is why the reap code path was triggered). This is with version 2.7.1 on a POSIX filesystem.
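
For reference, the limit I'm referring to is the MDCACHE entry high-water mark, which we have left at its default. A minimal ganesha.conf sketch of the relevant block (assuming Entries_HWMark is the parameter that controls this; 100000 should be its default):

MDCACHE {
        # High-water mark for cached metadata entries; once the cache grows
        # past this, the reaper / chunk LRU starts reclaiming.
        Entries_HWMark = 100000;
}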

Thanks,
Pradeep
 
Thanks,
Soumya

On 4/10/19 11:25 PM, Pradeep wrote:
> Hello,
>
> I'm hitting a hang where rmv_detached_dirent() is stuck on the spinlock
> forever.
>
> #0  0x00007feb01a9a4e5 in pthread_spin_lock () from /lib64/libpthread.so.0
> #1  0x00000000005519e7 in rmv_detached_dirent (parent=0x7feaeb8dd600,
> dirent=0x7fea9fbd8700)
>      at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_int.h:420
> #2  0x00000000005522e8 in mdcache_avl_remove (parent=0x7feaeb8dd600,
> dirent=0x7fea9fbd8700)
>      at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_avl.c:256
> #3  0x0000000000547752 in mdcache_clean_dirent_chunk (chunk=0x7fea9f86f970)
>      at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:454
> #4  0x00000000005386a2 in lru_clean_chunk (chunk=0x7fea9f86f970)
>      at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:2078
> #5  0x000000000053881b in mdcache_lru_unref_chunk (chunk=0x7fea9f86f970)
>      at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:2097
> #6  0x000000000053698c in chunk_lru_run_lane (lane=14)
>      at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1509
> #7  0x0000000000536d26 in chunk_lru_run (ctx=0x7feafe00f580)
>      at /usr/src/debug/nfs-ganesha-2.7.1/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1563
> #8  0x0000000000508ad9 in fridgethr_start_routine (arg=0x7feafe00f580)
>      at /usr/src/debug/nfs-ganesha-2.7.1/support/fridgethr.c:550
> #9  0x00007feb01a95e25 in start_thread () from /lib64/libpthread.so.0
> #10 0x00007feb0139dbad in clone () from /lib64/libc.so.6
>
> 420             pthread_spin_lock(&parent->fsobj.fsdir.spin);
> (gdb) print parent->fsobj.fsdir.spin
> $1 = -1
> (gdb) print parent->obj_handle.type
> $2 = REGULAR_FILE
>
> It looks like, while this thread was in that path, the parent got reused
> for a different object in the filesystem. From reading the code, this
> seems possible:
>
> Let's say thread1 goes through the reuse path:
>
>   * mdcache_lru_get() ->
>       o mdcache_lru_clean() ->
>           + mdc_clean_entry() ->
>               # mdcache_dirent_invalidate_all()
>                   * mdcache_lru_unref_chunk()
>                   * This calls lru_clean_chunk() only if the refcnt drops
>                     to zero. Suppose another thread (thread2 below, the
>                     background chunk LRU thread) has just incremented it.
>                     Then thread1 returns without cleaning the chunk, and
>                     the entry now gets reused for a different object.
>
>
> Now the background thread (thread2) comes up to clean chunks and
> increments refcnt:
>
>   * chunk_lru_run()
>       o chunk_lru_run_lane()
>           + mdcache_lru_unref_chunk()
>           + Here we drop the qlane lock and only then decrement the refcnt.
>             As soon as this thread unlocks, thread1 can grab the qlane lock
>             and skip the chunk because refcnt is still > 0. By the time we
>             get into mdcache_lru_unref_chunk(), the parent has been reused
>             by thread1 for a different object. mdcache_lru_unref_chunk() now
>             makes progress because the refcnt drops to zero, but the parent
>             is invalid, so the cleanup gets stuck in rmv_detached_dirent().
>             The interleaving is sketched below.
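>
> Roughly, the ordering looks like this (simplified toy code, not the
> actual mdcache_lru.c; struct chunk, qlane_lock and lru_run_lane() here
> are just stand-ins to show the window):
>
> #include <pthread.h>
> #include <stdint.h>
>
> static pthread_mutex_t qlane_lock = PTHREAD_MUTEX_INITIALIZER;
>
> struct chunk {
>         int32_t refcnt;
>         void *parent;   /* parent directory entry the chunk belongs to */
> };
>
> /* background chunk LRU thread (thread2 above) */
> static void lru_run_lane(struct chunk *chunk)
> {
>         pthread_mutex_lock(&qlane_lock);
>         chunk->refcnt++;        /* pin the chunk while working on it */
>         /* ... decide this chunk should be reaped ... */
>         pthread_mutex_unlock(&qlane_lock);
>
>         /*
>          * Window: thread1 (mdcache_lru_get -> mdcache_lru_clean) can take
>          * the qlane lock here, see refcnt > 0, skip cleaning this chunk,
>          * and recycle chunk->parent for a completely different object.
>          */
>
>         pthread_mutex_lock(&qlane_lock);   /* what the unref does today */
>         if (--chunk->refcnt == 0) {
>                 /* cleanup walks chunk->parent, which is now stale, and
>                  * hangs on the reinitialized dirent spinlock in
>                  * rmv_detached_dirent() */
>         }
>         pthread_mutex_unlock(&qlane_lock);
> }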
>
> I think we should hold the qlane lock in chunk_lru_run_lane() until the
> refcnt is decremented, so that the parent cannot be reaped in that window.
> Since mdcache_lru_unref_chunk() already takes the lock itself, we could
> probably pass a flag indicating whether the qlane lock is already held
> (similar to what is done for the content_lock); a rough sketch of that
> shape is below. Please let me know if there is a better approach to
> refactor this.
>
> Thanks,
> Pradeep
>
>
>