[NFS-Ganesha-Devel] READDIR slowdown while write is in progress.

Saturday, 4 April 2020

If a client starts writing a large file, a READDIR on the containing directory appears to
get stuck. Debugging further, this appears to be caused by lock contention between the FSAL
merge function (vfs_merge()) and the FSAL write or commit functions.

- Since the write started first, the OPEN would have created a new MDCACHE entry.
- As part of READDIR, MDCACHE tries to create a new entry for the same file, so
  mdcache_new_entry() reaches obj_ops->merge().
  This seems to get stuck trying to take the obj_lock in write mode.
- Since the client is sending multiple writes and commits, the readdir thread seems to
  stay stuck until it can get the write lock in vfs_merge().

This can be reproduced with the following commands (latest Ganesha, VFS FSAL),
assuming /gsh4 is the mount point:

- mkdir /gsh4/dir.1
- touch  /gsh4/dir.1/file.{1..100}
- dd if=/dev/zero of=/gsh4/dir.1/largefile bs=1M count=100000 &
- ls /gsh4/dir.1

Note that any other client doing a READDIR will also be slow.

Is there a way this path can be optimized? Wouldn't share_counters be zero in the readdir
path (meaning there is nothing to merge)?

Thanks,
Pradeep
