It should be safe. Testing the flags is atomic, so that part is safe,
and the worst that happens is we invalidate twice, so it's still correct.
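(A minimal, self-contained sketch of the pattern in question, using illustrative names rather than the real mdcache types: an unlocked flag test, then a re-check once the write lock is held. Even without the re-check, invalidating twice is merely redundant, not incorrect.)

#include <pthread.h>
#include <stdatomic.h>

#define TRUST_CONTENT 0x1
#define TRUST_CHUNKS  0x2

struct dir_cache {
        pthread_rwlock_t content_lock;
        atomic_uint flags;              /* trust bits, read/cleared atomically */
};

/* Idempotent: flushing an already-flushed cache is a no-op, which is
 * why invalidating twice is still correct. */
static void invalidate_all(struct dir_cache *dir)
{
        atomic_fetch_and(&dir->flags, ~(unsigned)(TRUST_CONTENT | TRUST_CHUNKS));
        /* ... free the cached dirent chunks ... */
}

static void readdir_begin(struct dir_cache *dir)
{
        pthread_rwlock_rdlock(&dir->content_lock);

        /* The flag test itself is an atomic read, so testing here is
         * safe, but the answer can go stale while we change locks. */
        if (!(atomic_load(&dir->flags) & (TRUST_CONTENT | TRUST_CHUNKS))) {
                /* No upgrade primitive: drop the read lock, take the
                 * write lock, and re-check, since another thread may
                 * have invalidated (or repopulated) in the gap. */
                pthread_rwlock_unlock(&dir->content_lock);
                pthread_rwlock_wrlock(&dir->content_lock);
                if (!(atomic_load(&dir->flags) &
                      (TRUST_CONTENT | TRUST_CHUNKS)))
                        invalidate_all(dir);
        }

        /* ... readdir proceeds under whichever lock is now held ... */

        pthread_rwlock_unlock(&dir->content_lock);
}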
Daniel
On 6/20/19 11:46 AM, Rungta, Vandana wrote:
Thanks. I will re-run the tests with this change.
A separate but related question:
In mdcache_readdir_chunked, near the beginning, where mde_flags are tested for
MDCACHE_TRUST_CONTENT | MDCACHE_TRUST_DIR_CHUNKS, the test is done outside the lock. It
is possible that the flags indicate the directory content can be trusted, but while this
thread is waiting for the READ_LOCK, another thread holding the WRITE_LOCK invalidates
the content. Do the flags need to be re-checked after obtaining the lock, and again when
upgrading the lock from read to write?
Perhaps something like:
diff --git a/third-party-src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c b/third-party-src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c
index 9ab2e7b..d15ff52 100644
--- a/third-party-src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c
+++ b/third-party-src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c
@@ -2731,18 +2748,22 @@
                 test_mde_flags(directory, MDCACHE_TRUST_DIR_CHUNKS)
                         ? "true" : "false");
 
+        PTHREAD_RWLOCK_rdlock(&directory->content_lock);
+        has_write = false;
+
         /* Dirent's are being chunked; check to see if it needs updating */
         if (!test_mde_flags(directory, MDCACHE_TRUST_CONTENT |
                             MDCACHE_TRUST_DIR_CHUNKS)) {
                 /* Clean out the existing entries in the directory. */
                 LogFullDebugAlt(COMPONENT_NFS_READDIR, COMPONENT_CACHE_INODE,
                                 "Flushing invalid dirent cache");
+                PTHREAD_RWLOCK_unlock(&directory->content_lock);
                 PTHREAD_RWLOCK_wrlock(&directory->content_lock);
-                mdcache_dirent_invalidate_all(directory);
+                if (!test_mde_flags(directory, MDCACHE_TRUST_CONTENT |
+                                    MDCACHE_TRUST_DIR_CHUNKS)) {
+                        mdcache_dirent_invalidate_all(directory);
+                }
                 has_write = true;
-        } else {
-                PTHREAD_RWLOCK_rdlock(&directory->content_lock);
-                has_write = false;
         }
 
         if (look_ck == 0) {
@@ -2778,6 +2783,10 @@
                  */
                 PTHREAD_RWLOCK_unlock(&directory->content_lock);
                 PTHREAD_RWLOCK_wrlock(&directory->content_lock);
+                if (!test_mde_flags(directory, MDCACHE_TRUST_CONTENT |
+                                    MDCACHE_TRUST_DIR_CHUNKS)) {
+                        mdcache_dirent_invalidate_all(directory);
+                }
                 has_write = true;
                 first_pass = true;
                 chunk = NULL;
@@ -2982,6 +2987,10 @@
                 PTHREAD_RWLOCK_unlock(&directory->content_lock);
                 PTHREAD_RWLOCK_wrlock(&directory->content_lock);
+                if (!test_mde_flags(directory, MDCACHE_TRUST_CONTENT |
+                                    MDCACHE_TRUST_DIR_CHUNKS)) {
+                        mdcache_dirent_invalidate_all(directory);
+                }
                 has_write = true;
 
                 /* Dropping the content_lock may have
On 6/20/19, 7:02 AM, "Daniel Gryniewicz" <dang(a)redhat.com> wrote:
Thinking about this overnight, I think the problem may just be releasing
the ref outside the lock. This ref won't be the last one, but dropping
the last one is always done with a write lock, so even holding the read
lock will ensure this isn't the last ref dropped.
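As a rough, self-contained sketch of that invariant (made-up names, not the actual mdcache_lru code): the final reference is only ever dropped under the write lock, so the unref has to stay above the unlock.

#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct chunk {
        atomic_int refcnt;
        /* ... cached dirents ... */
};

struct directory {
        pthread_rwlock_t content_lock;
        struct chunk *chunk;            /* the cache's own reference */
};

static void chunk_unref(struct chunk *chunk)
{
        if (atomic_fetch_sub(&chunk->refcnt, 1) == 1)
                free(chunk);            /* last reference: free */
}

/* Invalidation drops the cache's reference under the write lock. */
static void invalidate(struct directory *dir)
{
        pthread_rwlock_wrlock(&dir->content_lock);
        if (dir->chunk) {
                chunk_unref(dir->chunk);
                dir->chunk = NULL;
        }
        pthread_rwlock_unlock(&dir->content_lock);
}

/* A reader took its own reference on the chunk under the read lock. */
static void readdir_done(struct directory *dir, struct chunk *chunk)
{
        /* Unref *before* unlocking.  While we hold the read lock,
         * invalidate() cannot run, so the cache's reference still
         * exists and ours cannot be the last one.  Unreffing after
         * the unlock lets a concurrent invalidation drop the other
         * reference first, making this the final unref and freeing
         * the chunk without the write lock held. */
        chunk_unref(chunk);
        pthread_rwlock_unlock(&dir->content_lock);
}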
https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/458832
Daniel
On 6/19/19 12:21 PM, Rungta, Vandana wrote:
> Daniel,
>
> It is very reproducible in my test setup, and I am happy to run a version with additional debugging and provide you the logs.
>
> Test setup:
> File share with 5 million files (2,500 files in each of 2,000 directories).
> One thread doing a readdir over the entire listing.
> Multiple other threads doing a variety of create/write/read/delete for a set of content.
> It seems to reproduce more easily and more frequently when the test is running from a Windows client.
>
> I looked at multiple cores - and all of them had a similar bt.
>
> Thanks,
> Vandana
>
> On 6/19/19, 9:06 AM, "Daniel Gryniewicz" <dang(a)redhat.com> wrote:
>
> On 6/15/19 12:15 PM, Rungta, Vandana wrote:
> > Updated to the final version of
> > https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/457970
> > and also included
> > https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/458124/1
> > https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/458125/1
> >
> > Following crash:
> >
> > mdcache_lru_unref_chunk on line 3310 in mdcache_helpers.c is called after releasing the content lock on the directory. Does it need to be called with the write lock held (based on the comment for _mdcache_lru_unref_chunk)? What about when has_write is false?
>
> Yes and no. That unref shouldn't ever free the chunk, so it doesn't really need the write lock (although for correctness it should be up above the unlock).
>
> The real problem here is that that's somehow the last ref on this chunk, which it shouldn't be.
>
> Can you characterize the workload? We may need to add some debugging and get logs from your run, assuming I can't reproduce it.
>
>
> Daniel
>
> >
> > (gdb) bt
> > #0  0x00007f8777cd3277 in raise () from /lib64/libc.so.6
> > #1  0x00007f8777cd4968 in abort () from /lib64/libc.so.6
> > #2  0x00007f8777d15d97 in __libc_message () from /lib64/libc.so.6
> > #3  0x00007f8777d1e4f9 in _int_free () from /lib64/libc.so.6
> > #4  0x0000000000543eac in gsh_free (p=0x3ffd1a80) at /src/src/include/abstract_mem.h:246
> > #5  0x0000000000544b1c in mdcache_avl_remove (parent=0x52c27e0, dirent=0x3ffd1a80) at /src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_avl.c:240
> > #6  0x0000000000539f93 in mdcache_clean_dirent_chunk (chunk=0x3ac82990) at /src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:538
> > #7  0x000000000052aecd in lru_clean_chunk (chunk=0x3ac82990) at /src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:2033
> > #8  0x000000000052b060 in _mdcache_lru_unref_chunk (chunk=0x3ac82990, func=0x598a00 <__func__.23678> "mdcache_readdir_chunked", line=3310) at /src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:2062
> > #9  0x0000000000543a21 in mdcache_readdir_chunked (directory=0x52c27e0, whence=0, dir_state=0x7f8771a50af0, cb=0x4323ed <populate_dirent>, attrmask=0, eod_met=0x7f8771a50feb) at /src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:3310
> > #10 0x000000000053054c in mdcache_readdir (dir_hdl=0x52c2818, whence=0x7f8771a50ad0, dir_state=0x7f8771a50af0, cb=0x4323ed <populate_dirent>, attrmask=0, eod_met=0x7f8771a50feb) at /src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:559
> >
> >
> > On 6/14/19, 11:23 AM, "Daniel Gryniewicz" <dang(a)redhat.com> wrote:
> >
> > Can you update to a newer version of https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/457970  I've made several changes to it.
> >
> > Daniel
> >
> > On 6/14/19 12:16 AM, Rungta, Vandana wrote:
> > > Nfs-ganesha V2.7.4 with the following MDCACHE patches applied:
> > >
> > > https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/457970
> > >
> > > https://github.com/nfs-ganesha/nfs-ganesha/commit/c98ad1e238f5db9db5ab8db...
> > >
> > > https://github.com/nfs-ganesha/nfs-ganesha/commit/136df4f262c3f9bc29df456...
> > >
> > > Crash when freeing a dirent.
> > >
> > > (gdb) bt
> > > #0  0x00007fe0a2084277 in raise () from /lib64/libc.so.6
> > > #1  0x00007fe0a2085968 in abort () from /lib64/libc.so.6
> > > #2  0x00007fe0a20c6d97 in __libc_message () from /lib64/libc.so.6
> > > #3  0x00007fe0a20cf4f9 in _int_free () from /lib64/libc.so.6
> > > #4  0x0000000000543e9c in gsh_free (p=0x1b72140) at /src/src/include/abstract_mem.h:246
> > > #5  0x0000000000544b0c in mdcache_avl_remove (parent=0x1b17000, dirent=0x1b72140) at /src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_avl.c:240
> > > #6  0x0000000000545e4d in mdcache_avl_clean_trees (parent=0x1b17000) at /src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_avl.c:614
> > > #7  0x000000000053a11b in mdcache_dirent_invalidate_all (entry=0x1b17000) at /src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:606
> > > #8  0x0000000000541b15 in mdcache_readdir_chunked (directory=0x1b17000, whence=5012127, dir_state=0x7fe097debaf0, cb=0x4323ed <populate_dirent>, attrmask=0, eod_met=0x7fe097debfeb) at /src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:2864
> > > #9  0x000000000053054c in mdcache_readdir (dir_hdl=0x1b17038, whence=0x7fe097debad0, dir_state=0x7fe097debaf0, cb=0x4323ed <populate_dirent>, attrmask=0, eod_met=0x7fe097debfeb) at /src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:559
> > > #10 0x0000000000432d14 in fsal_readdir (directory=0x1b17038, cookie=5012127, nbfound=0x7fe097debfec, eod_met=0x7fe097debfeb, attrmask=0, cb=0x491d35 <nfs3_readdir_callback>, opaque=0x7fe097debfa0) at /src/src/FSAL/fsal_helper.c:1164
> > > #11 0x0000000000491b1d in nfs3_readdir (arg=0x1e12508, req=0x1e11e00, res=0x1be41e0) at /src/src/Protocols/NFS/nfs3_readdir.c:289
> > > #12 0x0000000000457c16 in nfs_rpc_process_request (reqdata=0x1e11e00) at /src/src/MainNFSD/nfs_worker_thread.c:1328
> > > #13 0x00000000004583d5 in nfs_rpc_valid_NFS (req=0x1e11e00) at /src/src/MainNFSD/nfs_worker_thread.c:1548
> > > #14 0x00007fe0a3821034 in svc_vc_decode (req=0x1e11e00) at /src/src/libntirpc/src/svc_vc.c:829
> > > #15 0x000000000044adc5 in nfs_rpc_decode_request (xprt=0x1b040e0, xdrs=0x1be4000) at /src/src/MainNFSD/nfs_rpc_dispatcher_thread.c:1345
> > > #16 0x00007fe0a3820f45 in svc_vc_recv (xprt=0x1b040e0) at /src/src/libntirpc/src/svc_vc.c:802
> > > #17 0x00007fe0a381d689 in svc_rqst_xprt_task (wpe=0x1b042f8) at /src/src/libntirpc/src/svc_rqst.c:769
> > > #18 0x00007fe0a381dae6 in svc_rqst_epoll_events (sr_rec=0x1ae45f0, n_events=1) at /src/src/libntirpc/src/svc_rqst.c:941
> > > #19 0x00007fe0a381dd7b in svc_rqst_epoll_loop (sr_rec=0x1ae45f0) at /src/src/libntirpc/src/svc_rqst.c:1014
> > > ---Type <return> to continue, or q <return> to quit---q
> > > Quit
> > >
> > > (gdb) select-frame 5
> > > (gdb) info args
> > > parent = 0x1b17000
> > > dirent = 0x1b72140
> > > (gdb) info locals
> > > chunk = 0x1b4b510
> > > __func__ = "mdcache_avl_remove"
> > > (gdb) print *chunk
> > > $1 = {chunks = {next = 0x1b172a0, prev = 0x1b172a0}, dirents = {next = 0x1b29930, prev = 0x1bc2840},
> > >   parent = 0x1b17000, chunk_lru = {q = {next = 0x0, prev = 0x0}, qid = LRU_ENTRY_L1, refcnt = 0, flags = 0,
> > >   lane = 16, cf = 0}, reload_ck = 5012127, next_ck = 0, num_entries = 297}
> > > (gdb) print *dirent
> > > $2 = {chunk_list = {next = 0x0, prev = 0x0}, chunk = 0x0, node_name = {left = 0x0, right = 0x0, parent = 28852730},
> > >   node_ck = {left = 0x1b78ff0, right = 0x1b68970, parent = 28718322}, node_sorted = {left = 0x0, right = 0x0, parent = 0},
> > >   ck = 5019809, eod = false, namehash = 6207075547366218433, ckey = {hk = 15362064496009940491,
> > >   fsal = 0x7fe09f949080 <FOO>, kv = {addr = 0x0, len = 0}}, flags = 0, entry = 0x0, name = 0x1b721f0 "random.37",
> > >   name_buffer = 0x1b721f0 "random.37"}
> > >
> > > (gdb)
> > >
> > > I am happy to provide any additional debug info you want from the core.
> > >
> > > Thanks,
> > >
> > > Vandana
> > >
> > >