Daniel,
I don't have any debug logs, and unfortunately I have not been able to reproduce it. I
still have the core, if there is anything else that might be useful to dump. The total
number of files on the share is less than 20, which makes it even more unusual that the
entry was reaped. It happened while copying a 10GB file (reading and writing the file
from the share). I only had 2 such copies in progress. The FSAL was returning
ERR_FSAL_DELAY as I was overloading the subsystem that the share lives on.
(gdb) print lru_state
$2 = {entries_hiwat = 500000, entries_used = 11, chunks_hiwat = 100000, chunks_used = 2,
fds_system_imposed = 400000,
fds_hard_limit = 396000, fds_hiwat = 360000, fds_lowat = 200000, futility = 0,
per_lane_work = 50,
biggest_window = 160000, prev_fd_count = 1, prev_time = 1573511968, fd_state = 0}
Would this fix have any bearing here? (I'm running 2.7.6, so I don't have this fix.)
https://github.com/nfs-ganesha/nfs-ganesha/commit/2f1f87143458d7564588d8f...
Thanks,
Vandana
On 11/13/19, 11:48 AM, "Daniel Gryniewicz" <dgryniew(a)redhat.com> wrote:
It looks like the entry was somehow reaped (the only way we ever set
LRU_ENTRY_NONE) while it was in use. This should not be possible, as
mdc_read_cb() takes a ref around this use. And, in fact, you can see
that the refcnt is 3, so it shouldn't be reaped. Nothing else should be
cleaning out the LRU fields. The rest of the fields in the entry look
fine, so it's probably a valid entry (and not, say, a use-after-free).
Do you have logs from this run? CACHE_INODE on FULL_DEBUG would be very
helpful.
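For anyone following along, the invariant described above can be sketched
roughly as follows. This is not the Ganesha source: the types and function
names (entry, queue, try_reap, cleanup_push) are hypothetical simplifications,
and the rule that the cache itself holds exactly one reference is an assumption
made for the sketch. It only shows why an entry with extra references should
never leave its queue through the reaper, and why a cleanup push has to check
queue membership before unlinking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the MDCACHE LRU structures. */
enum qid { Q_NONE = 0, Q_L1, Q_CLEANUP };

struct node { struct node *next, *prev; };

struct queue { struct node head; enum qid id; size_t size; };

struct entry { struct node q; enum qid qid; int32_t refcnt; };

void queue_init(struct queue *qu, enum qid id)
{
    qu->head.next = qu->head.prev = &qu->head; /* empty circular list */
    qu->id = id;
    qu->size = 0;
}

/* Append an entry; it must not already be on any queue. */
void queue_push(struct queue *qu, struct entry *e)
{
    assert(e->qid == Q_NONE);
    e->q.next = &qu->head;
    e->q.prev = qu->head.prev;
    qu->head.prev->next = &e->q;
    qu->head.prev = &e->q;
    e->qid = qu->id;
    qu->size++;
}

/* Unlink only if the entry really is on this queue; otherwise do nothing. */
bool queue_remove_safe(struct queue *qu, struct entry *e)
{
    if (e->qid != qu->id || e->q.next == NULL)
        return false;
    e->q.prev->next = e->q.next;
    e->q.next->prev = e->q.prev;
    e->q.next = e->q.prev = NULL;
    e->qid = Q_NONE;
    qu->size--;
    return true;
}

/* Reaper: assumed to reclaim only entries whose sole reference is the
 * cache's own (refcnt == 1 in this sketch). */
bool try_reap(struct queue *qu, struct entry *e)
{
    if (e->refcnt > 1)
        return false; /* still in use by a caller */
    return queue_remove_safe(qu, e);
}

/* Move an entry to the cleanup queue; fails if it is not on 'from'. */
bool cleanup_push(struct queue *from, struct queue *cleanup, struct entry *e)
{
    if (!queue_remove_safe(from, e))
        return false;
    queue_push(cleanup, e);
    return true;
}
```

In the core above, the entry is in the state the failing cleanup_push case
models: qid is LRU_ENTRY_NONE with next and prev NULL, yet refcnt is 3.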
Daniel
On 11/11/19 6:13 PM, Rungta, Vandana wrote:
NFS GANESHA 2.7.6
I am seeing the following crash when the FSAL returns an ERR_FSAL_DELAY
and mdc_read_cb calls mdcache_kill_entry. The crash is in
mdcache_lru_cleanup_push while trying to do an LRU_DQ_SAFE(lru, q);
I am happy to provide any additional debug information needed from the
core.
(gdb) bt
#0  0x00000000005280e9 in mdcache_lru_cleanup_push (entry=0x1e2da00) at
/src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:969
#1  0x000000000054585d in _mdcache_kill_entry (entry=0x1e2da00,
file=0x598080
"/src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_file.c", line=554,
function=0x598220 <__func__.20126> "mdc_read_cb") at
/src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:3453
#2  0x0000000000536f7c in mdc_read_cb (obj=0x1d10900, ret=...,
obj_data=0x7f60be49ab60, caller_data=0x1e2dfc0)
at /src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_file.c:554
#3  0x00007f60c65bb047 in foo_read2 (obj_hdl=0x1d10900, bypass=true,
done_cb=0x536e7b <mdc_read_cb>, read_arg=0x7f60be49ab60,
caller_arg=0x1e2dfc0)
at /opt/src/src/handle.c:1576
#4  0x000000000053706e in mdcache_read2 (obj_hdl=0x1e2da38, bypass=true,
done_cb=0x490cd7 <nfs3_read_cb>, read_arg=0x7f60be49ab60,
caller_arg=0x7f60be49afc0)
at /src/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_file.c:589
#5  0x00000000004914a7 in nfs3_read (arg=0x2f35728, req=0x2f35020,
res=0x1cd3d20) at /src/src/Protocols/NFS/nfs3_read.c:311
#6  0x0000000000457e56 in nfs_rpc_process_request (reqdata=0x2f35020) at
/src/src/MainNFSD/nfs_worker_thread.c:1328
#7  0x0000000000458615 in nfs_rpc_valid_NFS (req=0x2f35020) at
/src/src/MainNFSD/nfs_worker_thread.c:1548
#8  0x00007f60ca6d9034 in svc_vc_decode (req=0x2f35020) at
/src/src/libntirpc/src/svc_vc.c:829
#9  0x000000000044b005 in nfs_rpc_decode_request (xprt=0x1d25fc0,
xdrs=0x1d2be90) at /src/src/MainNFSD/nfs_rpc_dispatcher_thread.c:1345
#10 0x00007f60ca6d8f45 in svc_vc_recv (xprt=0x1d25fc0) at
/src/src/libntirpc/src/svc_vc.c:802
#11 0x00007f60ca6d5689 in svc_rqst_xprt_task (wpe=0x1d261d8) at
/src/src/libntirpc/src/svc_rqst.c:769
#12 0x00007f60ca6d5ae6 in svc_rqst_epoll_events (sr_rec=0x1c92990,
n_events=2) at /src/src/libntirpc/src/svc_rqst.c:941
#13 0x00007f60ca6d5d7b in svc_rqst_epoll_loop (sr_rec=0x1c92990) at
/src/src/libntirpc/src/svc_rqst.c:1014
#14 0x00007f60ca6d5e2e in svc_rqst_run_task (wpe=0x1c92990) at
/src/src/libntirpc/src/svc_rqst.c:1050
#15 0x00007f60ca6de7f6 in work_pool_thread (arg=0x1cec1c0) at
/src/src/libntirpc/src/work_pool.c:181
#16 0x00007f60c96fcde5 in start_thread () from /lib64/libpthread.so.0
#17 0x00007f60c9003f1d in clone () from /lib64/libc.so.6
(gdb) print entry->lru
$4 = {q = {next = 0x0, prev = 0x0}, qid = LRU_ENTRY_NONE, refcnt = 3,
flags = 0, lane = 12, cf = 0}
(gdb) print *entry
$5 = {attr_lock = {__data = {__lock = 0, __nr_readers = 0,
__readers_wakeup = 0, __writer_wakeup = 0, __nr_readers_queued = 0,
__nr_writers_queued = 0,
__writer = 0, __shared = 0, __pad1 = 0, __pad2 = 0, __flags = 0},
__size = '\000' <repeats 55 times>, __align = 0}, obj_handle =
{handles = {
next = 0x212fb88, prev = 0x7d6460 <MDCACHE+32>}, fs = 0x0, fsal =
0x7d6440 <MDCACHE>, obj_ops = 0x7d6598 <MDCACHE+344>, obj_lock =
{__data = {__lock = 0,
__nr_readers = 0, __readers_wakeup = 0, __writer_wakeup = 0,
__nr_readers_queued = 0, __nr_writers_queued = 0, __writer = 0,
__shared = 0, __pad1 = 0,
__pad2 = 0, __flags = 0}, __size = '\000' <repeats 55 times>, __align
= 0}, type = REGULAR_FILE, fsid = {major = 0, minor = 0}, fileid = 20004,
state_hdl = 0x1e2dca0}, sub_handle = 0x1d10900, attrs = {request_mask
= 1433550, valid_mask = 1433550, supported = 1433582, type = REGULAR_FILE,
filesize = 10737418240, fsid = {major = 0, minor = 0}, acl = 0x0,
fileid = 20004, mode = 420, numlinks = 1, owner = 65534, group =
65534, rawdev = {major = 0,
minor = 0}, atime = {tv_sec = 1573241243, tv_nsec = 33000000},
creation = {tv_sec = 0, tv_nsec = 0}, ctime = {tv_sec = 1573241243,
tv_nsec = 33000000},
mtime = {tv_sec = 1573241243, tv_nsec = 33000000}, chgtime = {tv_sec =
1573241243, tv_nsec = 33000000}, spaceused = 10737418240, change =
1573241243033000000,
generation = 0, expire_time_attr = 60, fs_locations = 0x0, sec_label =
{slai_lfs = {lfs_lfs = 0, lfs_pi = 0}, slai_data = {slai_data_len = 0,
slai_data_val = 0x0}}}, fh_hk = {node_k = {left = 0x0, right = 0x0,
parent = 2}, key = {hk = 12628871282545812592, fsal = 0x7f60c6801100
<FOO>, kv = {
addr = 0x1d0c930, len = 10}}, inavl = false}, mde_flags = 1, attr_time
= 1573511991, acl_time = 0, fs_locations_time = 0, lru = {q = {next = 0x0,
prev = 0x0}, qid = LRU_ENTRY_NONE, refcnt = 3, flags = 0, lane = 12,
cf = 0}, export_list = {next = 0x1d00a80, prev = 0x1d00a80},
first_export_id = 2,
content_lock = {__data = {__lock = 0, __nr_readers = 0,
__readers_wakeup = 0, __writer_wakeup = 0, __nr_readers_queued = 0,
__nr_writers_queued = 0,
__writer = 0, __shared = 0, __pad1 = 0, __pad2 = 0, __flags = 0},
__size = '\000' <repeats 55 times>, __align = 0}, fsobj = {hdl =
{state_lock = {__data = {
__lock = 0, __nr_readers = 0, __readers_wakeup = 0, __writer_wakeup =
0, __nr_readers_queued = 0, __nr_writers_queued = 0, __writer = 0,
__shared = 0,
__pad1 = 0, __pad2 = 0, __flags = 0}, __size = '\000' <repeats 55
times>, __align = 0}, no_cleanup = false, {file = {obj = 0x1e2da38,
list_of_states = {
next = 0x1e2dce8, prev = 0x1e2dce8}, layoutrecall_list = {next =
0x1e2dcf8, prev = 0x1e2dcf8}, lock_list = {next = 0x1e2dd08, prev =
0x1e2dd08},
nlm_share_list = {next = 0x1e2dd18, prev = 0x1e2dd18}, write_delegated
= false, fdeleg_stats = {fds_curr_delegations = 0,
fds_deleg_type = OPEN_DELEGATE_NONE, fds_delegation_count = 0,
fds_recall_count = 0, fds_avg_hold = 0, fds_last_delegation = 0,
fds_last_recall = 0,
fds_num_opens = 0, fds_first_open = 0}, anon_ops = 0}, dir =
{junction_export = 0x1e2da38, export_roots = {next = 0x1e2dce8, prev =
0x1e2dce8},
exp_root_refcount = 31644920}}}, fsdir = {chunks = {next = 0x0, prev =
0x0}, detached = {next = 0x0, prev = 0x0}, spin = 0, detached_count =
0, dhdl = {
state_lock = {__data = {__lock = 0, __nr_readers = 0, __readers_wakeup
= 0, __writer_wakeup = 0, __nr_readers_queued = 0, __nr_writers_queued
= 0,
__writer = 31644216, __shared = 0, __pad1 = 31644904, __pad2 =
31644904, __flags = 31644920},
__size = '\000' <repeats 24 times>,
"\070\332\342\001\000\000\000\000\350\334\342\001\000\000\000\000\350\334\342\001\000\000\000\000\370\334\342\001\000\000\000",
__align = 0}, no_cleanup = 248, {file = {obj = 0x1e2dd08,
list_of_states = {next = 0x1e2dd08, prev = 0x1e2dd18},
layoutrecall_list = {next = 0x1e2dd18,
prev = 0x0}, lock_list = {next = 0x0, prev = 0x0}, nlm_share_list =
{next = 0x0, prev = 0x0}, write_delegated = false, fdeleg_stats = {
fds_curr_delegations = 0, fds_deleg_type = OPEN_DELEGATE_NONE,
fds_delegation_count = 0, fds_recall_count = 0, fds_avg_hold = 0,
fds_last_delegation = 0, fds_last_recall = 0, fds_num_opens = 0,
fds_first_open = 0}, anon_ops = 0}, dir = {junction_export = 0x1e2dd08,
export_roots = {next = 0x1e2dd08, prev = 0x1e2dd18}, exp_root_refcount
= 31644952}}}, parent = {addr = 0x0, len = 0}, parent_time = 0,
first_ck = 0,
avl = {t = {root = 0x0, cmp_fn = 0x0, height = 0, first = 0x0, last =
0x0, size = 0}, ck = {root = 0x0, cmp_fn = 0x0, height = 0, first =
0x0, last = 0x0,
size = 0}, sorted = {root = 0x0, cmp_fn = 0x0, height = 0, first =
0x0, last = 0x0, size = 0}, collisions = 0}}}}
(gdb)
_______________________________________________
Devel mailing list -- devel(a)lists.nfs-ganesha.org
To unsubscribe send an email to devel-leave(a)lists.nfs-ganesha.org