Fwd: Ganesha 2.5, crash /segfault while executing nlm4_Unlock
by Sachin Punadikar
---------- Forwarded message ----------
From: Sachin Punadikar <punadikar.sachin(a)gmail.com>
Date: Tue, Jun 26, 2018 at 3:57 PM
Subject: Ganesha 2.5, crash /segfault while executing nlm4_Unlock
To: nfs-ganesha-devel <nfs-ganesha-devel(a)lists.sourceforge.net>
Hi All,
Recently a customer reported a crash in Ganesha 2.5.
(gdb) where
#0 0x00007f475872900b in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1 0x000000000041eac9 in fsal_obj_handle_fini (obj=0x7f4378028028) at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/commonlib.c:192
#2 0x000000000053180f in mdcache_lru_clean (entry=0x7f4378027ff0) at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:589
#3 0x0000000000536587 in _mdcache_lru_unref (entry=0x7f4378027ff0, flags=0, func=0x5a9380 <__func__.23209> "cih_remove_checked", line=406) at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1921
#4 0x0000000000543e91 in cih_remove_checked (entry=0x7f4378027ff0) at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_hash.h:406
#5 0x0000000000544b26 in mdc_clean_entry (entry=0x7f4378027ff0) at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:235
#6 0x000000000053181e in mdcache_lru_clean (entry=0x7f4378027ff0) at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:592
#7 0x0000000000536587 in _mdcache_lru_unref (entry=0x7f4378027ff0, flags=0, func=0x5a70af <__func__.23112> "mdcache_put", line=190) at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1921
#8 0x0000000000539666 in mdcache_put (entry=0x7f4378027ff0) at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.h:190
#9 0x000000000053f062 in mdcache_put_ref (obj_hdl=0x7f4378028028) at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1709
#10 0x000000000049bf0f in nlm4_Unlock (args=0x7f4294165830, req=0x7f4294165028, res=0x7f43f001e0e0) at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/Protocols/NLM/nlm_Unlock.c:128
#11 0x000000000044c719 in nfs_rpc_execute (reqdata=0x7f4294165000) at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1290
#12 0x000000000044cf23 in worker_run (ctx=0x3c200e0) at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1562
#13 0x000000000050a3e7 in fridgethr_start_routine (arg=0x3c200e0) at /usr/src/debug/nfs-ganesha-2.5.3-ibm013.00-0.1.1-Source/support/fridgethr.c:550
#14 0x00007f4758725dc5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f4757de673d in clone () from /lib64/libc.so.6
A closer look at the backtrace indicates a cyclic flow of execution,
as below:
nlm4_Unlock -> mdcache_put_ref -> mdcache_put -> _mdcache_lru_unref ->
mdcache_lru_clean -> fsal_obj_handle_fini and then mdc_clean_entry ->
cih_remove_checked -> (purposely continuing the flow on the next line)
-> _mdcache_lru_unref -> mdcache_lru_clean -> fsal_obj_handle_fini
(currently crashing here)
Do we see any code issue here? Any hints on how to root-cause (RCA) this issue?
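One low-effort first step for the RCA is to confirm mechanically which functions occur more than once on the stack. The following is a minimal Python sketch and not part of Ganesha: the helper names frame_functions and repeated_functions are made up for illustration, and the regular expression assumes gdb's usual "#N 0xADDR in func (" frame layout. The sample text is the first line of frames #0 to #8 from the backtrace in this mail.

```python
import re
from collections import Counter

# Matches the first line of a gdb frame, e.g.
#   "#2 0x000000000053180f in mdcache_lru_clean (entry=0x7f4378027ff0) at"
# The address part is optional because gdb omits it for some frames.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([A-Za-z_]\w*)\s*\(")


def frame_functions(backtrace):
    """Return the function name of every frame, innermost frame first."""
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(2))
    return names


def repeated_functions(backtrace):
    """Map each function that appears in more than one frame to its count."""
    counts = Counter(frame_functions(backtrace))
    return {name: n for name, n in counts.items() if n > 1}


# First line of frames #0..#8, copied from the reported backtrace.
SAMPLE = """\
#0 0x00007f475872900b in pthread_rwlock_wrlock () from
#1 0x000000000041eac9 in fsal_obj_handle_fini (obj=0x7f4378028028) at
#2 0x000000000053180f in mdcache_lru_clean (entry=0x7f4378027ff0) at
#3 0x0000000000536587 in _mdcache_lru_unref (entry=0x7f4378027ff0,
#4 0x0000000000543e91 in cih_remove_checked (entry=0x7f4378027ff0) at
#5 0x0000000000544b26 in mdc_clean_entry (entry=0x7f4378027ff0) at
#6 0x000000000053181e in mdcache_lru_clean (entry=0x7f4378027ff0) at
#7 0x0000000000536587 in _mdcache_lru_unref (entry=0x7f4378027ff0,
#8 0x0000000000539666 in mdcache_put (entry=0x7f4378027ff0) at
"""

if __name__ == "__main__":
    for name, count in repeated_functions(SAMPLE).items():
        print(name, count)
```

In the reported backtrace both repeated functions are called with the same entry pointer (0x7f4378027ff0), which is consistent with the entry being cleaned a second time while the first mdcache_lru_clean call is still on the stack.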
Thanks in advance.
--
with regards,
Sachin Punadikar
6 years, 4 months
Announce Push of V2.7-rc4
by Frank Filz
Branch next
Tag:V2.7-rc4
Release Highlights
* Various grace period management fixes
* Fix uninitialized xdr_attrs_args when meeting error at nfs4 readdir callback
* Fix uninitialized end_of_file/fsal_stable at nfs/9p
* Fix bad assign logical caused use after freed
* FreeBSD: fix compilation on FreeBSD 12
* FSAL_MDCACHE: remove limitation on Reaper_Work_Per_Lane
* Set op_ctx in state_blocked_lock_caller
* Set sbd_blocked_cookie correctly
* MDCACHE: fix a couple issues in readdir chunking
Signed-off-by: Frank S. Filz <ffilzlnx(a)mindspring.com>
Contents:
97f1b8b Frank S. Filz V2.7-rc4
b9c94ee Daniel Gryniewicz MDCACHE - Handle reloading first chunk
dd5e7c5 Daniel Gryniewicz MDCACHE - Take lock when bumping chunk
8bef6b0 Sachin Punadikar Set sbd_blocked_cookie correctly
cce90c8 Sachin Punadikar Set op_ctx in state_blocked_lock_caller
6b0b125 Fatih Acar FSAL_MDCACHE: remove limitation on Reaper_Work_Per_Lane
65df2a5 Fatih Acar FreeBSD: fix compilation on FreeBSD 12
44bc0ea Kinglong Mee Fix bad assign logical caused use after freed
222ab63 Kinglong Mee Fix uinitialized end_of_file/fsal_stable at nfs/9p
d25e4b5 Kinglong Mee Fix uinitialized xdr_attrs_args when meeting error at nfs4 readdir callback
369d043 Jeff Layton rados_grace: move nfs_in_grace check to nfs_maybe_start_grace
fcc50c4 Jeff Layton NFS: don't check for grace period in nfs4_chk_clid
a17c8d6 Jeff Layton NLM: don't check for a grace period in nlm4_Unshare
2872d04 Jeff Layton NLM: don't check for grace period in nlm4_Unlock
f7f45c4 Jeff Layton SAL: recovery_backend API cleanups
6 years, 4 months
Change in ffilz/nfs-ganesha[next]: rados_grace: move nfs_in_grace check to nfs_maybe_start_grace
by GerritHub
From Jeff Layton <jlayton(a)redhat.com>:
Jeff Layton has uploaded this change for review. ( https://review.gerrithub.io/424267 )
Change subject: rados_grace: move nfs_in_grace check to nfs_maybe_start_grace
......................................................................
rados_grace: move nfs_in_grace check to nfs_maybe_start_grace
The maybe_start_grace routines should not need to do this themselves.
Change-Id: I1d6d08aeaf0c919be980569d584bf8d5bf87e3d8
Signed-off-by: Jeff Layton <jlayton(a)redhat.com>
---
M src/SAL/nfs4_recovery.c
M src/SAL/recovery/recovery_rados_cluster.c
2 files changed, 1 insertion(+), 5 deletions(-)
git pull ssh://review.gerrithub.io:29418/ffilz/nfs-ganesha refs/changes/67/424267/1
--
To view, visit https://review.gerrithub.io/424267
To unsubscribe, or for help writing mail filters, visit https://review.gerrithub.io/settings
Gerrit-Project: ffilz/nfs-ganesha
Gerrit-Branch: next
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1d6d08aeaf0c919be980569d584bf8d5bf87e3d8
Gerrit-Change-Number: 424267
Gerrit-PatchSet: 1
Gerrit-Owner: Jeff Layton <jlayton(a)redhat.com>
6 years, 4 months
Change in ffilz/nfs-ganesha[next]: MDCACHE - Handle reloading first chunk
by GerritHub
From Daniel Gryniewicz <dang(a)redhat.com>:
Daniel Gryniewicz has uploaded this change for review. ( https://review.gerrithub.io/424264 )
Change subject: MDCACHE - Handle reloading first chunk
......................................................................
MDCACHE - Handle reloading first chunk
When the first chunk needs to be reloaded, due to missing entries, we
need to remove the chunk and pass back the correct dirent.
Change-Id: I0fd4c519c7461f16e5e4699385a2a8d88ec11e25
Signed-off-by: Daniel Gryniewicz <dang(a)redhat.com>
---
M src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c
1 file changed, 5 insertions(+), 5 deletions(-)
git pull ssh://review.gerrithub.io:29418/ffilz/nfs-ganesha refs/changes/64/424264/1
--
To view, visit https://review.gerrithub.io/424264
Gerrit-Project: ffilz/nfs-ganesha
Gerrit-Branch: next
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0fd4c519c7461f16e5e4699385a2a8d88ec11e25
Gerrit-Change-Number: 424264
Gerrit-PatchSet: 1
Gerrit-Owner: Daniel Gryniewicz <dang(a)redhat.com>
6 years, 4 months
Change in ffilz/nfs-ganesha[next]: Set sbd_blocked_cookie correctly
by GerritHub
From Sachin Punadikar <psachin(a)in.ibm.com>:
Sachin Punadikar has uploaded this change for review. ( https://review.gerrithub.io/424260 )
Change subject: Set sbd_blocked_cookie correctly
......................................................................
Set sbd_blocked_cookie correctly
In "state_add_grant_cookie", the sbd_blocked_cookie field is
initialized at the beginning. On failure of "do_lock_op", the allocated
cookie is freed, but the sbd_blocked_cookie value is kept as is, leading
to a double free condition. Also, on freeing the cookie, its entry
should be removed from the hash table.
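The failure pattern described in this commit message can be illustrated with a toy Python model. It is not the code in state_lock.c: the Cookie and BlockData classes, the cookie_table dict, and the add_grant_cookie and release_block functions are hypothetical stand-ins, and only the field name sbd_blocked_cookie comes from the commit message. The sketch shows why the failure path should both clear the stale field and drop the hash-table entry.

```python
class Cookie:
    """Stand-in for an allocated grant cookie; tracks whether it was freed."""

    def __init__(self, key):
        self.key = key
        self.freed = False

    def free(self):
        if self.freed:
            raise RuntimeError("double free of cookie %r" % self.key)
        self.freed = True


class BlockData:
    """Stand-in for the blocked-lock data that points at its cookie."""

    def __init__(self):
        self.sbd_blocked_cookie = None


def add_grant_cookie(block, key, cookie_table, lock_op_ok, fixed=True):
    """Allocate a cookie, then undo the allocation if the lock op fails."""
    cookie = Cookie(key)
    cookie_table[key] = cookie
    block.sbd_blocked_cookie = cookie  # set early, before the lock op runs
    if lock_op_ok:
        return True
    if fixed:
        # Failure path with the cleanup the commit message asks for:
        # remove the hash-table entry and clear the stale reference.
        del cookie_table[key]
        block.sbd_blocked_cookie = None
    cookie.free()
    return False


def release_block(block):
    """Later cleanup frees whatever cookie the block still points at."""
    if block.sbd_blocked_cookie is not None:
        block.sbd_blocked_cookie.free()
        block.sbd_blocked_cookie = None
```

With fixed=False the later release_block call raises on the second free and the table still holds the freed cookie; with fixed=True both problems go away.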
Change-Id: Ieabab40ffd06218b61d056541c0b08c297cd5dd4
Signed-off-by: Sachin Punadikar <psachin(a)in.ibm.com>
---
M src/SAL/state_lock.c
1 file changed, 9 insertions(+), 4 deletions(-)
git pull ssh://review.gerrithub.io:29418/ffilz/nfs-ganesha refs/changes/60/424260/1
--
To view, visit https://review.gerrithub.io/424260
Gerrit-Project: ffilz/nfs-ganesha
Gerrit-Branch: next
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ieabab40ffd06218b61d056541c0b08c297cd5dd4
Gerrit-Change-Number: 424260
Gerrit-PatchSet: 1
Gerrit-Owner: Sachin Punadikar <psachin(a)in.ibm.com>
6 years, 4 months
Change in ffilz/nfs-ganesha[next]: NFS: don't check for grace period in nfs4_chk_clid
by GerritHub
From Jeff Layton <jlayton(a)redhat.com>:
Jeff Layton has uploaded this change for review. ( https://review.gerrithub.io/424258 )
Change subject: NFS: don't check for grace period in nfs4_chk_clid
......................................................................
NFS: don't check for grace period in nfs4_chk_clid
We will validate cid_allow_reclaim flag at the point where we are
actually reclaiming. There are also some ways that we can go back into
grace and load a new recovery db after coming out of it with certain
dbus commands.
In that situation we could conceivably have a client establish a session
outside of the grace period and then issue reclaim requests against it
after the grace period has restarted (maybe a merging of state held by
two servers in the cluster?).
Drop this check from the code since we always check cid_allow_reclaim
in conjunction with grace period checks at the time of the reclaim.
Change-Id: I7cfd5ad6cd74c9358e9ec86710d0a31ad1ffa883
Signed-off-by: Jeff Layton <jlayton(a)redhat.com>
---
M src/SAL/nfs4_recovery.c
1 file changed, 0 insertions(+), 3 deletions(-)
git pull ssh://review.gerrithub.io:29418/ffilz/nfs-ganesha refs/changes/58/424258/1
--
To view, visit https://review.gerrithub.io/424258
Gerrit-Project: ffilz/nfs-ganesha
Gerrit-Branch: next
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7cfd5ad6cd74c9358e9ec86710d0a31ad1ffa883
Gerrit-Change-Number: 424258
Gerrit-PatchSet: 1
Gerrit-Owner: Jeff Layton <jlayton(a)redhat.com>
6 years, 4 months
Change in ffilz/nfs-ganesha[next]: NLM: don't check for a grace period in nlm4_Unshare
by GerritHub
From Jeff Layton <jlayton(a)redhat.com>:
Jeff Layton has uploaded this change for review. ( https://review.gerrithub.io/424257 )
Change subject: NLM: don't check for a grace period in nlm4_Unshare
......................................................................
NLM: don't check for a grace period in nlm4_Unshare
Checking for the grace period is pointless. Unshare is releasing
state, and that can't cause a conflict.
It is true that an Unshare does get a reclaim flag in its arguments,
but I chalk that up to poor protocol design rather than some deeper
meaning.
Change-Id: Id84038b738db6ea2643af6eecc5b0eb01422b29c
Signed-off-by: Jeff Layton <jlayton(a)redhat.com>
---
M src/Protocols/NLM/nlm_Unshare.c
1 file changed, 0 insertions(+), 15 deletions(-)
git pull ssh://review.gerrithub.io:29418/ffilz/nfs-ganesha refs/changes/57/424257/1
--
To view, visit https://review.gerrithub.io/424257
Gerrit-Project: ffilz/nfs-ganesha
Gerrit-Branch: next
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id84038b738db6ea2643af6eecc5b0eb01422b29c
Gerrit-Change-Number: 424257
Gerrit-PatchSet: 1
Gerrit-Owner: Jeff Layton <jlayton(a)redhat.com>
6 years, 4 months
Change in ffilz/nfs-ganesha[next]: NLM: don't check for grace period in nlm4_Unlock
by GerritHub
From Jeff Layton <jlayton(a)redhat.com>:
Jeff Layton has uploaded this change for review. ( https://review.gerrithub.io/424256 )
Change subject: NLM: don't check for grace period in nlm4_Unlock
......................................................................
NLM: don't check for grace period in nlm4_Unlock
There is no need to check for the grace period when processing an
unlock. The grace period exists to prevent clients from acquiring
conflicting state that other clients have yet to reclaim.
An unlock, however, can't cause a conflict since we're releasing state
that is genuinely held by the client.
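As a rough illustration of this reasoning, the Python sketch below models which kinds of NLM request the grace period should hold back. It is a toy decision function, not Ganesha's handler code: blocked_by_grace and the two operation sets are hypothetical names, and the sketch only answers whether the grace period itself is a reason to refuse a request.

```python
# Operations that create new lock/share state versus ones that release it.
ACQUIRES_STATE = {"lock", "share"}
RELEASES_STATE = {"unlock", "unshare"}


def blocked_by_grace(op, in_grace, reclaim=False):
    """Return True if the grace period is a reason to refuse `op`."""
    if op in RELEASES_STATE:
        # Releasing state the client holds cannot conflict with a reclaim.
        return False
    if op in ACQUIRES_STATE:
        # New (non-reclaim) state could conflict with state that another
        # client has yet to reclaim, so it has to wait for grace to end.
        return in_grace and not reclaim
    raise ValueError("unknown operation: %r" % op)


if __name__ == "__main__":
    for op in ("lock", "unlock"):
        print(op, blocked_by_grace(op, in_grace=True))
```

In this model an unlock is never held back, whatever the grace state, which is the behaviour the patch aims for.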
Change-Id: Iad2f36b9579f868d7715636f43d1c2ce2479a12b
Signed-off-by: Jeff Layton <jlayton(a)redhat.com>
---
M src/Protocols/NLM/nlm_Unlock.c
1 file changed, 0 insertions(+), 8 deletions(-)
git pull ssh://review.gerrithub.io:29418/ffilz/nfs-ganesha refs/changes/56/424256/1
--
To view, visit https://review.gerrithub.io/424256
Gerrit-Project: ffilz/nfs-ganesha
Gerrit-Branch: next
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iad2f36b9579f868d7715636f43d1c2ce2479a12b
Gerrit-Change-Number: 424256
Gerrit-PatchSet: 1
Gerrit-Owner: Jeff Layton <jlayton(a)redhat.com>
6 years, 4 months
Change in ffilz/nfs-ganesha[next]: FSAL_MDCACHE: remove limitation on Reaper_Work_Per_Lane
by GerritHub
From <fatih(a)gandi.net>:
fatih(a)gandi.net has uploaded this change for review. ( https://review.gerrithub.io/424229 )
Change subject: FSAL_MDCACHE: remove limitation on Reaper_Work_Per_Lane
......................................................................
FSAL_MDCACHE: remove limitation on Reaper_Work_Per_Lane
We encountered cases where the first 2000 entries in the LRU of each
lane are all referenced. As a consequence, the reaper thread does not
reap further unreferenced entries because it stops iterating too early.
Raising Reaper_Work_Per_Lane above 2k fixed the issue.
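The effect can be reproduced with a very small Python model. This is a simplified sketch under stated assumptions, not the MDCACHE reaper: reap_lane is a hypothetical function, a lane is reduced to a plain list of reference counts with the LRU end first, and "reaping" is reduced to counting the unreferenced entries that the scan reaches.

```python
def reap_lane(lane, work_per_lane):
    """Scan at most `work_per_lane` entries from the LRU end of one lane.

    `lane` is a list of reference counts, LRU end first. An entry with a
    count of 0 is unreferenced and can be reaped. Returns how many entries
    the scan was able to reap.
    """
    reaped = 0
    for refcount in lane[:work_per_lane]:
        if refcount == 0:
            reaped += 1
    return reaped


if __name__ == "__main__":
    # 2000 referenced entries at the front, 500 reclaimable ones behind them.
    lane = [1] * 2000 + [0] * 500
    print(reap_lane(lane, 2000))  # scan stops before any reclaimable entry
    print(reap_lane(lane, 3000))  # a larger budget reaches all 500
```

With a budget of 2000 the scan never reaches the unreferenced entries, which matches the symptom described in the commit message; a larger budget lets it get past the referenced block.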
Change-Id: Ia1da12a21a42bf984c9f11424815080b45129a91
Signed-off-by: Fatih Acar <fatih.acar(a)gandi.net>
---
M src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_read_conf.c
M src/config_samples/config.txt
M src/doc/man/ganesha-cache-config.rst
3 files changed, 3 insertions(+), 3 deletions(-)
git pull ssh://review.gerrithub.io:29418/ffilz/nfs-ganesha refs/changes/29/424229/1
--
To view, visit https://review.gerrithub.io/424229
Gerrit-Project: ffilz/nfs-ganesha
Gerrit-Branch: next
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia1da12a21a42bf984c9f11424815080b45129a91
Gerrit-Change-Number: 424229
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward <fatih(a)gandi.net>
6 years, 4 months
Change in ffilz/nfs-ganesha[next]: FSAL: eliminate fsal_grace boolean
by GerritHub
From Jeff Layton <jlayton(a)redhat.com>:
Jeff Layton has uploaded this change for review. ( https://review.gerrithub.io/424122 )
Change subject: FSAL: eliminate fsal_grace boolean
......................................................................
FSAL: eliminate fsal_grace boolean
The fsal_staticfsinfo_t struct has a boolean called fsal_grace that will
offload grace period handling to the FSAL. Nothing defaults to this
being on, but GPFS has an option that allows it to be enabled.
Removing this boolean simplifies the code significantly, and it's not
clear that it's actually being used. This patch removes it, under the
assumption that it's no longer useful.
Cc: Malahal Naineni <malahal(a)gmail.com>
Change-Id: Ic4af1f814e86175553a3a57eedc771b17947dee2
Signed-off-by: Jeff Layton <jlayton(a)redhat.com>
---
M src/FSAL/FSAL_GPFS/main.c
M src/FSAL/commonlib.c
M src/FSAL/fsal_config.c
M src/Protocols/NFS/nfs4_op_lock.c
M src/Protocols/NFS/nfs4_op_open.c
M src/Protocols/NLM/nlm_Lock.c
M src/Protocols/NLM/nlm_Share.c
M src/include/fsal_types.h
8 files changed, 5 insertions(+), 35 deletions(-)
git pull ssh://review.gerrithub.io:29418/ffilz/nfs-ganesha refs/changes/22/424122/1
--
To view, visit https://review.gerrithub.io/424122
Gerrit-Project: ffilz/nfs-ganesha
Gerrit-Branch: next
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic4af1f814e86175553a3a57eedc771b17947dee2
Gerrit-Change-Number: 424122
Gerrit-PatchSet: 1
Gerrit-Owner: Jeff Layton <jlayton(a)redhat.com>
Gerrit-CC: Malahal <malahal(a)gmail.com>
6 years, 4 months