Regarding POSIX ACL support
by zhu.shangzhong@zte.com.cn
The Linux kernel NFS server implements the NFSACL sideband protocol to provide POSIX ACL support for NFSv3.
Currently, this sideband protocol is not supported by the nfs-ganesha server.
We plan to implement it in nfs-ganesha and push the changes to the nfs-ganesha community.
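For reference, the protocol surface is small: it is a separate side-band RPC program with essentially just GETACL and SETACL procedures. Below is a rough sketch of the constants as defined by the Linux implementation; the values are quoted from memory (include/uapi/linux/nfsacl.h), so please double-check them before relying on them.

/*
 * NFSACL side-band protocol constants -- quoted from memory from the
 * Linux implementation, double-check against include/uapi/linux/nfsacl.h.
 */
#define NFS_ACL_PROGRAM		100227

/* Version 3 procedures (used alongside NFSv3) */
#define ACLPROC3_NULL		0
#define ACLPROC3_GETACL		1
#define ACLPROC3_SETACL		2

/* Mask bits selecting what a GETACL/SETACL call carries */
#define NFS_ACL			0x0001	/* access ACL */
#define NFS_ACLCNT		0x0002	/* access ACL entry count */
#define NFS_DFACL		0x0004	/* default ACL */
#define NFS_DFACLCNT		0x0008	/* default ACL entry count */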
Any suggestions?
Change in ...nfs-ganesha[next]: MDCACHE - Release refs on dirents when chunk not consumed
by Daniel Gryniewicz (GerritHub)
Daniel Gryniewicz has uploaded this change for review. ( https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/456633
Change subject: MDCACHE - Release refs on dirents when chunk not consumed
......................................................................
MDCACHE - Release refs on dirents when chunk not consumed
If we don't consume all of a newly loaded chunk, the refs taken on the
entries in that chunk are left over and aren't released until the
directory is invalidated.
Handle this by releasing the refs on the remaining entries when we bail
out of a chunk early.
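A hypothetical sketch of the shape of the fix (all names below are invented for illustration; the real change is in mdcache_helpers.c, see the diff):

/*
 * Hypothetical sketch only -- not the MDCACHE code.  The point is
 * simply: if we stop walking a freshly populated chunk early, drop the
 * refs already taken on the remaining dirents instead of leaving them
 * held until the directory is invalidated.
 */
static void release_remaining_dirent_refs(struct chunk *chunk,
					  struct dirent_ref *first_unconsumed)
{
	struct dirent_ref *d;

	for (d = first_unconsumed; d != NULL; d = next_in_chunk(chunk, d)) {
		if (d->entry != NULL) {
			put_entry_ref(d->entry);	/* drop the readdir ref */
			d->entry = NULL;
		}
	}
}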
Change-Id: I61c6608898cb999c8440929504cf122b4bef232b
Signed-off-by: Daniel Gryniewicz <dang(a)redhat.com>
---
M src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c
1 file changed, 30 insertions(+), 0 deletions(-)
git pull ssh://review.gerrithub.io:29418/ffilz/nfs-ganesha refs/changes/33/456633/1
--
To view, visit https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/456633
To unsubscribe, or for help writing mail filters, visit https://review.gerrithub.io/settings
Gerrit-Project: ffilz/nfs-ganesha
Gerrit-Branch: next
Gerrit-Change-Id: I61c6608898cb999c8440929504cf122b4bef232b
Gerrit-Change-Number: 456633
Gerrit-PatchSet: 1
Gerrit-Owner: Daniel Gryniewicz <dang(a)redhat.com>
Gerrit-MessageType: newchange
Clarification needed on avltree_remove function
by Sachin Punadikar
Hi All,
Recently a customer reported the crash below in the avltree_remove function:
(gdb) where
#0 0x000000001017e590 in get_balance (node=0x0) at
/usr/src/debug/nfs-ganesha-2.3.2-ibm59-0.1.1-Source/avl/avl.c:84
#1 0x000000001017f8f0 in avltree_remove (node=0x3ffb2073fea0,
tree=0x3ff1907b5210) at
/usr/src/debug/nfs-ganesha-2.3.2-ibm59-0.1.1-Source/avl/avl.c:552
#2 0x0000000010167c2c in cache_inode_release_dirents
(entry=0x3ff1907b5070, which=CACHE_INODE_AVL_NAMES)
at
/usr/src/debug/nfs-ganesha-2.3.2-ibm59-0.1.1-Source/cache_inode/cache_inode_misc.c:804
#3 0x0000000010167bd0 in cache_inode_release_dirents
(entry=0x3ff1907b5070, which=CACHE_INODE_AVL_BOTH)
at
/usr/src/debug/nfs-ganesha-2.3.2-ibm59-0.1.1-Source/cache_inode/cache_inode_misc.c:785
#4 0x00000000101716d4 in cache_inode_lru_clean (entry=0x3ff1907b5070) at
/usr/src/debug/nfs-ganesha-2.3.2-ibm59-0.1.1-Source/cache_inode/cache_inode_lru.c:417
#5 0x0000000010175984 in cache_inode_lru_get (entry=0x3fff4398cf18) at
/usr/src/debug/nfs-ganesha-2.3.2-ibm59-0.1.1-Source/cache_inode/cache_inode_lru.c:1207
#6 0x0000000010165754 in cache_inode_new_entry (new_obj=0x3ffbc8d478d0,
flags=0, entry=0x3fff4398d0f0)
at
/usr/src/debug/nfs-ganesha-2.3.2-ibm59-0.1.1-Source/cache_inode/cache_inode_misc.c:281
#7 0x0000000010161818 in cache_inode_get (fsdata=0x3fff4398d0f8,
entry=0x3fff4398d0f0) at
/usr/src/debug/nfs-ganesha-2.3.2-ibm59-0.1.1-Source/cache_inode/cache_inode_get.c:256
#8 0x000000001019a5b8 in nfs3_FhandleToCache (fh3=0x3ff9cc000e70,
status=0x3ff1d42ea310, rc=0x3fff4398d1d0)
at
/usr/src/debug/nfs-ganesha-2.3.2-ibm59-0.1.1-Source/support/nfs_filehandle_mgmt.c:163
#9 0x000000001007f1e4 in nfs3_getattr (arg=0x3ff9cc000e70,
req=0x3ff9cc000cb0, res=0x3ff1d42ea310) at
/usr/src/debug/nfs-ganesha-2.3.2-ibm59-0.1.1-Source/Protocols/NFS/nfs3_getattr.c:81
#10 0x000000001005f748 in nfs_rpc_execute (reqdata=0x3ff9cc000c80) at
/usr/src/debug/nfs-ganesha-2.3.2-ibm59-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1288
#11 0x0000000010060498 in worker_run (ctx=0x1000e5a6b40) at
/usr/src/debug/nfs-ganesha-2.3.2-ibm59-0.1.1-Source/MainNFSD/nfs_worker_thread.c:1550
#12 0x00000000101ae5ec in fridgethr_start_routine (arg=0x1000e5a6b40) at
/usr/src/debug/nfs-ganesha-2.3.2-ibm59-0.1.1-Source/support/fridgethr.c:561
#13 0x00003fff9bd7c2bc in .start_thread () from /lib64/libpthread.so.0
#14 0x00003fff9bb9b304 in .__clone () from /lib64/libc.so.6
(gdb) frame 2
#2 0x0000000010167c2c in cache_inode_release_dirents
(entry=0x3ff1907b5070, which=CACHE_INODE_AVL_NAMES)
at
/usr/src/debug/nfs-ganesha-2.3.2-ibm59-0.1.1-Source/cache_inode/cache_inode_misc.c:804
804 avltree_remove(dirent_node, tree);
(gdb) p *dirent
$1 = {node_hk = {left = 0x0, right = 0x0, parent = 70347813813922}, hk = {k
= 6039456505800813999, p = 0}, ckey = {
hk = 16814622367593888045, fsal = 0x3fff996b25f8 <GPFS>, kv = {addr =
0x3ffd3aaecce0, len = 40}}, flags = 0,
name = 0x3ffd39a9da8c "J2"}
(gdb) p *tree
$2 = {root = 0x3ffb2073fea0, cmp_fn = @0x10288238: 0x1016e834
<avl_dirent_hk_cmpf>, height = 1, first = 0x3ffb2073fea0,
last = 0x3ff70fab2120, size = 2}
(gdb) p *dirent_node
$3 = {left = 0x0, right = 0x0, parent = 70347813813922}
(gdb) frame 1
#1 0x000000001017f8f0 in avltree_remove (node=0x3ffb2073fea0,
tree=0x3ff1907b5210)
at /usr/src/debug/nfs-ganesha-2.3.2-ibm59-0.1.1-Source/avl/avl.c:552
552 switch (get_balance(right)) {
(gdb) p *left
Cannot access memory at address 0x0
(gdb) p *right
Cannot access memory at address 0x0
(gdb) p *next
Cannot access memory at address 0x0
(gdb) p *tree
$4 = {root = 0x3ffb2073fea0, cmp_fn = @0x10288238: 0x1016e834
<avl_dirent_hk_cmpf>, height = 1, first = 0x3ffb2073fea0,
last = 0x3ff70fab2120, size = 2}
(gdb) p *parent
$5 = {left = 0x3ffb2073fea0, right = 0x0, parent = 70319593422913}
(gdb) p tree->root
$6 = (struct avltree_node *) 0x3ffb2073fea0
(gdb) p *tree->root
$7 = {left = 0x0, right = 0x0, parent = 70330352345380}
(gdb) p balance
$8 = 2
(gdb) p *node
$9 = {left = 0x0, right = 0x0, parent = 70330352345380}
When I checked the relevant code, the following block looked suspicious to me:
if (is_left) {
	is_left = parent && parent->left == node;
	balance = inc_balance(node);
	if (balance == 0)		/* case 1 */
		continue;
	if (balance == 1)		/* case 2 */
		return;
	right = node->right;		/* case 3 */
	switch (get_balance(right)) {
	case 0:				/* case 3.1 */
		....
		....
}
This block handles the case where the removal happened on the left side
of the node/subtree (is_left is true), yet it goes on to use the right
child of the node, which may be NULL; the crash is in fact the NULL
pointer passed to get_balance() on the next line.
Does this mean the block should instead be:
if (!is_left) {			/* <-- only this condition changed */
	is_left = parent && parent->left == node;
	balance = inc_balance(node);
	if (balance == 0)		/* case 1 */
		continue;
	if (balance == 1)		/* case 2 */
		return;
	right = node->right;		/* case 3 */
	switch (get_balance(right)) {
	case 0:				/* case 3.1 */
		....
		....
}
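To make the cases concrete, here is a tiny standalone model of the balance bookkeeping I am assuming above. This is my own sketch, not the avl.c code; it uses the convention the quoted block suggests (inc_balance() is called on the is_left path, so a positive balance means right-heavy):

#include <assert.h>
#include <stdio.h>

/*
 * Toy model only -- not nfs-ganesha's avl.c.  Convention assumed:
 * balance = height(right) - height(left), so a removal in the LEFT
 * subtree pushes the parent's balance toward +1/+2.
 */
struct toy_node {
	struct toy_node *left;
	struct toy_node *right;
	int balance;
};

static int toy_inc_balance(struct toy_node *n)
{
	return ++n->balance;
}

int main(void)
{
	struct toy_node r = { NULL, NULL, 0 };
	struct toy_node n = { NULL, &r, 1 };	/* already right-heavy */

	/* The left subtree just got shorter: balance goes 1 -> 2 (case 3). */
	if (toy_inc_balance(&n) == 2) {
		/*
		 * Under this convention a balance of +2 means the right
		 * subtree is two levels taller than the left one, so the
		 * right child is expected to be non-NULL at this point.
		 */
		assert(n.right != NULL);
		printf("rebalance around right child (balance %d)\n",
		       n.right->balance);
	}
	return 0;
}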
Let me know if my understanding is incorrect.
--
with regards,
Sachin Punadikar
Reply: Re: Cannot close Error with gluster FSAL
by QR
Hi Soumya, thanks for your help.
The ganesha process is started by root, but the tar is executed by a non-root user (I use qr here). The mount point is /home/glusterfs.

Reproduce steps:
1. Prepare the tar file and mount point as the root user:
   cd /tmp/perm
   echo 444 > big1.hdr
   chown qr:qr big1.hdr
   tar czf 444.tgz big1.hdr
   chown qr:qr 444.tgz
   mkdir /home/glusterfs/perm && chown qr:qr /home/glusterfs/perm
2. Switch to the non-root user: su qr
3. cd /home/glusterfs/perm
4. tar xzf /tmp/perm/444.tgz
The ganesha server and the nfs client are on the same host.
os version     : CentOS Linux release 7.5.1804 (Core)
kernel version : 3.10.0-693.17.1.el7.x86_64
glusterfs api  : glusterfs-api-devel-4.1.5-1.el7.x86_64
ERROR LOGS - ganesha.log
01/06/2019 06:59:48 : ganesha.nfsd-7236[svc_2] glusterfs_close_my_fd :FSAL :CRIT :Error : close returns with Permission denied

ERROR LOGS - ganesha-gfapi.log
[2019-05-31 22:59:48.613207] E [MSGID: 114031] [client-rpc-fops_v2.c:272:client4_0_open_cbk] 0-gv0-client-0: remote operation failed. Path: /perm/big1.hdr (c591c1bf-9494-4222-8aa1-deaaf0b44c7f) [Permission denied]

ERROR LOGS - export-sdb1-brick.log
[2019-05-31 22:59:48.388169] W [MSGID: 113117] [posix-metadata.c:569:posix_update_utime_in_mdata] 0-gv0-posix: posix utime set mdata failed on file
[2019-05-31 22:59:48.399813] W [MSGID: 113117] [posix-metadata.c:569:posix_update_utime_in_mdata] 0-gv0-posix: posix utime set mdata failed on file [Function not implemented]
[2019-05-31 22:59:48.403448] I [MSGID: 139001] [posix-acl.c:269:posix_acl_log_permit_denied] 0-gv0-access-control: client: CTX_ID:51affcdc-77ef-497d-aece-36a568671907-GRAPH_ID:0-PID:7236-HOST:nfs-ganesha-PC_NAME:gv0-client-0-RECON_NO:-0, gfid: c591c1bf-9494-4222-8aa1-deaaf0b44c7f, req(uid:1000,gid:1000,perm:2,ngrps:1), ctx(uid:1000,gid:1000,in-groups:1,perm:444,updated-fop:SETATTR, acl:-) [Permission denied]
[2019-05-31 22:59:48.403499] E [MSGID: 115070] [server-rpc-fops_v2.c:1442:server4_open_cbk] 0-gv0-server: 124: OPEN /perm/big1.hdr (c591c1bf-9494-4222-8aa1-deaaf0b44c7f), client: CTX_ID:51affcdc-77ef-497d-aece-36a568671907-GRAPH_ID:0-PID:7236-HOST:nfs-ganesha-PC_NAME:gv0-client-0-RECON_NO:-0, error-xlator: gv0-access-control [Permission denied]
--------------------------------
----- Original Message -----
From: Soumya Koduri <skoduri(a)redhat.com>
To: zhbingyin(a)sina.com, ganesha-devel <devel(a)lists.nfs-ganesha.org>, Frank Filz <ffilz(a)redhat.com>
Subject: [NFS-Ganesha-Devel] Re: Cannot close Error with gluster FSAL
Date: 2019-06-01 01:35
On 5/31/19 1:55 PM, Soumya Koduri wrote:
>
>
> On 5/31/19 4:30 AM, QR wrote:
>> We cannot decompress this file with gluster FSAL, but vfs FSAL and
>> kernel nfs can.
>> Is anyone know about this?
>> Thanks in advance.
>>
>> [qr@nfs-ganesha perm]$ tar xzf /tmp/perm/444.tgz
>> tar: big1.hdr: Cannot close: Permission denied
>> tar: Exiting with failure status due to previous errors
>> [qr@nfs-ganesha perm]$ tar tvf /tmp/perm/444.tgz
>> -r--r--r-- qr/qr 4 2019-05-30 11:25 big1.hdr
>>
>
> It could be similar to the issue mentioned in [1]. Will check and confirm.
I couldn't reproduce this issue. What is the OS version of the server
and client machines? Also please check if there are any errors in
ganesha.log, ganesha-gfapi.log and brick logs. Most probably you are
hitting the issues discussed in [1].
The problem is that, unlike most of the other FSALs, in FSAL_GLUSTER we
switch to the user's credentials before performing any operations on the
backend file system (these changes were made so that nfs-ganesha can be
run by a non-root user).
The side effect is that although the fd is first opened as part of the
NFSv4.x client's OPEN call, the NFS-Ganesha server may need to re-open
the file to get additional fds for certain other operations such as
COMMIT and LOCK/LEASE, and glusterfs does not grant RW access for those
re-opens (as expected, since they are now permission-checked against the
user's credentials).
I am not sure there is a clean way of fixing it. In [1], the author tried
to work around the problem by switching to the root user when the ganesha
process itself runs as root. That means the issue will still remain if the
ganesha process is started by a non-root user.
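To make the mechanism concrete, here is a rough sketch of the re-open pattern being described. This is illustrative only and not the FSAL_GLUSTER code: the function and flow are my own simplification, and only glfs_h_open()/glfs_setfsuid()/glfs_setfsgid() are real gfapi calls.

#include <glusterfs/api/glfs.h>
#include <glusterfs/api/glfs-handles.h>
#include <unistd.h>

/*
 * Illustrative sketch only (not the FSAL_GLUSTER implementation):
 * re-open a handle with the caller's credentials.  Because the open is
 * now permission-checked as the user, an O_RDWR re-open of a 0444 file
 * fails with EACCES even though the client's original OPEN succeeded,
 * which is consistent with the errors in the tar test above.
 */
static glfs_fd_t *reopen_as_user(glfs_t *fs, struct glfs_object *obj,
				 int flags, uid_t uid, gid_t gid)
{
	glfs_fd_t *fd;

	/* switch this thread's filesystem credentials to the caller */
	glfs_setfsuid(uid);
	glfs_setfsgid(gid);

	fd = glfs_h_open(fs, obj, flags);	/* NULL + errno on failure */

	/* restore the server process's own credentials */
	glfs_setfsuid(geteuid());
	glfs_setfsgid(getegid());

	return fd;
}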
@Frank,
any thoughts?
Thanks,
Soumya
>
> Thanks,
> Soumya
>
> [1] https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/447012
>
>> Ganesha server info
>> ganesha version: a3c6fa39ce72682049391b7e094885a8c151b0c8(V2.8-rc1)
>> FSAL : gluster
>> nfs client info
>> nfs version : nfs4.0
>> mount options :
>> rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=XXX,local_lock=none,addr=YYY
>>
>>
_______________________________________________
Devel mailing list -- devel(a)lists.nfs-ganesha.org
To unsubscribe send an email to devel-leave(a)lists.nfs-ganesha.org
Change in ...nfs-ganesha[next]: FSAL_CEPH : Fix inode reference leak
by PDVIAN (GerritHub)
PDVIAN has uploaded this change for review. ( https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/456591
Change subject: FSAL_CEPH : Fix inode reference leak
......................................................................
FSAL_CEPH : Fix inode reference leak
The ganesha server is crashing in deconstruct_handle while releasing the
inode reference. In this scenario the cmount object is NULL, which
indicates that we fail to release some inode references. In
ceph_fsal_readdir we fail to release the inode references for the
directory itself (".") and its parent directory.
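A rough sketch of the kind of change being described (not the actual patch -- see the diff; only ceph_ll_put() is a real libcephfs call, the surrounding names are assumptions):

#include <cephfs/libcephfs.h>
#include <string.h>

/*
 * Sketch only: the readdir path hands back an Inode reference even for
 * the "." and ".." entries, so skipping them without releasing that
 * reference leaks it.  The real fix lives in src/FSAL/FSAL_CEPH/handle.c.
 */
static void put_skipped_dirent_ref(struct ceph_mount_info *cmount,
				   struct Inode *inode, const char *name)
{
	if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
		ceph_ll_put(cmount, inode);	/* drop the otherwise-leaked ref */
}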
Change-Id: Iae66d5053067e8b8eb10679d31302c7418b7e270
Signed-off-by: Prashant D <prashant.dhange(a)gmail.com>
---
M src/FSAL/FSAL_CEPH/handle.c
1 file changed, 7 insertions(+), 0 deletions(-)
git pull ssh://review.gerrithub.io:29418/ffilz/nfs-ganesha refs/changes/91/456591/1
--
To view, visit https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/456591
To unsubscribe, or for help writing mail filters, visit https://review.gerrithub.io/settings
Gerrit-Project: ffilz/nfs-ganesha
Gerrit-Branch: next
Gerrit-Change-Id: Iae66d5053067e8b8eb10679d31302c7418b7e270
Gerrit-Change-Number: 456591
Gerrit-PatchSet: 1
Gerrit-Owner: PDVIAN <prashant.dhange(a)gmail.com>
Gerrit-MessageType: newchange
Re: Regarding a lost RECLAIM_COMPLETE
by Jeff Layton
On Fri, 2019-05-31 at 10:08 +0000, Sriram Patil wrote:
> Hi,
>
> Recently I came across an issue where NFS client lease expired so ganesha returned NFS4ERR_EXPIRED. This resulted in the client creating a new session with EXCHANGE_ID + CREATE_SESSION.
>
> The client id was immediately confirmed in CREATE_SESSION because the recov directory was not deleted. I observed that ganesha sets “cid_allow_reclaim = true” in nfs4_op_create_session->nfs4_chk_clid->nfs4_chk_clid_impl. This flag allows the client to do reclaims even though ganesha is not in grace. The CLAIM_PREVIOUS case in “open4_validate_reclaim” is as follows:
>
> 	case CLAIM_PREVIOUS:
> 		want_grace = true;
> 		if (!clientid->cid_allow_reclaim ||
> 		    ((data->minorversion > 0) &&
> 		     clientid->cid_cb.v41.cid_reclaim_complete))
> 			status = NFS4ERR_NO_GRACE;
> 		break;
>
cid_allow_reclaim is just a flag saying that the client in question is
present in the recovery DB. The logic above looks correct to me.
> Now, there is another flag, “clientid->cid_cb.v41.cid_reclaim_complete”, which marks the completion of reclaim by the client. It is set to true as part of the RECLAIM_COMPLETE operation. Now consider a case where ganesha does not receive RECLAIM_COMPLETE from the client; the CLAIM_NULL case in "open4_validate_reclaim" is:
>
> 	case CLAIM_NULL:
> 		if ((data->minorversion > 0)
> 		    && !clientid->cid_cb.v41.cid_reclaim_complete)
> 			status = NFS4ERR_GRACE;
> 		break;
>
> So the client gets stuck in a loop retrying OPEN with CLAIM_NULL, because the server keeps returning NFS4ERR_GRACE.
>
A client that doesn't send a RECLAIM_COMPLETE before attempting to do a
non-reclaim open is broken. RFC5661, page 567:
Whenever a client establishes a new client ID and before it does the
first non-reclaim operation that obtains a lock, it MUST send a
RECLAIM_COMPLETE with rca_one_fs set to FALSE, even if there are no
locks to reclaim. If non-reclaim locking operations are done before
the RECLAIM_COMPLETE, an NFS4ERR_GRACE error will be returned.
So the above behavior is correct, IMO.
> I guess allowing clients to reclaim as long as they keep sending reclaim requests is the point of implementing sticky grace periods. But if RECLAIM_COMPLETE is lost, we should not be stuck in the grace period forever. Maybe we can change cid_allow_reclaim to record the time at which the last reclaim request was received, and then allow non-reclaim requests after (cid_allow_reclaim + grace_period), which means ganesha will wait for a RECLAIM_COMPLETE for a full grace period. We could choose the timeout to be grace_period/3 or something if that makes more sense.
>
The point of sticky grace periods was to ensure that we don't end up
with a time-of-check/time-of-use race with the grace period. In general,
we check whether we're in the grace period at the start of an operation,
but we could end up lifting it or entering it after that check but
before the operation was complete. With the sticky grace period patches,
we ensure that we remain in whichever state we need until the operation
is done.
In general, this should not extend the length of the grace period unless
you have an operation that is taking an extraordinarily long time before
putting its reference. Maybe you have an operation that is stuck and
holding a grace reference?
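To illustrate the idea (with made-up names; grace_enter_op()/grace_put_op() are not the real SAL functions), the pattern is roughly:

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t grace_lock = PTHREAD_MUTEX_INITIALIZER;
static bool in_grace = true;
static int grace_refs;			/* ops currently pinning the state */

/* Check the grace state once and pin it for the duration of the op. */
static bool grace_enter_op(bool want_grace)
{
	bool ok;

	pthread_mutex_lock(&grace_lock);
	ok = (in_grace == want_grace);
	if (ok)
		grace_refs++;		/* state can't flip until we put this */
	pthread_mutex_unlock(&grace_lock);
	return ok;
}

static void grace_put_op(void)
{
	pthread_mutex_lock(&grace_lock);
	grace_refs--;
	pthread_mutex_unlock(&grace_lock);
}

/* Grace can only be lifted once no in-flight op still depends on it. */
static bool try_lift_grace(void)
{
	bool lifted = false;

	pthread_mutex_lock(&grace_lock);
	if (in_grace && grace_refs == 0) {
		in_grace = false;
		lifted = true;
	}
	pthread_mutex_unlock(&grace_lock);
	return lifted;
}

So an op checks the state with grace_enter_op() once at the start, and whatever state it observed is guaranteed to hold until it calls grace_put_op(), no matter how long the op runs.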
> But this will ensure that SERVER does not fail because RECLAIM_COMPLETE was not sent.
>
> Meanwhile I am also trying to figure out why NFS client did not send RECLAIM_COMPLETE.
>
That's the real question.
--
Jeff Layton <jlayton(a)redhat.com>