Fwd: Ganesha crash in lock_avail
by Sachin Punadikar
---------- Forwarded message ---------
From: Sachin Punadikar <punadikar.sachin(a)gmail.com>
Date: Thu, Dec 6, 2018 at 7:52 PM
Subject: Ganesha crash in lock_avail
To: nfs-ganesha-devel <nfs-ganesha-devel(a)lists.sourceforge.net>
Hello,
A customer reported the crash below:
(gdb) where
#0 0x00007fa70c161fcb in raise () from /lib64/libpthread.so.0
#1 0x0000000000454884 in crash_handler (signo=11, info=0x7fa5a1ff9f30, ctx=0x7fa5a1ff9e00)
    at /usr/src/debug/nfs-ganesha-2.5.3-ibm015.01-0.1.1-Source/MainNFSD/nfs_init.c:225
#2 <signal handler called>
#3 0x0000000000000000 in ?? ()
#4 0x0000000000435084 in lock_avail (vec=0x18f07c8, file=0x7fa420157fd8, owner=0x7fa4f8189fc0, lock_param=0x7fa420157ff0)
    at /usr/src/debug/nfs-ganesha-2.5.3-ibm015.01-0.1.1-Source/FSAL_UP/fsal_up_top.c:179
#5 0x00000000005386eb in mdc_up_lock_avail (vec=0x18f07c8, file=0x7fa420157fd8, owner=0x7fa4f8189fc0, lock_param=0x7fa420157ff0)
    at /usr/src/debug/nfs-ganesha-2.5.3-ibm015.01-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_up.c:380
#6 0x0000000000439c72 in queue_lock_avail (ctx=0x7fa40c039c40)
    at /usr/src/debug/nfs-ganesha-2.5.3-ibm015.01-0.1.1-Source/FSAL_UP/fsal_up_async.c:247
#7 0x000000000050a32c in fridgethr_start_routine (arg=0x7fa40c039c40)
    at /usr/src/debug/nfs-ganesha-2.5.3-ibm015.01-0.1.1-Source/support/fridgethr.c:550
#8 0x00007fa70c15adc5 in start_thread () from /lib64/libpthread.so.0
#9 0x00007fa70b81a1cd in clone () from /lib64/libc.so.6
We found that op_ctx was not set up properly.
(gdb) frame 4
#4 0x0000000000435084 in lock_avail (vec=0x18f07c8, file=0x7fa420157fd8, owner=0x7fa4f8189fc0, lock_param=0x7fa420157ff0)
    at /usr/src/debug/nfs-ganesha-2.5.3-ibm015.01-0.1.1-Source/FSAL_UP/fsal_up_top.c:179
179 obj->obj_ops.put_ref(obj);
(gdb) p *obj
$2 = {handles = {next = 0x0, prev = 0x0}, fs = 0x193e240, fsal = 0x0,
  obj_ops = {get_ref = 0x0, put_ref = 0x0, release = 0x0, merge = 0x0, lookup = 0x0, readdir = 0x0,
    compute_readdir_cookie = 0x0, dirent_cmp = 0x0, create = 0x0, mkdir = 0x0, mknode = 0x0,
    symlink = 0x0, readlink = 0x0, test_access = 0x0, getattrs = 0x0, setattrs = 0x0, link = 0x0,
    fs_locations = 0x0, rename = 0x0, unlink = 0x0, open = 0x0, reopen = 0x0, status = 0x0,
    read = 0x0, read_plus = 0x0, write = 0x0, write_plus = 0x0, seek = 0x0, io_advise = 0x0,
    commit = 0x0, lock_op = 0x0, share_op = 0x0, close = 0x0, list_ext_attrs = 0x0,
    getextattr_id_by_name = 0x0, getextattr_value_by_name = 0x0, getextattr_value_by_id = 0x0,
    setextattr_value = 0x0, setextattr_value_by_id = 0x0, remove_extattr_by_id = 0x0,
    remove_extattr_by_name = 0x0, handle_is = 0x0, handle_to_wire = 0x0, handle_to_key = 0x0,
    handle_cmp = 0x0, layoutget = 0x0, layoutreturn = 0x0, layoutcommit = 0x0, getxattrs = 0x0,
    setxattrs = 0x0, removexattrs = 0x0, listxattrs = 0x0, open2 = 0x0, check_verifier = 0x0,
    status2 = 0x0, reopen2 = 0x0, read2 = 0x0, write2 = 0x0, seek2 = 0x0, io_advise2 = 0x0,
    commit2 = 0x0, lock_op2 = 0x0, setattr2 = 0x0, close2 = 0x0},
  obj_lock = {__data = {__lock = 0, __nr_readers = 0, __readers_wakeup = 0, __writer_wakeup = 0,
      __nr_readers_queued = 0, __nr_writers_queued = 0, __writer = 0, __shared = 0, __pad1 = 0,
      __pad2 = 0, __flags = 0}, __size = '\000' <repeats 55 times>, __align = 0},
  type = REGULAR_FILE, fsid = {major = 11073324921844891658, minor = 1}, fileid = 229392385,
  state_hdl = 0x7fa51006aea0}
(gdb) frame 5
#5 0x00000000005386eb in mdc_up_lock_avail (vec=0x18f07c8, file=0x7fa420157fd8, owner=0x7fa4f8189fc0, lock_param=0x7fa420157ff0)
    at /usr/src/debug/nfs-ganesha-2.5.3-ibm015.01-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_up.c:380
380 rc = myself->super_up_ops.lock_avail(vec, file, owner,
(gdb) p op_ctx
$3 = (struct req_op_context *) 0x7fa5a1ffa430
(gdb) p *op_ctx
$4 = {creds = 0x0, original_creds = {caller_uid = 0, caller_gid = 0, caller_glen = 0,
    caller_garray = 0x0}, caller_gdata = 0x0, caller_garray_copy = 0x0, managed_garray_copy = 0x0,
  cred_flags = 0, caller_addr = 0x0, clientid = 0x0, nfs_vers = 0, nfs_minorvers = 0,
  req_type = 0, client = 0x0, ctx_export = 0x18efc78, fsal_export = 0x18f0680, export_perms = 0x0,
  start_time = 0, queue_wait = 0, fsal_private = 0x0, fsal_module = 0x0, fsal_pnfs_ds = 0x0}
(gdb)
The output above shows that op_ctx is not set up properly: "fsal_module" is NULL.
I have posted a patch to fix this issue:
https://review.gerrithub.io/#/c/436356/
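For readers following the dump: frame #3 sits at address 0x0 because obj->obj_ops.put_ref is NULL in frame 4, so the call at fsal_up_top.c:179 jumps through a null function pointer. The sketch below only illustrates that failure mode and a defensive check around it. It is not the patch linked above, and demo_obj, demo_obj_ops, demo_put_ref and demo_safe_put_ref are hypothetical, heavily reduced stand-ins rather than Ganesha's real types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the structures in the dump. */
struct demo_obj;

struct demo_obj_ops {
	void (*get_ref)(struct demo_obj *obj);
	void (*put_ref)(struct demo_obj *obj);
};

struct demo_obj {
	struct demo_obj_ops obj_ops;	/* all NULL in the crashing object */
	int refcount;
};

/* A trivial put_ref implementation for objects that were set up properly. */
static void demo_put_ref(struct demo_obj *obj)
{
	assert(obj->refcount > 0);
	obj->refcount--;
}

/*
 * Drop a reference only if the ops table was actually filled in.
 * Returns 0 if put_ref was called, -1 if the object or its put_ref
 * pointer is NULL (the state shown by "p *obj" above).
 */
static int demo_safe_put_ref(struct demo_obj *obj)
{
	if (obj == NULL || obj->obj_ops.put_ref == NULL)
		return -1;

	obj->obj_ops.put_ref(obj);
	return 0;
}
```

With a zero-initialized demo_obj, demo_safe_put_ref() returns -1 instead of jumping to address 0. A check like this only hides the symptom, though; the object and op_ctx still have to be initialized correctly before the upcall runs.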
--
with regards,
Sachin Punadikar
Re: NFS Ganesha w/ KRB5
by Simon Nussbaum
Dear Tom
We have it running here and this is our ganesha configuration:
NFS_KRB5
{
    PrincipalName = nfs(a)myhost.mydomain.com ;
    KeytabPath = /etc/krb5.keytab ;
    Active_krb5 = YES ;
}
NFS_CORE_PARAM {
    NSM_Use_Caller_Name = true;
    Clustered = false;
    Rquota_Port = 875;
}
EXPORT
{
    Export_Id = 1;
    Path = "/srv/shares";
    Pseudo = "/srv/shares";
    Protocols = "4";
    Access_Type = RW;
    Squash = no_root_squash;
    # Squash = root_squash;
    Sectype = krb5,krb5i,krb5p;
    Disable_ACL = FALSE;
    FSAL
    {
        name = GLUSTER;
        hostname = "myglusterserver.mydomain.com";
        volume = "myglustervolume";
    }
}
But we have issues with NFSv4 ACLs. When we try to change the ACL on the mountpoint as a domain user, we get bizarre behavior. We are using a brick formatted as XFS, which should support extended attributes, but somehow this is not being recognized. Maybe somebody here could put us on the right track. At the moment we don't know how to solve this.
This can be reproduced as follows with the configuration above:
1) Mount share as root on a client:
mount -vvv -t nfs4 -o sec=krb5,rw,acl,timeo=10 myhost.domain.com:/srv/shares /mnt/
2) Create a file as a domain user
touch /mnt/test
nfs4_getfacl /mnt/test # getfacl returns nothing for a couple of seconds after that
nfs4_getfacl /mnt/test
A::OWNER@:rwatTcCy
A::GROUP@:rwatcy
A::EVERYONE@:tcy
3) After this we try to change the ACL
nfs4_setfacl -a "A:g:admins@mydomain.com:rwaDxtcy" "/mnt/test"
Failed setxattr operation: Invalid argument
This fails, and ganesha.log on the server shows:
20/06/2018 11:49:37 : epoch 5b2a225d : myhost.mydomain.com : ganesha.nfsd-14820[work-38] glusterfs_set_acl :FSAL :MAJ :failed to set access type posix acl
20/06/2018 11:49:37 : epoch 5b2a225d : myhost.mydomain.com : ganesha.nfsd-14820[work-38] glusterfs_setattr2 :FSAL :CRIT :setattrs failed with error Success
and on the gluster server
[2018-06-20 09:49:37.673738] I [MSGID: 139001] [posix-acl.c:269:posix_acl_log_permit_denied] 0-myglustervolume-access-control: client: vm-0026.service.int.rabe.ch-14820-2018/06/20-09:46:05:292146-myglustervolume-client-1-0-0, gfid: d0d93931-8915-4a2f-b6be-f53ac154eacf, req(uid:1101,gid:1101,perm:2,ngrps:3), ctx(uid:1101,gid:1101,in-groups:1,perm:000,updated-fop:SETATTR, acl:-) [Permission denied]
[2018-06-20 09:49:37.674074] I [MSGID: 115060] [server-rpc-fops.c:899:_gf_server_log_setxattr_failure] 0-myglustervolume-server: 43: SETXATTR /test (d0d93931-8915-4a2f-b6be-f53ac154eacf) ==> glusterfs.posix.acl, client: myhost.mydomain.com-14820-2018/06/20-09:46:05:292146-myglustervolume-client-1-0-0, error-xlator: myglustervolume-access-control
[2018-06-20 09:49:37.674090] I [MSGID: 115060] [server-rpc-fops.c:929:server_setxattr_cbk] 0-myglustervolume-server: client: myhost.mydomain.com-14820-2018/06/20-09:46:05:292146-myglustervolume-client-1-0-0, error-xlator: myglustervolume-access-control [Permission denied]
Today I noticed something that I can't understand. XFS should support extended attributes by default, but on the gluster server I can see this error:
[2018-12-05 16:30:35.492623] W [posix.c:4929:posix_getxattr] 0-myglustervolume-posix: Extended attributes not supported (try remounting brick with 'user_xattr' flag)
[2018-12-05 16:30:35.492649] E [MSGID: 113001] [posix.c:4940:posix_getxattr] 0-myglustervolume-posix: getxattr failed on /srv/gluster/myglustervolume/gb-01/brick/.glusterfs/b8/26/b8264d88-43a9-4a5a-8381-aaa041cd8c9f: system.nfs4_acl [Operation not supported]
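One way to test the 'user_xattr' hint from the warning above is to probe the brick directly, independent of gluster. This is a minimal sketch, assuming a Linux host that provides <sys/xattr.h>; probe_user_xattr and the attribute name user.xattr_probe are made up for illustration. It only exercises the user.* namespace, so it says nothing about whether system.nfs4_acl (the name in the getxattr failure above) is supported.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * Set, read back and remove a throwaway user.* attribute on path.
 * Returns 0 on success, or the errno value of the first failing call
 * (ENOTSUP would match the "Operation not supported" log entry).
 */
static int probe_user_xattr(const char *path)
{
	const char *name = "user.xattr_probe";	/* hypothetical attribute name */
	const char value[] = "ok";
	char buf[sizeof(value)];
	ssize_t len;

	assert(path != NULL);

	if (setxattr(path, name, value, sizeof(value), 0) != 0)
		return errno;

	len = getxattr(path, name, buf, sizeof(buf));
	if (len < 0) {
		int err = errno;

		removexattr(path, name);	/* best-effort cleanup */
		return err;
	}

	if (removexattr(path, name) != 0)
		return errno;

	/* The value read back should match what was written. */
	if ((size_t)len != sizeof(value) ||
	    memcmp(buf, value, sizeof(value)) != 0)
		return EIO;

	return 0;
}
```

Run as a user with write access to the brick, probe_user_xattr("/srv/gluster/myglustervolume/gb-01/brick") returning 0 would suggest that plain user attributes work on the filesystem and that the failure is specific to the system.nfs4_acl name.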
Does anybody have a hint as to what the problem is?
Thanks
Simon