NFS-Ganesha availability problem / FSAL Gluster
by Dominique Rousseau
Hi,
I'm not sure my setup is "the right way", but we are running into problems with an nfs-ganesha setup using the glusterfs FSAL.
glusterfs version is 3.13.2-1 ( on Debian 9 )
nfs-ganesha is 2.5.3-1
We have 2 gluster nodes ( called gluster1 and gluster2 ), both with nfs-ganesha installed. Some NFS clients connect to gluster1, others to gluster2.
We are using ACLs on gluster (with ext4 on the bricks), and they are enabled in the ganesha conf.
We had the first problem 2 days ago on gluster1, and the second yesterday on gluster2.
In the minutes before NFS stops responding, the only suspicious log entry I get is:
27/05/2019 17:16:00 : epoch 5cebf468 : gluster1 : ganesha.nfsd-1877[work-123] posix_acl_2_fsal_acl :FSAL :WARN :Cannot retrieve permission set
27/05/2019 17:16:00 : epoch 5cebf468 : gluster1 : ganesha.nfsd-1877[work-123] posix_acl_2_fsal_acl :FSAL :WARN :Cannot retrieve permission set
( and so on, for thousands of lines )
The same for the second problem:
28/05/2019 18:18:16 : epoch 5cecdfa1 : gluster2 : ganesha.nfsd-11510[work-229] posix_acl_2_fsal_acl :FSAL :WARN :Cannot retrieve permission set
28/05/2019 18:18:16 : epoch 5cecdfa1 : gluster2 : ganesha.nfsd-11510[work-229] posix_acl_2_fsal_acl :FSAL :WARN :Cannot retrieve permission set
( and so on, for thousands of lines )
Restarting the nfs-ganesha process seems to be enough to restore access.
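To see when the flood of warnings starts relative to the hang, the log can be bucketed per minute. The following is a minimal sketch, assuming the line layout shown in the excerpts above; `count_per_minute` and the regular expression are illustrative helpers and not part of nfs-ganesha.

```python
import re
from collections import Counter

# Layout assumed from the excerpts above:
# DD/MM/YYYY HH:MM:SS : epoch <hex> : <host> : <proc>-<pid>[<thread>] <func> :<COMPONENT> :<LEVEL> :<message>
LINE_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4}) (?P<minute>\d{2}:\d{2}):\d{2} : "
    r"epoch \w+ : (?P<host>\S+) : \S+\[(?P<thread>[^\]]+)\] "
    r"(?P<func>\S+) :(?P<comp>[^:]+?) :(?P<level>\w+) :(?P<msg>.*)$"
)


def count_per_minute(lines, level="WARN"):
    """Count log lines of the given level per (date, HH:MM, function)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("level") == level:
            key = (match.group("date"), match.group("minute"), match.group("func"))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Two lines copied from the excerpt above, used only as sample input.
    sample = [
        "27/05/2019 17:16:00 : epoch 5cebf468 : gluster1 : ganesha.nfsd-1877[work-123] "
        "posix_acl_2_fsal_acl :FSAL :WARN :Cannot retrieve permission set",
        "27/05/2019 17:16:00 : epoch 5cebf468 : gluster1 : ganesha.nfsd-1877[work-123] "
        "posix_acl_2_fsal_acl :FSAL :WARN :Cannot retrieve permission set",
    ]
    for key, n in sorted(count_per_minute(sample).items()):
        print(key, n)
```

On a real system the function can be given the open log file instead of the sample list; the location of that file depends on the packaging and is not stated in this thread.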
The ganesha.conf file is like this :
NFS_Core_Param {
    #Use supplied name other than IP in NSM operations
    NSM_Use_Caller_Name = true;
    #Copy lock states into "/var/lib/nfs/ganesha" dir
    Clustered = false;
    #Use a non-privileged port for RQuota
    Rquota_Port = 875;
}
EXPORT{
    Export_Id = 1 ; # Export ID unique to each export
    Path = "/alternc-html"; # Path of the volume to be exported. Eg: "/test_volume"
    FSAL {
        name = GLUSTER;
        hostname = "127.0.0.1"; # IP of one of the nodes in the trusted pool
        volume = "alternc-html"; # Volume name. Eg: "test_volume"
    }
    Access_type = RW; # Access permissions
    Squash = No_root_squash; # To enable/disable root squashing
    Disable_ACL = FALSE; # To enable/disable ACL
    Pseudo = "/alternc-html"; # NFSv4 pseudo path for this export. Eg: "/test_volume_pseudo"
    Protocols = "3"; ##,"4" ; # NFS protocols supported
    Transports = "UDP","TCP" ; # Transport protocols supported
    SecType = "sys"; # Security flavors supported
}
5 years, 6 months
Re: [ceph-users] Nfs-ganesha with rados_kv backend
by Jeff Layton
On Wed, 2019-05-29 at 13:49 +0000, Stolte, Felix wrote:
> Hi,
>
> is anyone running an active-passive nfs-ganesha cluster with cephfs backend and using the rados_kv recovery backend? My setup runs fine, but takeover is giving me a headache. On takeover I see the following messages in ganesha's log file:
>
Note that there are significant problems with the rados_kv recovery
backend. In particular, it does not properly handle the case where the
server crashes during the grace period. The rados_ng and rados_cluster
backends do handle those situations properly.
> 29/05/2019 15:38:21 : epoch 5cee88c4 : cephgw-e2-1 : ganesha.nfsd-9793[dbus_heartbeat] nfs_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 5
> 29/05/2019 15:38:21 : epoch 5cee88c4 : cephgw-e2-1 : ganesha.nfsd-9793[dbus_heartbeat] nfs_start_grace :STATE :EVENT :NFS Server recovery event 5 nodeid -1 ip 10.0.0.5
> 29/05/2019 15:38:21 : epoch 5cee88c4 : cephgw-e2-1 : ganesha.nfsd-9793[dbus_heartbeat] rados_kv_traverse :CLIENT ID :EVENT :Failed to lst kv ret=-2
> 29/05/2019 15:38:21 : epoch 5cee88c4 : cephgw-e2-1 : ganesha.nfsd-9793[dbus_heartbeat] rados_kv_read_recov_clids_takeover :CLIENT ID :EVENT :Failed to takeover
> 29/05/2019 15:38:26 : epoch 5cee88c4 : cephgw-e2-1 : ganesha.nfsd-9793[reaper] nfs_lift_grace_locked :STATE :EVENT :NFS Server Now NOT IN GRACE
>
> The result is clients hanging for up to 2 minutes. Has anyone run into the same problem?
>
> Ceph Version: 12.2.11
> nfs-ganesha: 2.7.3
>
If I had to guess, the hanging is probably due to state that is being
held by the other node's MDS session that hasn't expired yet. Ceph v12
doesn't have the client reclaim interfaces that make more instantaneous
failover possible. That's new in v14 (Nautilus). See pages 12 and 13
here:
https://static.sched.com/hosted_files/cephalocon2019/86/Rook-Deployed%20N...
> ganesha.conf (identical on both nodes besides nodeid in rados_kv):
>
> NFS_CORE_PARAM {
> Enable_RQUOTA = false;
> Protocols = 3,4;
> }
>
> CACHEINODE {
> Dir_Chunk = 0;
> NParts = 1;
> Cache_Size = 1;
> }
>
> NFS_krb5 {
> Active_krb5 = false;
> }
>
> NFSv4 {
> Only_Numeric_Owners = true;
> RecoveryBackend = rados_kv;
> Grace_Period = 5;
> Lease_Lifetime = 5;
Yikes! That's _way_ too short a grace period and lease lifetime. Ganesha
will probably exit the grace period before the clients ever realize the
server has restarted, and they will fail to reclaim their state.
> Minor_Versions = 1,2;
> }
>
> RADOS_KV {
> ceph_conf = '/etc/ceph/ceph.conf';
> userid = "ganesha";
> pool = "cephfs_metadata";
> namespace = "ganesha";
> nodeid = "cephgw-k2-1";
> }
>
> Any hint would be appreciated.
I consider ganesha's dbus-based takeover mechanism to be broken by
design, as it requires the recovery backend to do things that can't be
done atomically. If a crash occurs at the wrong time, the recovery
database can end up trashed and no one can reclaim anything.
If you really want an active/passive setup then I'd move away from that
and just have whatever clustering software you're using start up the
daemon on the active node after ensuring that it's shut down on the
passive one. With that, you can also use the rados_ng recovery backend,
which is more resilient in the face of multiple crashes.
In that configuration you would want to have the same config file on
both nodes, including the same nodeid so that you can potentially take
advantage of the RECLAIM_RESET interface to kill off the old session
quickly after the server restarts.
You also need a much longer grace period.
Cheers,
--
Jeff Layton <jlayton(a)poochiereds.net>
5 years, 7 months
Error messages
by David C
Hi All
I recently put an nfs-ganesha CEPH_FSAL deployment into production. So far
so good, but I'm seeing some errors in the logs that I didn't see when testing,
and I was hoping someone could shed some light on what they mean. I haven't
had any adverse behaviour reported from the clients (apart from a potential
issue with slow 'ls' operations, which I'm investigating).
Versions:
libcephfs2-13.2.2-0.el7.x86_64
nfs-ganesha-2.7.1-0.1.el7.x86_64
nfs-ganesha-ceph-2.7.1-0.1.el7.x86_64
Ceph cluster is 12.2.10
Log errors:
"posix2fsal_error :FSAL :INFO :Mapping 11 to ERR_FSAL_DELAY"
I'm seeing this one frequently, and it tends to spam the log with 20 or so
occurrences in a second.
"15/05/2019 18:27:01 : epoch 5cd99ef1 : nfsserver : ganesha.nfsd-1990[svc_1653] posix2fsal_error :FSAL :INFO :Mapping 5 to ERR_FSAL_IO, rlim_cur=1048576 rlim_max=1048576
15/05/2019 18:27:01 : epoch 5cd99ef1 : nfsserver : ganesha.nfsd-1990[svc_1653] nfs4_Errno_verbose :NFS4 :CRIT :Error I/O error in nfs4_mds_putfh converted to NFS4ERR_IO but was set non-retryable"
I've only seen a few occurrences of this one.
17/05/2019 15:34:24 : epoch 5cdd9df8 : nfsserver : ganesha.nfsd-4696[svc_258] xdr_encode_nfs4_princ :ID MAPPER :INFO :nfs4_gid_to_name failed with code -2.
17/05/2019 15:34:24 : epoch 5cdd9df8 : nfsserver : ganesha.nfsd-4696[svc_258] xdr_encode_nfs4_princ :ID MAPPER :INFO :Lookup for 1664 failed, using numeric group
This one doesn't seem too serious. My guess is that there are accounts on the
clients with gids/uids that the server can't look up. The server is using
SSSD to bind to AD, if that helps.
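One way to test that guess is to ask the server's own name service whether the gid from the log resolves. This is a minimal sketch, assuming it is run on the ganesha host; `resolve_gid` is an illustrative helper name, and the standard-library `grp` module is used because it goes through the system's group database lookup, which is where an SSSD-backed setup would normally answer.

```python
import grp


def resolve_gid(gid):
    """Return the group name for gid, or None if the host cannot resolve it."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        # Raised when no group database entry exists for this gid.
        return None


if __name__ == "__main__":
    # 1664 is the gid reported in the "Lookup for 1664 failed" line above.
    name = resolve_gid(1664)
    if name is None:
        print("gid 1664 does not resolve on this host")
    else:
        print("gid 1664 resolves to", name)
```

If the gid does not resolve here either, the INFO message is consistent with the explanation above: the server simply falls back to the numeric group.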
Export:
{
    Export_ID=100;
    Protocols = 4;
    Transports = TCP;
    Path = /;
    Pseudo = /ceph/;
    Access_Type = RW;
    Squash = No_root_squash;
    Attr_Expiration_Time = 0;
    Disable_ACL = FALSE;
    Manage_Gids = TRUE;
    Filesystem_Id = 100.1;
    FSAL {
        Name = CEPH;
    }
}
Thanks,
David
5 years, 7 months
rpc_clnt_ping_timer_expired
by Valerio Luccio
Hello,
I tried doing a google search, but didn't come up with much useful info.
I have 4 CentOS 7 servers running glusterfs 5.3 that manage a 12 brick
Distributed-Replicate volume. I run nfs-ganesha 2.7.1-1 on all 4.
I have another 3 Linux servers that mount the gluster volume using
glusterfs (no problem there) and I distribute the mounts to use
different gluster servers.
I also have a legacy OSX server that mounts the volume via NFS from the
4th gluster server and does a Samba reshare (my users use the OSX to
mount the data disk).
The problem comes with this last mount. Every once in a while the
ganesha-gfapi.log file will show a number of rpc_clnt_ping_timer_expired
errors claiming that the servers have not responded in 42 seconds. What's
most confusing to me is that the unresponsive peer is not the OSX server,
but rather the other gluster servers, including the same server on which
ganesha is running. When this happens my OSX server freezes and I have to
restart ganesha-nfs.
I'm attaching the configuration files.
Thanks,
--
Valerio Luccio (212) 998-8736
Center for Brain Imaging 4 Washington Place, Room 157
New York University New York, NY 10003
"In an open world, who needs windows or gates ?"
5 years, 7 months