Hi all,
I recently put an nfs-ganesha CEPH_FSAL deployment into production. So far
so good, but I'm seeing some errors in the logs that I didn't see during
testing, and I was hoping someone could shed some light on what they mean.
No adverse behaviour has been reported by the clients (apart from a potential
issue with slow 'ls' operations, which I'm investigating).
Versions:
libcephfs2-13.2.2-0.el7.x86_64
nfs-ganesha-2.7.1-0.1.el7.x86_64
nfs-ganesha-ceph-2.7.1-0.1.el7.x86_64
Ceph cluster is 12.2.10
Log errors:
"posix2fsal_error :FSAL :INFO :Mapping 11 to ERR_FSAL_DELAY"
I'm seeing this one frequently; it tends to spam the log with 20 or so
occurrences in a second.
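To put a number on the log spam, here is a minimal Python sketch that counts messages per second. The regular expression is an assumption inferred from the full log lines quoted further down in this post, not an official description of the ganesha log format, and the names LINE_RE, SAMPLE and count_per_second are purely illustrative.

```python
import re
from collections import Counter

# Assumed layout, inferred from the sample lines in this post:
# "DD/MM/YYYY HH:MM:SS : epoch <hex> : <host> : <proc>[<thread>] <func> :<component> :<level> :<message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) : epoch \S+ : \S+ : "
    r"\S+ (?P<func>\S+) :(?P<component>[^:]+?) :(?P<level>\w+) :(?P<msg>.*)$"
)

# One of the lines from this post, joined back onto a single line.
SAMPLE = (
    "15/05/2019 18:27:01 : epoch 5cd99ef1 : nfsserver : "
    "ganesha.nfsd-1990[svc_1653] posix2fsal_error :FSAL :INFO :"
    "Mapping 5 to ERR_FSAL_IO, rlim_cur=1048576 rlim_max=1048576"
)


def count_per_second(lines):
    """Count log entries keyed by (timestamp, function, level).

    Lines that do not match the assumed layout are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match["ts"], match["func"], match["level"])] += 1
    return counts


if __name__ == "__main__":
    # Replace the list with an open log file to scan real data.
    for key, n in count_per_second([SAMPLE, SAMPLE]).most_common(5):
        print(n, key)
```

Passing an open file object instead of the list works the same way, since the function only iterates over lines.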
"15/05/2019 18:27:01 : epoch 5cd99ef1 : nfsserver : ganesha.nfsd-1990[svc_1653] posix2fsal_error :FSAL :INFO :Mapping 5 to ERR_FSAL_IO, rlim_cur=1048576 rlim_max=1048576
15/05/2019 18:27:01 : epoch 5cd99ef1 : nfsserver : ganesha.nfsd-1990[svc_1653] nfs4_Errno_verbose :NFS4 :CRIT :Error I/O error in nfs4_mds_putfh converted to NFS4ERR_IO but was set non-retryable"
I've only seen a few occurrences of this one.
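For reference, the numbers in the "Mapping N to ERR_FSAL_..." messages look like ordinary POSIX errno values. On Linux, 11 is EAGAIN and 5 is EIO, which would be consistent with the DELAY and IO mappings above. A small Python sketch to decode them on the server follows; describe_errno is just an illustrative helper name, and the result depends on the platform it runs on.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # The two values that appear in the log messages in this post.
    for code in (11, 5):
        name, message = describe_errno(code)
        print(code, name, message)
```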
17/05/2019 15:34:24 : epoch 5cdd9df8 : nfsserver : ganesha.nfsd-4696[svc_258] xdr_encode_nfs4_princ :ID MAPPER :INFO :nfs4_gid_to_name failed with code -2.
17/05/2019 15:34:24 : epoch 5cdd9df8 : nfsserver : ganesha.nfsd-4696[svc_258] xdr_encode_nfs4_princ :ID MAPPER :INFO :Lookup for 1664 failed, using numeric group
This one doesn't seem too serious. My guess is that there are accounts on the
clients with gids/uids that the server can't look up. The server is using
SSSD to bind to AD, if that helps.
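One way to test that guess on the server is to ask the system's name service for the gid directly. The sketch below uses Python's grp module, which goes through the normal group database lookup (and so should consult SSSD if it is configured in nsswitch). The helper names resolve_gid and unresolved_gids are made up for illustration, and 1664 is simply the gid from the log line above.

```python
import grp


def resolve_gid(gid, lookup=grp.getgrgid):
    """Return the group name for gid, or None if no entry is found.

    The lookup parameter exists only so the function can be tested
    without depending on the local group database.
    """
    try:
        return lookup(gid).gr_name
    except KeyError:
        # grp.getgrgid raises KeyError when the gid has no entry.
        return None


def unresolved_gids(gids, lookup=grp.getgrgid):
    """Return the subset of gids that cannot be resolved to a name."""
    return [gid for gid in gids if resolve_gid(gid, lookup) is None]


if __name__ == "__main__":
    # 1664 is the gid reported in the ID MAPPER message.
    print(unresolved_gids([1664]))
```

If 1664 comes back as unresolved here too, the message is most likely just reporting a group the server cannot see.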
Export:
{
    Export_ID = 100;
    Protocols = 4;
    Transports = TCP;
    Path = /;
    Pseudo = /ceph/;
    Access_Type = RW;
    Squash = No_root_squash;
    Attr_Expiration_Time = 0;
    Disable_ACL = FALSE;
    Manage_Gids = TRUE;
    Filesystem_Id = 100.1;
    FSAL {
        Name = CEPH;
    }
}
Thanks,
David