nfs-ganesha-2.8.2 on CentOS-7.7 and error: /lib64/libganesha_nfsd.so.2.8: undefined symbol: rados_read_op_omap_get_vals2
by Todd Pfaff
I'm trying to run nfs-ganesha-2.8.2 on CentOS-7.7 and it's failing with
the error shown in the Subject line. More details below.
On some other CentOS-7 systems I'm successfully running nfs-ganesha-2.7.6
from the centos-gluster41 repo, but I thought I'd have a go with the
nfs-ganesha-2.8.2 from the centos-nfs-ganesha28 repo. Was that a mistake?
Is there a good document anyone could point me to that describes current
recommendations (e.g. which versions to use and which to avoid) for
installing and using nfs-ganesha on CentOS-7?
I should point out that I'm only interested in using the PROXY FSAL at
this time. This is what I'm happily - for the most part - using with
nfs-ganesha-2.7.6.
My google-fu has not turned up an answer, so I'm turning here for help.
Any help in solving this problem with 2.8.2 or advice about using other
versions will be much appreciated. Thanks!
[root@localhost ~]# uname -r
3.10.0-1062.1.1.el7.x86_64
[root@localhost ~]# cat /etc/redhat-release
CentOS Linux release 7.7.1908 (Core)
[root@localhost ~]# yum install -y nfs-ganesha nfs-ganesha-proxy
[root@localhost ~]# systemctl status -l nfs-ganesha
● nfs-ganesha.service - NFS-Ganesha file server
Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha.service; enabled;
vendor preset: disabled)
Active: failed (Result: exit-code) since Fri 2019-09-27 12:04:14 EDT;
2h 48min ago
Docs: http://github.com/nfs-ganesha/nfs-ganesha/wiki
Process: 24252 ExecStart=/bin/bash -c ${NUMACTL} ${NUMAOPTS}
/usr/bin/ganesha.nfsd ${OPTIONS} ${EPOCH} (code=exited, status=127)
Sep 27 12:04:14 localhost systemd[1]: Starting NFS-Ganesha file server...
Sep 27 12:04:14 localhost bash[24252]: /usr/bin/ganesha.nfsd: symbol lookup
error: /lib64/libganesha_nfsd.so.2.8: undefined symbol:
rados_read_op_omap_get_vals2
Sep 27 12:04:14 localhost systemd[1]: nfs-ganesha.service: control process
exited, code=exited status=127
Sep 27 12:04:14 localhost systemd[1]: Failed to start NFS-Ganesha file
server.
Sep 27 12:04:14 localhost systemd[1]: Unit nfs-ganesha.service entered failed
state.
Sep 27 12:04:14 localhost systemd[1]: nfs-ganesha.service failed.
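For reference, rados_read_op_omap_get_vals2 is a librados symbol, so this looks
like a mismatch between the librados2 that libganesha_nfsd.so.2.8 was built
against and the one installed here. A quick way to check (a sketch; the library
paths are taken from the error above and may differ):
# Which librados does the ganesha library link against?
ldd /lib64/libganesha_nfsd.so.2.8 | grep librados
# Does the installed librados actually export the missing symbol?
nm -D /lib64/librados.so.2 | grep rados_read_op_omap_get_vals2
# Which package and version provide that library?
rpm -qf /lib64/librados.so.2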
NFS Ganesha as a Caching NFS Proxy
by indivar.nair@techterra.in
Hi All,
Can we use FSCache or another caching mechanism to cache an NFS filesystem locally before re-exporting it with NFS-Ganesha?
NFS Server ---> ((NFS Client SSD Cache + NFS Ganesha)) ---> Actual NFS Clients
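One possible way to wire this up (a sketch only, not a tested configuration):
FSCache works at the kernel NFS client layer, so the Ganesha host would mount
the backend server with the 'fsc' option and re-export that local mount through
the VFS FSAL; the PROXY FSAL talks NFS from userspace and would bypass the
kernel client and its cache. The server name, paths and export id below are
placeholders:
# Provide the local FSCache store (backed by the SSD).
systemctl enable --now cachefilesd
# Mount the backend NFS server with FSCache enabled ('fsc').
mount -t nfs -o vers=3,fsc backendserver:/export /srv/backend
# Re-export the cached mount through Ganesha's VFS FSAL
# (placeholder EXPORT block appended to /etc/ganesha/ganesha.conf).
cat >> /etc/ganesha/ganesha.conf <<'EOF'
EXPORT {
    Export_Id = 10;
    Path = /srv/backend;
    Pseudo = /backend;
    Access_Type = RW;
    FSAL { Name = VFS; }
}
EOF
systemctl restart nfs-ganesha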
Regards,
Indivar Nair
Ganesha daemon has 400'000 open files - is this unusual?
by Billich Heinrich Rainer (ID SD)
Hello,
Is it usual to see 200’000-400’000 open files for a single ganesha process? Or does this indicate that something is wrong?
We have some issues with ganesha (on Spectrum Scale protocol nodes) reporting NFS3ERR_IO in the log. I noticed that the affected nodes have a large number of open files, 200’000-400’000 per daemon (and 500 threads and about 250 client connections). Other nodes have only 1’000-10’000 files open by ganesha and don’t show the issue.
If someone could explain how ganesha decides which files to keep open and which to close, that would help, too. As NFSv3 is stateless the client doesn’t open/close a file, so is it up to the server to decide when to close it? We do have a few NFSv4 clients, too.
Are there certain access patterns that can trigger such a large number of open files? Maybe traversing and reading a large number of small files?
Thank you,
Heiner
I counted the open files by counting the entries in /proc/<pid of ganesha>/fd/. With several 100k entries, ‘ls -ls’ to list all the symbolic links failed, so I can’t easily relate the open files to the different exports.
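In case it helps, something along these lines avoids 'ls' on the huge fd
directory and instead groups the symlink targets by their first two path
components (a sketch; the process name ganesha.nfsd and the prefix depth are
assumptions):
# Total number of open fds of the ganesha daemon.
pid=$(pidof ganesha.nfsd)
sudo ls /proc/$pid/fd | wc -l
# Resolve the fd symlinks without stat'ing each one via 'ls -l',
# then count them per top-level path to relate them to exports.
sudo find /proc/$pid/fd -maxdepth 1 -type l -printf '%l\n' \
  | awk -F/ 'NF > 2 { print "/" $2 "/" $3 }' \
  | sort | uniq -c | sort -rn | head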
--
=======================
Heinrich Billich
ETH Zürich
Informatikdienste
Tel.: +41 44 632 72 56
heinrich.billich(a)id.ethz.ch
========================
Re: big issue with Ubuntu 14/16 clients
by Peter Grandi
> [...] No entries are omitted when the client is Fedora 30 with kernel
> 5.2.nn or with GRML 2018.12 or Parrot 4.7 (kernel 4.19.nn in both
> cases). [...]
This is a directory with 90 entries under Ubuntu LTS 18 with the native 4.15.0
kernel and the backported 5.0 kernel, with various 'rsize' values:
$ uname -a; for N in 1024 2048 4096 6144 8192 12288 16384 32768 65536; \
  do echo -n "$N => "; \
  sudo mount -t nfs -o rw,vers=4,proto=tcp,timeo=10,intr,rsize=$N \
    azara:/scratch /mnt/tmp && ls /mnt/tmp/test | wc -l; \
  sudo umount /mnt/tmp; done
Linux noether 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC
2019 x86_64 x86_64 x86_64 GNU/Linux
1024 => 90
2048 => 81
4096 => 86
6144 => 86
8192 => 88
12288 => 88
16384 => 90
32768 => 90
65536 => 90
$ uname -a; for N in 1024 2048 4096 6144 8192 12288 16384 32768 65536; \
  do echo -n "$N => "; \
  sudo mount -t nfs -o rw,vers=4,proto=tcp,timeo=10,intr,rsize=$N \
    azara:/scratch /mnt/tmp && ls /mnt/tmp/test | wc -l; \
  sudo umount /mnt/tmp; done
Linux noether 5.0.0-27-generic #28~18.04.1-Ubuntu SMP Thu Aug 22
03:00:32 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
1024 => 90
2048 => 90
4096 => 90
6144 => 90
8192 => 90
12288 => 90
16384 => 90
32768 => 90
65536 => 90
So my suspicion that the bug in the NFSv4 kernel client was fixed
between 4.15 and 5.0 seems well founded.
On another client with 4.15 (server also 4.15) 'rsize' 8192 and 12288
seemed to "hang", and on this client with 4.15 the listing was much
slower.
big issue with Ubuntu 14/16 clients
by Peter Grandi
Hi, I have started using Ganesha a bit more intensely than usual
under Ubuntu LTS 14 (kernel 4.4.0-nnn) and 16 (kernel 4.15-nn)
and I was astonished to see that 'rm -r' of a directory
hierarchy reported that some files were not deleted, and 1-2
repeats were needed (mounted with NFSv4, TCP, 'sec=sys').
Investigating it I saw that 'ls' of a directory often (usually)
did not return all entries in a directory. Indeed using 'strace'
both 'rm' and 'ls' showed missing entries from 'getdents'(2).
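The effect is easy to demonstrate by comparing the entry count seen
through the mount with the count on the server itself (a sketch; the
server name and paths are placeholders, and the test directory holds a
known number of entries):
# Count the directory entries as seen by the NFS client through Ganesha.
sudo mount -t nfs -o rw,vers=4,proto=tcp ganeshaserver:/scratch /mnt/tmp
ls /mnt/tmp/test | wc -l
# Count the entries on the server's local filesystem for comparison.
ssh ganeshaserver 'ls /srv/scratch/test | wc -l'
sudo umount /mnt/tmp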
Looking at packet captures shows that this happens to READDIR
responses that are split in several parts, and the entry omitted
is always the last one in each part. But sometimes it is not
omitted by the client. This omission never happens with the
'nfs-kernel-server'. The omission is highly repeatable for a
given directory and client, but the omitted entries depend on
the client and the number of entries in a directory. On one
client for example it was consistently every 40th entry (because
each part contained 40 entries).
No entries are omitted when the client is Fedora 30 with kernel
5.2.nn or with GRML 2018.12 or Parrot 4.7 (kernel 4.19.nn in
both cases).
My investigations, with the good assistance of a couple of
people from the #Ganesha IRC channel, seem to show that this is
a bug in the Ubuntu kernel 'nfs' kernel filesystem, but since
'nfs-kernel-server' does not trigger it, it gives the impression
that it is Ganesha that is unreliable, so a workaround within
Ganesha might be useful.
I have made a TCP trace here of a 'ls' of a directory with 90
entries:
http://www.sabi.co.uk/xtmp/nfs-ganesha-ext4-miss_32_51_70.pcap
On one client the missing entries are "0032" (packet 82), "0051"
(packet 87) and "0070" (packet 92), all of which are the last
entry in their part of the response; curiously entries "0013"
and "0088" are also the last entries in their parts of the
response but are listed by the 'nfs' client.
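To look at one of those READDIR replies in detail, the capture above can
be dissected with tshark (a sketch; it assumes the wireshark CLI is
installed, and the packet number refers to the trace linked above):
# List the NFS packets in the capture.
tshark -r nfs-ganesha-ext4-miss_32_51_70.pcap -Y nfs
# Fully dissect packet 82, the READDIR reply whose last entry ("0032")
# the Ubuntu client drops.
tshark -r nfs-ganesha-ext4-miss_32_51_70.pcap -V -Y 'frame.number == 82'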
I am not familiar enough with the NFS v4 protocol to reliably
spot any differences that might trigger that bug, but I had a
look and I cannot see any.
BTW I had a look at potentially relevant commits in the kernel
'git' history and found no obvious candidate, but these commits may
be relevant:
02ef04e432babf8fc703104212314e54112ecd2d
98de9ce6f6660d02aa72d7b9b17696fa68a2ed9b