Hello Michael,
Thank you. I will also look at the TCP queue status when it happens again. After two
episodes, it has not happened again for a couple of days.
In the meantime I would like to trigger the issue from a dedicated client, to see if the
symptoms and the log messages are the same.
Does anyone have suggestions (code would be even better :) on how to test this?
Thanks,
Ivano
On 09/10/18 15:09, "Michael Diederich" <diederich(a)de.ibm.com> wrote:
Ivano,
I am sure we are working on your ticket :-)
Have a look at the sum of your TCP Send-Q bytes (netstat output) and compare that to the
tcp_wmem setting (sysctl).
It is possible that you have a client that is not acking the data it requested...
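The comparison suggested above can be scripted. What follows is only a minimal Python
sketch, not a tested diagnostic. It assumes that `netstat -tn` prints the usual Linux
columns (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State) and that
/proc/sys/net/ipv4/tcp_wmem holds three integers (min, default, max). The function names
are made up for illustration. Since tcp_wmem limits are per socket, the sketch groups
Send-Q by remote address so that a single stuck client stands out, and prints the overall
sum as well.

```python
import subprocess
from collections import defaultdict


def send_q_by_peer(netstat_text):
    """Sum the Send-Q column of `netstat -tn` output per remote address."""
    totals = defaultdict(int)
    for line in netstat_text.splitlines():
        fields = line.split()
        # Data rows look like: tcp 0 0 local:port remote:port STATE
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if not fields[2].isdigit():
            continue
        # Drop the port; rsplit keeps IPv6-mapped addresses such as ::ffff:a.b.c.d intact
        peer = fields[4].rsplit(":", 1)[0]
        totals[peer] += int(fields[2])
    return dict(totals)


def read_tcp_wmem(path="/proc/sys/net/ipv4/tcp_wmem"):
    """Return the (min, default, max) send buffer sizes in bytes."""
    with open(path) as f:
        low, default, high = (int(x) for x in f.read().split())
    return low, default, high


def report():
    """Print queued bytes per peer, largest first, next to the tcp_wmem maximum."""
    out = subprocess.run(["netstat", "-tn"], capture_output=True,
                         text=True, check=True).stdout
    per_peer = send_q_by_peer(out)
    _low, _default, wmem_max = read_tcp_wmem()
    for peer, queued in sorted(per_peer.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{peer:40s} {queued:>12d}")
    print(f"total Send-Q bytes: {sum(per_peer.values())}, tcp_wmem max: {wmem_max}")
```

Calling `report()` on the server while the problem is occurring would show whether one
client accounts for most of the queued bytes.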
Michael
From: "Talamo Ivano Giuseppe (PSI)" <Ivano.Talamo(a)psi.ch>
To: "support(a)lists.nfs-ganesha.org"
<support(a)lists.nfs-ganesha.org>
Date: 10/08/2018 03:04 PM
Subject: [NFS-Ganesha-Support] unavailability of NFSv3
________________________________________
Hi all,
We are using nfs-ganesha via the IBM Spectrum Scale protocol setup, currently
consisting of 2 servers and around 50 clients.
After a couple of months of smooth running, we have started to experience (already twice
in three days) a critical issue in which none of the clients
is able to mount the filesystem.
When the issue happens this is what we see via rpcinfo on the server:
[root@server ~]# rpcinfo -s
   program  version(s)  netid(s)                  service     owner
    100000  2,3,4       local,udp,tcp,udp6,tcp6   portmapper  superuser
    100024  1           tcp6,udp6,tcp,udp         status      29
    100003  4,3         tcp6,tcp,udp6,udp         nfs         superuser
    100005  3,1         tcp6,tcp,udp6,udp         mountd      superuser
    100021  4           tcp6,tcp,udp6,udp         nlockmgr    superuser
    100011  2,1         tcp6,tcp,udp6,udp         rquotad     superuser
[root@host ~]# rpcinfo -T tcp localhost 100003 3
rpcinfo: RPC: Timed out
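The same probe that `rpcinfo -T tcp localhost 100003 3` performs can be approximated from
a script, which may help to timestamp the moment the service stops answering. The Python
sketch below sends an ONC RPC NULL call (procedure 0, AUTH_NONE) for program 100003
version 3 and waits for an accepted reply. It is an illustration under assumptions: it
connects directly to port 2049 instead of asking the portmapper, and the function names,
the default timeout and the transaction id are arbitrary choices.

```python
import socket
import struct

NFS_PROGRAM = 100003  # program number of "nfs" in the rpcinfo listing


def build_null_call(xid, prog=NFS_PROGRAM, vers=3):
    """Return a record-marked ONC RPC NULL call (procedure 0, AUTH_NONE)."""
    body = struct.pack(
        ">10I",
        xid,    # transaction id, echoed back by the server
        0,      # message type: CALL
        2,      # RPC protocol version
        prog,
        vers,
        0,      # procedure 0 = NULL
        0, 0,   # credentials: AUTH_NONE, zero length
        0, 0,   # verifier: AUTH_NONE, zero length
    )
    # Record marker: top bit flags the last fragment, low 31 bits carry its length
    return struct.pack(">I", 0x80000000 | len(body)) + body


def nfs_null_ping(host, port=2049, timeout=5.0, xid=0x6E667331):
    """Return True if the server accepts an NFSv3 NULL call within `timeout` seconds."""
    expected = 28  # marker + xid + REPLY + MSG_ACCEPTED + empty verifier + accept status
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(build_null_call(xid))
            reply = b""
            while len(reply) < expected:
                chunk = sock.recv(expected - len(reply))
                if not chunk:
                    return False
                reply += chunk
    except OSError:  # covers timeouts and refused connections
        return False
    _marker, rxid, mtype, reply_stat, _vflavor, vlen, accept_stat = struct.unpack(
        ">7I", reply)
    return (rxid == xid and mtype == 1 and reply_stat == 0
            and vlen == 0 and accept_stat == 0)
```

Running `nfs_null_ping("localhost")` periodically from cron or a small loop would return
False in the situation shown above, where rpcinfo reports "RPC: Timed out".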
The log level is set to EVENT and, when the issue starts, ganesha.log fills up with lines
like the following:
2018-10-04 16:40:45 : epoch 0002003d : server : gpfs.ganesha.nfsd-40452[State_Async] nlm_send_async :NLM :MAJ :Cannot create NLM async tcp connection to client ::ffff:129.129.117.65
2018-10-04 16:40:45 : epoch 0002003d : server: gpfs.ganesha.nfsd-40452[State_Async] nlm4_send_grant_msg :NLM :MAJ :GRANTED_MSG RPC call failed with return code -1. Removing the blocking lock
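To see which clients are involved, messages like the ones above can be counted per client
address. This is a small Python sketch that relies only on the message text quoted here;
the function names are illustrative and the location of ganesha.log has to be supplied by
the caller.

```python
import re
from collections import Counter

# Matches the NLM message quoted above and captures the client address.
# \s+ also tolerates a line break between "client" and the address.
NLM_CONNECT_FAILURE = re.compile(
    r"Cannot create NLM async tcp connection to client\s+(\S+)")


def nlm_failures_by_client(log_text):
    """Count 'Cannot create NLM async tcp connection' messages per client address."""
    return Counter(NLM_CONNECT_FAILURE.findall(log_text))


def nlm_failures_in_file(path):
    """Same count, reading the log from `path` (location of ganesha.log varies)."""
    with open(path, errors="replace") as f:
        return nlm_failures_by_client(f.read())
```

For example, `nlm_failures_in_file(path).most_common(5)` lists the five clients with the
most failed NLM callbacks.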
The nfs-ganesha version is 2.5.3, although it is the IBM build, so I am not sure what
changes it includes.
I was wondering if someone on the mailing list had an idea of what direction to take
to investigate this further.
Thanks,
Ivano