Hello Ivano,
please open a case with IBM support.
We will need to look at more data here: a GPFS / CES / NFS snap and the exact
versions.
I would be curious to see your authentication configuration. We have seen a
similar issue at a large customer, and it was unrelated to Ganesha!
Michael
With best regards
Michael Diederich
IBM Systems Group
Spectrum Scale
Software Development
Contact Information
IBM Deutschland Research & Development GmbH
Chair of the Supervisory Board: Martina Koederitz
Management: Dirk Wittkopp
Registered office: Böblingen
Register court: Amtsgericht Stuttgart, HRB 243294
mail: michael.diederich(a)de.ibm.com
phone: +49-7034-274-4062
address: Am Weiher 24, D-65451 Kelsterbach
From: "Talamo Ivano Giuseppe (PSI)" <Ivano.Talamo(a)psi.ch>
To: Michael Diederich <diederich(a)de.ibm.com>
Cc: "support(a)lists.nfs-ganesha.org" <support(a)lists.nfs-ganesha.org>
Date: 10/17/2018 10:37 AM
Subject: Re: [NFS-Ganesha-Support] unavailability of NFSv3
Dear Michael,
We hit the issue again and I was able to collect some data.
First of all, I observed an extremely high number of connections from Ganesha
to the client in CLOSE_WAIT, each with 29 bytes in its Recv-Q.
There are about 11k lines like the following in the netstat output:
tcp6 29 0 server:38534 client:38610 CLOSE_WAIT
The client is always the same one, and 38610 is the port on which NLM is
running on the client:
[root@server ~]# rpcinfo -p client
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  44714  status
    100024    1   tcp  56631  status
    100021    1   udp  35435  nlockmgr
    100021    3   udp  35435  nlockmgr
    100021    4   udp  35435  nlockmgr
    100021    1   tcp  38610  nlockmgr
    100021    3   tcp  38610  nlockmgr
    100021    4   tcp  38610  nlockmgr
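To confirm that the stuck CLOSE_WAIT sockets all point at the client's NLM port, the netstat and rpcinfo output can be cross-referenced. This is a minimal Python sketch, assuming the netstat lines follow the netstat -tn column layout shown above and that the rpcinfo -p output was taken against the same client; all function names are hypothetical helpers, not part of Ganesha or Spectrum Scale.

```python
from collections import Counter


def parse_netstat(lines):
    """Yield (recv_q, send_q, remote, state) from netstat -tn style lines.

    Assumed column layout (as in the thread):
    proto recv-q send-q local-address foreign-address state
    """
    for line in lines:
        fields = line.split()
        # Skip headers ("Proto ...", "Active ...") and anything that is not TCP.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        yield int(fields[1]), int(fields[2]), fields[4], fields[5]


def split_endpoint(endpoint):
    """Split 'host:port' on the last colon so IPv6 addresses keep their colons."""
    host, _, port = endpoint.rpartition(":")
    return host, int(port)


def parse_rpcinfo_ports(lines):
    """Map TCP port -> service name from rpcinfo -p output (assumed: taken from the remote client)."""
    services = {}
    for line in lines:
        fields = line.split()
        if len(fields) == 5 and fields[0].isdigit() and fields[2] == "tcp":
            services[int(fields[3])] = fields[4]
    return services


def close_wait_summary(netstat_lines, rpcinfo_lines):
    """Count CLOSE_WAIT sockets per (remote host, remote port, service, recv-q)."""
    services = parse_rpcinfo_ports(rpcinfo_lines)
    counts = Counter()
    for recv_q, _send_q, remote, state in parse_netstat(netstat_lines):
        if state != "CLOSE_WAIT":
            continue
        host, port = split_endpoint(remote)
        counts[(host, port, services.get(port, "unknown"), recv_q)] += 1
    return counts
```

Fed with the saved output of netstat -tn on the server and rpcinfo -p client, the pattern described above would show up as one large entry for the client's nlockmgr port with a Recv-Q of 29.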
NLM seems to reply fine:
[root@server ~]# rpcinfo -T tcp client 100021 3
program 100021 version 3 ready and waiting
I also collected the contents of /proc/<PID-of-Ganesha-process>/task/*.
There is no trace of writev and it looks fine to me. Most of the tasks
(274 of 283) look like the following:
[<ffffffff810f7016>] futex_wait_queue_me+0xc6/0x130
[<ffffffff810f7cdb>] futex_wait+0x17b/0x280
[<ffffffff810f9a16>] do_futex+0x106/0x5a0
[<ffffffff810f9f30>] SyS_futex+0x80/0x180
[<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
There are a couple like this:
[<ffffffffc0a18cd1>] cxiWaitEventWait+0x1d1/0x2f0 [mmfslinux]
[<ffffffffc0b92a10>] _ZN6ThCond12internalWaitEP16KernelSynchStatejPv+0xd0/0x260 [mmfs26]
[<ffffffffc0b93d0e>] _ZN6ThCond18kInterruptibleWaitEiPKc+0x1de/0x3d0 [mmfs26]
[<ffffffffc0b42e94>] _Z17gpfsGaneshaUpdateP13gpfsVfsData_tPiS1_P10cxiVattr_tP5glockS1_PjS6_S6_iS6_+0x264/0x7d0 [mmfs26]
[<ffffffffc0a20ba2>] gpfs_wait_inode_update+0x1a2/0x740 [mmfslinux]
[<ffffffffc0a211aa>] get_inode_update+0x6a/0x90 [mmfslinux]
[<ffffffffc0a28eb2>] kxGanesha+0x5a2/0x3990 [mmfslinux]
[<ffffffffc0c01f17>] _Z8ss_ioctljm+0x677/0x1c00 [mmfs26]
[<ffffffffc0a097e1>] ss_fs_unlocked_ioctl+0xf1/0x530 [mmfslinux]
[<ffffffff8121730d>] do_vfs_ioctl+0x33d/0x540
[<ffffffff812175b1>] SyS_ioctl+0xa1/0xc0
[<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
A couple like this:
[<ffffffff81217ff5>] poll_schedule_timeout+0x55/0xb0
[<ffffffff8121957d>] do_sys_poll+0x4cd/0x580
[<ffffffff81219734>] SyS_poll+0x74/0x110
[<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
And five like this:
[<ffffffff8124dc9e>] ep_poll+0x23e/0x360
[<ffffffff8124f12d>] SyS_epoll_wait+0xed/0x120
[<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
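Sorting 283 task stacks by hand is tedious; a small script can group identical call chains and count them, which gives exactly the 274/2/2/5 breakdown above. This is a sketch that assumes the per-task kernel stacks are read from /proc/<pid>/task/*/stack (which normally requires root); the helper names are hypothetical.

```python
import glob
import os
import re
from collections import Counter

# Matches the leading "[<ffffffff810f7016>] " address column of a kernel stack line.
ADDR_RE = re.compile(r"^\[<[0-9a-f]+>\]\s*")


def normalize_stack(text):
    """Drop raw addresses and the 0xffffffffffffffff terminator so identical chains compare equal."""
    frames = []
    for line in text.splitlines():
        frame = ADDR_RE.sub("", line.strip())
        if frame and frame != "0xffffffffffffffff":
            frames.append(frame)
    return tuple(frames)


def group_stacks(stacks):
    """Count tasks per distinct call chain; `stacks` maps task id -> stack text."""
    return Counter(normalize_stack(text) for text in stacks.values())


def read_task_stacks(pid):
    """Read /proc/<pid>/task/*/stack for every thread of the process (assumed readable)."""
    stacks = {}
    for path in glob.glob(f"/proc/{pid}/task/*/stack"):
        tid = os.path.basename(os.path.dirname(path))
        with open(path) as fh:
            stacks[tid] = fh.read()
    return stacks
```

For example, iterating over group_stacks(read_task_stacks(pid)).most_common() and printing the count with the first frame of each chain gives a compact summary that is easy to compare between a healthy and a hung Ganesha process.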
When the issue started, I also ran ganesha_mgr set_log COMPONENT_ALL
FULL_DEBUG for a few minutes
and collected about half a gigabyte of logs, in case they are useful for
further investigation.
Thanks,
Ivano
On 09/10/18 15:09, "Michael Diederich" <diederich(a)de.ibm.com> wrote:
Ivano,
I am sure we are working on your ticket :-)
Have a look at the sum of your TCP Send-Q bytes (from the netstat output) and
compare that to the tcp_wmem setting (sysctl net.ipv4.tcp_wmem).
It is possible that you have a client that is not acknowledging the data it
requested...
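The check suggested above can be scripted roughly as follows. This is a minimal Python sketch, assuming netstat -tn style input and the standard Linux location of the tcp_wmem sysctl (three values: min, default, max in bytes); comparing a per-host Send-Q sum against the tcp_wmem maximum is only a heuristic for spotting a peer that stops acknowledging data, and the function names are hypothetical.

```python
from collections import defaultdict


def read_tcp_wmem(path="/proc/sys/net/ipv4/tcp_wmem"):
    """Return the (min, default, max) byte values of the tcp_wmem sysctl."""
    with open(path) as fh:
        low, default, high = (int(v) for v in fh.read().split())
    return low, default, high


def send_q_by_peer(netstat_lines):
    """Sum the Send-Q column per remote host from netstat -tn style lines."""
    totals = defaultdict(int)
    for line in netstat_lines:
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # header or non-TCP line
        host = fields[4].rpartition(":")[0]  # drop the port, keep IPv6 colons
        totals[host] += int(fields[2])
    return dict(totals)


def peers_over_limit(totals, wmem_max):
    """Heuristic: hosts whose summed Send-Q reaches the tcp_wmem maximum."""
    return sorted(host for host, total in totals.items() if total >= wmem_max)
```

A typical call would be peers_over_limit(send_q_by_peer(lines), read_tcp_wmem()[2]); the total across all hosts is simply sum(send_q_by_peer(lines).values()).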
Michael
From: "Talamo Ivano Giuseppe (PSI)" <Ivano.Talamo(a)psi.ch>
To: "support(a)lists.nfs-ganesha.org"
<support(a)lists.nfs-ganesha.org>
Date: 10/08/2018 03:04 PM
Subject: [NFS-Ganesha-Support] unavailability of NFSv3
________________________________________
Hi all,
We are using nfs-ganesha via the IBM Spectrum Scale protocol setup,
currently consisting of 2 servers and around 50 clients.
After a couple of months of smooth operation, we started to experience
(already twice in three days) a critical issue in which none of the
clients
can mount the filesystem.
When the issue happens this is what we see via rpcinfo on the server:
[root@server ~]# rpcinfo -s
   program version(s) netid(s)                         service     owner
    100000  2,3,4     local,udp,tcp,udp6,tcp6          portmapper  superuser
    100024  1         tcp6,udp6,tcp,udp                status      29
    100003  4,3       tcp6,tcp,udp6,udp                nfs         superuser
    100005  3,1       tcp6,tcp,udp6,udp                mountd      superuser
    100021  4         tcp6,tcp,udp6,udp                nlockmgr    superuser
    100011  2,1       tcp6,tcp,udp6,udp                rquotad     superuser
[root@host ~]# rpcinfo -T tcp localhost 100003 3
rpcinfo: RPC: Timed out
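rpcinfo -T tcp <host> <program> <version> works by calling the RPC NULL procedure (procedure 0) and waiting for the reply. A rough Python equivalent, useful for probing in a loop with an explicit timeout while the issue is happening, is sketched below following the ONC RPC version 2 message format (RFC 5531) with TCP record marking. Unlike rpcinfo, it does not ask rpcbind for the port: the port is passed in directly (2049 is the conventional NFS port, but this is an assumption to check against rpcinfo -p), and the function names are hypothetical.

```python
import socket
import struct

CALL, REPLY = 0, 1
MSG_ACCEPTED, SUCCESS = 0, 0
LAST_FRAGMENT = 0x80000000


def build_null_call(xid, prog, vers):
    """Encode an ONC RPC v2 NULL call (proc 0, AUTH_NONE) with a TCP record mark."""
    body = struct.pack(
        ">10I",
        xid, CALL, 2,    # xid, msg_type, rpcvers
        prog, vers, 0,   # program, version, procedure 0 (NULL)
        0, 0,            # credential: AUTH_NONE, zero-length body
        0, 0,            # verifier: AUTH_NONE, zero-length body
    )
    return struct.pack(">I", LAST_FRAGMENT | len(body)) + body


def parse_null_reply(record, xid):
    """Return True if `record` (record mark removed) is an accepted, successful reply to `xid`."""
    rxid, mtype, reply_stat = struct.unpack_from(">3I", record, 0)
    if rxid != xid or mtype != REPLY or reply_stat != MSG_ACCEPTED:
        return False
    _flavor, verf_len = struct.unpack_from(">2I", record, 12)
    offset = 20 + ((verf_len + 3) & ~3)  # verifier body is padded to 4 bytes
    (accept_stat,) = struct.unpack_from(">I", record, offset)
    return accept_stat == SUCCESS


def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes the connection."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("connection closed by peer")
        data += chunk
    return data


def rpc_ping(host, port, prog, vers, timeout=5.0, xid=0x1234):
    """Send a NULL call and wait up to `timeout` seconds for the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_null_call(xid, prog, vers))
        (mark,) = struct.unpack(">I", recv_exact(sock, 4))
        record = recv_exact(sock, mark & 0x7FFFFFFF)  # assumes a single-fragment reply
        return parse_null_reply(record, xid)
```

For example, rpc_ping("localhost", 2049, 100003, 3) should return True on a healthy server and raise a timeout exception in the situation shown above, where rpcinfo reports "Timed out".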
The log level is set to EVENT, and when the issue starts, ganesha.log fills
up with lines like the following:
2018-10-04 16:40:45 : epoch 0002003d : server : gpfs.ganesha.nfsd-40452[State_Async] nlm_send_async :NLM :MAJ :Cannot create NLM async tcp connection to client ::ffff:129.129.117.65
2018-10-04 16:40:45 : epoch 0002003d : server: gpfs.ganesha.nfsd-40452[State_Async] nlm4_send_grant_msg :NLM :MAJ :GRANTED_MSG RPC call failed with return code -1. Removing the blocking lock
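To see whether these NLM failures concentrate on a single client (as the CLOSE_WAIT sockets later did), the log can be summarized per client. This is a sketch assuming the message text is worded exactly as in the excerpt above; the regular expression and function names are hypothetical. It reads line by line, so it can also stream through a large FULL_DEBUG log.

```python
import re
from collections import Counter

# Assumption: messages are worded exactly as in the ganesha.log excerpt above.
NLM_FAIL_RE = re.compile(r"Cannot create NLM async tcp connection to client (\S+)")
GRANT_FAIL = "GRANTED_MSG RPC call failed"


def strip_v4_mapped(addr):
    """Turn an IPv4-mapped IPv6 address (::ffff:a.b.c.d) into plain a.b.c.d."""
    return addr[len("::ffff:"):] if addr.startswith("::ffff:") else addr


def summarize_nlm_failures(lines):
    """Count NLM async connection failures per client, plus GRANTED_MSG failures overall."""
    per_client = Counter()
    grant_failures = 0
    for line in lines:
        match = NLM_FAIL_RE.search(line)
        if match:
            per_client[strip_v4_mapped(match.group(1))] += 1
        elif GRANT_FAIL in line:
            grant_failures += 1
    return per_client, grant_failures
```

Passing an open file handle for ganesha.log (its location depends on the installation) to summarize_nlm_failures returns the per-client counter and the number of failed GRANTED_MSG callbacks.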
The nfs-ganesha version is 2.5.3, although it is the IBM build, so I
am not sure what changes it contains.
I was wondering whether anyone on the mailing list has an idea of which
direction to take to investigate this further.
Thanks,
Ivano