Hello,
I observe an NLM/NFS v3 locking issue with nfs ganesha: a lock that is no longer visible
on the client in /proc/locks or with ‘lslocks’ is still present on the ganesha
server in /proc/locks. Further lock requests on the client either fail or succeed only on a
second try after about 10 seconds. I don’t see the same behaviour on a second server, so I
assume that a reboot will fix the issue for the moment, but this won’t help in the long run.
I will open a support call with IBM as this is part of CES/SpectrumScale. Still, if you notice
anything that resembles a known bug or issue, please let me know.
In the logs I see the following:
2019-02-04 19:41:57 : epoch 00020089 : xbl-ces-2.psi.ch :
gpfs.ganesha.nfsd-15508[State_Async] nlm_send_async :NLM :MAJ :Cannot create NLM async tcp
connection to client ::ffff:129.129.99.62
2019-02-04 19:41:57 : epoch 00020089 : xbl-ces-2.psi.ch :
gpfs.ganesha.nfsd-15508[State_Async] nlm4_send_grant_msg :NLM :MAJ :GRANTED_MSG RPC call
failed with return code -1. Removing the blocking lock
2019-02-04 19:42:17 : epoch 00020089 : xbl-ces-2.psi.ch :
gpfs.ganesha.nfsd-15508[State_Async] nlm_send_async :NLM :MAJ :Cannot create NLM async tcp
connection to client ::ffff:129.129.99.62
2019-02-04 19:42:17 : epoch 00020089 : xbl-ces-2.psi.ch :
gpfs.ganesha.nfsd-15508[State_Async] nlm4_send_grant_msg :NLM :MAJ :GRANTED_MSG RPC call
failed with return code -1. Removing the blocking lock
2019-02-03 06:51:31 : epoch 00020089 : xbl-ces-2.psi.ch :
gpfs.ganesha.nfsd-15508[State_Async] nlm_send_async :NLM :CRIT :NLM async Client procedure
call 10 failed with return code 4 : RPC: Success
2019-02-03 09:23:07 : epoch 00020089 : xbl-ces-2.psi.ch :
gpfs.ganesha.nfsd-15508[State_Async] nlm_send_async :NLM :CRIT :NLM async Client procedure
call 10 failed with return code 4 : RPC: Success
2019-02-03 09:23:07 : epoch 00020089 : xbl-ces-2.psi.ch :
gpfs.ganesha.nfsd-15508[State_Async] nlm_send_async :NLM :MAJ :Cannot create NLM async tcp
connection to client ::ffff:129.129.99.48
2019-02-03 09:23:07 : epoch 00020089 : xbl-ces-2.psi.ch :
gpfs.ganesha.nfsd-15508[State_Async] nlm4_send_grant_msg :NLM :MAJ :GRANTED_MSG RPC call
failed with return code -1. Removing the blocking lock
2019-02-03 09:23:42 : epoch 00020089 : xbl-ces-2.psi.ch :
gpfs.ganesha.nfsd-15508[State_Async] nlm_send_async :NLM :MAJ :Cannot create NLM async tcp
connection to client ::ffff:129.129.99.48
I also see about 20 sockets in state CLOSE_WAIT, all used by ganesha.
The “Cannot create NLM async tcp connection” message appears only on the server that
shows the issue. Several clients are affected, so I don’t suspect a problem on the
client side. It also happens with different files.
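
To see which clients are affected and how often, the failed callback messages can be
tallied per client address. The following is a minimal Python sketch, assuming the ganesha
log is available as plain text. The function name count_failed_callbacks is hypothetical,
and the pattern is taken only from the message text in the excerpt above.

```python
import re
from collections import Counter

# Message text as it appears in the log excerpt; \s+ tolerates lines that
# were wrapped when the log was pasted.
FAILED_CALLBACK = re.compile(
    r"Cannot\s+create\s+NLM\s+async\s+tcp\s+connection\s+to\s+client\s+(\S+)"
)


def count_failed_callbacks(log_text):
    """Return a Counter mapping client address -> number of failed NLM callbacks."""
    return Counter(FAILED_CALLBACK.findall(log_text))
```

Feeding the full log file contents to count_failed_callbacks() and printing
most_common() would list the clients with the most failed callbacks first.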
I don’t see any resource limit being hit on the ganesha server: ~300 tcp sockets in total,
plenty of free memory, about 300 ganesha threads.
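
The socket counts can be tracked over time by summarising TCP states from the kernel's
socket table. The following is a minimal Python sketch, assuming a Linux server where
/proc/net/tcp and /proc/net/tcp6 are readable. The function name count_tcp_states is
hypothetical. It counts sockets system-wide and does not attribute them to the ganesha
process, which would additionally require matching socket inodes under /proc/<pid>/fd.

```python
from collections import Counter

# Hex state codes used in the "st" column of /proc/net/tcp and /proc/net/tcp6.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from the text of /proc/net/tcp or /proc/net/tcp6."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        code = fields[3].upper()  # columns: sl, local_address, rem_address, st
        counts[TCP_STATES.get(code, code)] += 1
    return counts
```

Reading both /proc/net/tcp and /proc/net/tcp6, adding the two counters and logging the
CLOSE_WAIT entry at intervals would show whether the number of such sockets keeps growing.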
Thank you,
Heiner Billich
--
Paul Scherrer Institut
Heiner Billich
System Engineer Scientific Computing
Science IT / High Performance Computing
WHGA/106
Forschungsstrasse 111
5232 Villigen PSI
Switzerland
Phone +41 56 310 36 02
heiner.billich@psi.ch
https://www.psi.ch