Hello,
I am seeing an NLM/NFSv3 locking issue with NFS-Ganesha: a lock that is no longer visible on the client (in /proc/locks or with ‘lslocks’) is still present in /proc/locks on the Ganesha server. Further lock requests on the client fail, or succeed only after about 10 seconds on a second attempt. I don’t see this on a second server, so I assume a reboot would fix the issue for now, but that won’t help in the long run. I will open a support case with IBM, as this is part of CES/Spectrum Scale. Still, if you notice something that resembles a known bug or issue, please let me know.
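To compare the lock tables on the client and the server side by side, a minimal Python sketch like the one below could help. It is a hypothetical helper (not part of Ganesha or Spectrum Scale) and assumes the usual Linux /proc/locks layout: ordinal, lock class, mode, access, pid, major:minor:inode, start, end. Lines marked with "->" are waiters blocked behind another lock and are skipped. The names `LockEntry` and `parse_proc_locks` are made up for illustration.

```python
from typing import List, NamedTuple


class LockEntry(NamedTuple):
    """One granted lock from /proc/locks (hypothetical helper type)."""
    kind: str    # lock class, e.g. POSIX, FLOCK, OFDLCK
    access: str  # READ or WRITE
    pid: int     # owning pid (-1 for OFD locks)
    inode: str   # "major:minor:inode"
    start: str   # first byte of the locked range
    end: str     # last byte of the range, or "EOF"


def parse_proc_locks(text: str) -> List[LockEntry]:
    """Parse /proc/locks text, skipping blocked waiters ("->" lines)."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Skip empty lines and waiters queued behind a held lock.
        if not fields or "->" in fields:
            continue
        # fields[0] is the "N:" ordinal; the rest are fixed columns.
        _, kind, _mode, access, pid, inode, start, end = fields[:8]
        entries.append(LockEntry(kind, access, int(pid), inode, start, end))
    return entries
```

Running it on /proc/locks from both machines gives two lists that can be diffed. Keep in mind that the device part of the inode column will differ between the NFS client and the server.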
In the logs I see the following:
2019-02-04 19:41:57 : epoch 00020089 : xbl-ces-2.psi.ch : gpfs.ganesha.nfsd-15508[State_Async] nlm_send_async :NLM :MAJ :Cannot create NLM async tcp connection to client ::ffff:129.129.99.62
2019-02-04 19:41:57 : epoch 00020089 : xbl-ces-2.psi.ch : gpfs.ganesha.nfsd-15508[State_Async] nlm4_send_grant_msg :NLM :MAJ :GRANTED_MSG RPC call failed with return code -1. Removing the blocking lock
2019-02-04 19:42:17 : epoch 00020089 : xbl-ces-2.psi.ch : gpfs.ganesha.nfsd-15508[State_Async] nlm_send_async :NLM :MAJ :Cannot create NLM async tcp connection to client ::ffff:129.129.99.62
2019-02-04 19:42:17 : epoch 00020089 : xbl-ces-2.psi.ch : gpfs.ganesha.nfsd-15508[State_Async] nlm4_send_grant_msg :NLM :MAJ :GRANTED_MSG RPC call failed with return code -1. Removing the blocking lock
2019-02-03 06:51:31 : epoch 00020089 : xbl-ces-2.psi.ch : gpfs.ganesha.nfsd-15508[State_Async] nlm_send_async :NLM :CRIT :NLM async Client procedure call 10 failed with return code 4 : RPC: Success
2019-02-03 09:23:07 : epoch 00020089 : xbl-ces-2.psi.ch : gpfs.ganesha.nfsd-15508[State_Async] nlm_send_async :NLM :CRIT :NLM async Client procedure call 10 failed with return code 4 : RPC: Success
2019-02-03 09:23:07 : epoch 00020089 : xbl-ces-2.psi.ch : gpfs.ganesha.nfsd-15508[State_Async] nlm_send_async :NLM :MAJ :Cannot create NLM async tcp connection to client ::ffff:129.129.99.48
2019-02-03 09:23:07 : epoch 00020089 : xbl-ces-2.psi.ch : gpfs.ganesha.nfsd-15508[State_Async] nlm4_send_grant_msg :NLM :MAJ :GRANTED_MSG RPC call failed with return code -1. Removing the blocking lock
2019-02-03 09:23:42 : epoch 00020089 : xbl-ces-2.psi.ch : gpfs.ganesha.nfsd-15508[State_Async] nlm_send_async :NLM :MAJ :Cannot create NLM async tcp connection to client ::ffff:129.129.99.48
In addition, there are about 20 sockets in state CLOSE_WAIT, all held by Ganesha.
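To track the CLOSE_WAIT count over time without depending on netstat or ss output formatting, a minimal sketch could read /proc/net/tcp and /proc/net/tcp6 directly. It assumes the standard Linux layout, where the fourth column (`st`) holds the TCP state in hex and `08` means CLOSE_WAIT. Because the log shows IPv4-mapped addresses (::ffff:...), the Ganesha sockets are likely listed in /proc/net/tcp6. This counts all CLOSE_WAIT sockets on the host, not only Ganesha's; attributing them to the nfsd process would require matching the inode column against /proc/<pid>/fd. The function names are hypothetical.

```python
import os

CLOSE_WAIT = "08"  # hex state code for CLOSE_WAIT in /proc/net/tcp{,6}


def count_state(text: str, state: str = CLOSE_WAIT) -> int:
    """Count sockets in the given hex state in /proc/net/tcp-style text."""
    count = 0
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        # fields: [0] "sl:", [1] local addr, [2] remote addr, [3] state
        if len(fields) > 3 and fields[3].upper() == state.upper():
            count += 1
    return count


def count_close_wait(paths=("/proc/net/tcp", "/proc/net/tcp6")) -> int:
    """Sum CLOSE_WAIT sockets over the IPv4 and IPv6 tables that exist."""
    total = 0
    for path in paths:
        if os.path.exists(path):
            with open(path) as f:
                total += count_state(f.read())
    return total
```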
The “Cannot create NLM async tcp connection” message appears only on the server that shows the issue. Several clients are affected, so I don’t suspect a client-side problem. It also happens with different files.
I don’t see any resource limit being hit on the Ganesha server: about 300 TCP sockets in total, plenty of free memory, and about 300 Ganesha threads.
Thank you,
Heiner Billich
--
Paul Scherrer Institut
Heiner Billich
System Engineer Scientific Computing
Science IT / High Performance Computing
WHGA/106
Forschungsstrasse 111
5232 Villigen PSI
Switzerland
Phone +41 56 310 36 02
https://www.psi.ch