Hello,

 

I observe an NLM/NFS v3 locking issue with NFS-Ganesha: while a lock is no longer visible on the client in /proc/locks or with ‘lslocks’, the lock is still present on the Ganesha server in /proc/locks. Further lock requests from the client either fail or succeed only after about 10 seconds and a second attempt. I don’t see the same on a second server, hence I assume that a reboot will fix the issue for the moment, but this won’t help in the long run. I will open a support call with IBM, as this is part of CES/Spectrum Scale. Still, if you notice something that resembles a known bug or issue, please let me know.
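
To compare the client and server views of a lock, the contents of /proc/locks can be parsed on both sides and filtered by inode. The following is only a minimal sketch, not a tool shipped with Ganesha or Spectrum Scale: the names `LockEntry`, `parse_proc_locks` and `locks_on_inode` are hypothetical, and it assumes the usual Linux /proc/locks column layout (id, optional "->" for a waiter, type, mode, access, pid, major:minor:inode, start, end) and that the inode number seen on the NFS client matches the one on the server.

```python
from typing import List, NamedTuple


class LockEntry(NamedTuple):
    lock_type: str  # e.g. POSIX or FLOCK
    access: str     # READ or WRITE
    pid: int
    device: str     # "major:minor" exactly as printed by the kernel
    inode: int
    blocked: bool   # True for "->" lines, i.e. a waiter on another lock


def parse_proc_locks(text: str) -> List[LockEntry]:
    """Parse the text of /proc/locks into LockEntry records."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Waiting locks carry an extra "->" marker after the id column.
        blocked = len(fields) > 1 and fields[1] == "->"
        if blocked:
            del fields[1]
        if len(fields) < 8:
            continue  # not a lock line
        # Column 5 is "major:minor:inode"; split off the inode at the last colon.
        device, _, inode = fields[5].rpartition(":")
        entries.append(
            LockEntry(fields[1], fields[3], int(fields[4]), device, int(inode), blocked)
        )
    return entries


def locks_on_inode(text: str, inode: int) -> List[LockEntry]:
    """Return all entries (holders and waiters) that refer to the given inode."""
    return [e for e in parse_proc_locks(text) if e.inode == inode]
```

On each host one would pass the text read from /proc/locks together with the inode of the file under test (for example from `os.stat(path).st_ino`) and compare the two results.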

 

In the logs I see the following:

 

2019-02-04 19:41:57 : epoch 00020089 : xbl-ces-2.psi.ch : gpfs.ganesha.nfsd-15508[State_Async] nlm_send_async :NLM :MAJ :Cannot create NLM async tcp connection to client ::ffff:129.129.99.62

2019-02-04 19:41:57 : epoch 00020089 : xbl-ces-2.psi.ch : gpfs.ganesha.nfsd-15508[State_Async] nlm4_send_grant_msg :NLM :MAJ :GRANTED_MSG RPC call failed with return code -1. Removing the blocking lock

2019-02-04 19:42:17 : epoch 00020089 : xbl-ces-2.psi.ch : gpfs.ganesha.nfsd-15508[State_Async] nlm_send_async :NLM :MAJ :Cannot create NLM async tcp connection to client ::ffff:129.129.99.62

2019-02-04 19:42:17 : epoch 00020089 : xbl-ces-2.psi.ch : gpfs.ganesha.nfsd-15508[State_Async] nlm4_send_grant_msg :NLM :MAJ :GRANTED_MSG RPC call failed with return code -1. Removing the blocking lock


2019-02-03 06:51:31 : epoch 00020089 : xbl-ces-2.psi.ch : gpfs.ganesha.nfsd-15508[State_Async] nlm_send_async :NLM :CRIT :NLM async Client procedure call 10 failed with return code 4 : RPC: Success

2019-02-03 09:23:07 : epoch 00020089 : xbl-ces-2.psi.ch : gpfs.ganesha.nfsd-15508[State_Async] nlm_send_async :NLM :CRIT :NLM async Client procedure call 10 failed with return code 4 : RPC: Success

2019-02-03 09:23:07 : epoch 00020089 : xbl-ces-2.psi.ch : gpfs.ganesha.nfsd-15508[State_Async] nlm_send_async :NLM :MAJ :Cannot create NLM async tcp connection to client ::ffff:129.129.99.48

2019-02-03 09:23:07 : epoch 00020089 : xbl-ces-2.psi.ch : gpfs.ganesha.nfsd-15508[State_Async] nlm4_send_grant_msg :NLM :MAJ :GRANTED_MSG RPC call failed with return code -1. Removing the blocking lock

2019-02-03 09:23:42 : epoch 00020089 : xbl-ces-2.psi.ch : gpfs.ganesha.nfsd-15508[State_Async] nlm_send_async :NLM :MAJ :Cannot create NLM async tcp connection to client ::ffff:129.129.99.48

 

I also see about 20 sockets in state CLOSE_WAIT, all used by ganesha.
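
The CLOSE_WAIT count can be tracked over time by reading /proc/net/tcp and /proc/net/tcp6 (the clients appear as IPv4-mapped addresses in the logs, so tcp6 is probably the relevant table). This is a rough sketch under the assumption of the standard Linux table layout, where the fourth column is the hexadecimal TCP state and 08 means CLOSE_WAIT; the function name `close_wait_sockets` is hypothetical. It does not attribute sockets to a process; that would need a separate match of socket inodes against the ganesha process's file descriptors.

```python
from typing import List, Tuple

TCP_CLOSE_WAIT = 0x08  # kernel TCP state number for CLOSE_WAIT


def close_wait_sockets(proc_net_tcp_text: str) -> List[Tuple[str, str]]:
    """Return (local, remote) endpoint fields of all CLOSE_WAIT sockets.

    The endpoints are returned as the raw hex "ADDRESS:PORT" strings
    found in /proc/net/tcp or /proc/net/tcp6; they are not decoded here.
    """
    result = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue
        if int(fields[3], 16) == TCP_CLOSE_WAIT:
            result.append((fields[1], fields[2]))
    return result
```

Calling it on the text of both files and logging `len(...)` at intervals would show whether the number of CLOSE_WAIT sockets grows with each failed GRANTED_MSG callback.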

 

The message “Cannot create NLM async tcp connection” appears only on the server that shows the issue. Several clients are affected, hence I don’t suspect an issue on the client side. It also happens with different files.
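
To see how the failures are spread across clients, the log can be scanned for this message and counted per client address. The snippet below is just an illustrative sketch: the function name `count_connect_failures` is hypothetical, and the pattern assumes the message wording stays exactly as in the log lines quoted above.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the ganesha message quoted above and captures the client address.
CONNECT_FAILURE = re.compile(
    r"Cannot create NLM async tcp connection to client (\S+)"
)


def count_connect_failures(log_lines: Iterable[str]) -> Counter:
    """Count failed NLM async connection attempts per client address."""
    counts = Counter()
    for line in log_lines:
        match = CONNECT_FAILURE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Since the function accepts any iterable of lines, an open ganesha log file can be passed to it directly.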

 

I don’t see any resource exhausted on the Ganesha server: ~300 TCP sockets in total, plenty of free memory, and about 300 ganesha threads.

 

Thank you,

 

Heiner Billich

--

Paul Scherrer Institut

Heiner Billich                           

System Engineer Scientific Computing

Science IT / High Performance Computing                

WHGA/106                             

Forschungsstrasse 111

5232 Villigen PSI

Switzerland

 

Phone +41 56 310 36 02

heiner.billich@psi.ch

https://www.psi.ch