Hi,
I am using an NFSv4.1 mount with nconnect=16 from a Linux client
(kernel 5.8.6-2.el7.elrepo.x86_64).
The mount hangs if the client is idle for some time after running
read/write I/O.
From netstat I can see that one of the client's connections is in
CLOSE_WAIT on the server and FIN_WAIT2 on the client.
On debugging further, I found that if a back channel client has been
created on the same xprt, we cannot close the socket fd when the
client resets the connection, because the rpc clnt still holds a ref
on the xprt. The xprt is released only when the reaper thread expires
the client ID and destroys the clnt.
The logs show that the client keeps renewing the lease, so the reaper
always finds a valid lease and never expires the client.
I am not sure whether this is caused by the multiple connections used
with nconnect or by a bug in the client code; I have not seen this
issue without nconnect.
In nfs_rpc_create_chan_v41 we create an rpc_client on the same xprt fd
so that the back channel uses the same connection; this takes a ref on
the xprt for the rpc client.
We cannot release this ref until the rpc client is destroyed, so the
fd stays open even after the client has closed the connection.
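To illustrate the refcount problem, here is a toy model in C. It is
only a sketch: struct toy_xprt and the toy_xprt_* helpers are made-up
names for illustration, not the ntirpc or ganesha API, and the real
reference handling is more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a transport; NOT the real ntirpc SVCXPRT. */
struct toy_xprt {
	int refs;     /* outstanding references on the transport */
	bool fd_open; /* true while the socket fd is still open */
};

/* The service side owns the initial reference. */
static void toy_xprt_init(struct toy_xprt *x)
{
	x->refs = 1;
	x->fd_open = true;
}

/* Taken, for example, when a back channel client is created. */
static void toy_xprt_ref(struct toy_xprt *x)
{
	x->refs++;
}

/* The fd is closed only when the last reference goes away. */
static void toy_xprt_release(struct toy_xprt *x)
{
	assert(x->refs > 0);
	if (--x->refs == 0)
		x->fd_open = false; /* close(fd) would happen here */
}
```

In this model, if the back channel client took a ref and only the
service side releases its ref on connection reset, refs stays at 1 and
fd_open stays true, which would match the CLOSE_WAIT state on the
server.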
The log excerpts below show the client renewing the lease and the
reaper finding it valid.
Update lease from client:
25/03/2021 08:55:40 : epoch 605c1d45 : rbt-el7-2 :
ganesha.nfsd-109993[svc_1630] nfs4_op_sequence :CLIENT ID :F_DBG
:Don’t use sesson slot 1=0x7fa0c3c9ee38 for DRC
25/03/2021 08:55:40 : epoch 605c1d45 : rbt-el7-2 :
ganesha.nfsd-109993[svc_1630] update_lease :CLIENT ID :F_DBG :Update
Lease 0x7fa0bf813d00 ClientID={Epoch=0x605c1d45 Counter=0x0000002a}
CONFIRMED Client={0x7fa0bf834630 name=(23:Linux NFSv4.1 rbt-el7-1)
refcount=1} t_delta=0 reservations=0 refcount=2
Reaper check:
25/03/2021 08:56:22 : epoch 605c1d45 : rbt-el7-2 :
ganesha.nfsd-109993[reaper] reaper_run :CLIENT ID :DEBUG :Now checking
NFS4 clients for expiration
25/03/2021 08:56:22 : epoch 605c1d45 : rbt-el7-2 :
ganesha.nfsd-109993[reaper] valid_lease :CLIENT ID :F_DBG :Check Lease
0x7fa0bf813d00 ClientID={Epoch=0x605c1d45 Counter=0x0000002a}
CONFIRMED Client={0x7fa0bf834630 name=(23:Linux NFSv4.1 rbt-el7-1)
refcount=1} t_delta=1 reservations=0 refcount=2 (Valid=YES 59 seconds
left)
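The reaper check above can be modelled roughly as follows. This is a
simplified sketch, not the ganesha valid_lease code:
toy_lease_seconds_left is a made-up helper, and the 60 second lease
lifetime is only inferred from the log (t_delta=1 with 59 seconds
left).

```c
#include <assert.h>

/*
 * Toy lease check: returns the number of seconds left on the lease,
 * or 0 if the lease has expired. Times are in seconds.
 */
static long toy_lease_seconds_left(long last_renew, long now,
				   long lifetime)
{
	long expires = last_renew + lifetime;

	return expires > now ? expires - now : 0;
}
```

As long as the client renews within the lifetime, last_renew keeps
moving forward and the result never reaches 0, so in this model the
reaper never expires the client ID.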
Can we destroy the rpc client when destroying the svc xprt? These back
channel clients won't be required once the client has closed the
connection.
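What I have in mind is roughly the ordering below. Again this is only
a toy sketch under my own assumptions: struct toy_conn and the
toy_conn_* functions are hypothetical names, not existing ganesha or
ntirpc functions, and it ignores locking and any in-flight callbacks.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy connection state; NOT a real ganesha/ntirpc structure. */
struct toy_conn {
	int refs;           /* refs held on the transport */
	bool fd_open;       /* true while the socket fd is open */
	bool has_bc_client; /* a back channel client is attached */
};

static void toy_conn_unref(struct toy_conn *c)
{
	assert(c->refs > 0);
	if (--c->refs == 0)
		c->fd_open = false; /* close(fd) would happen here */
}

/*
 * Proposed ordering when the peer has closed the connection:
 * destroy the back channel client first (dropping its ref),
 * then drop the ref held by the service side.
 */
static void toy_conn_peer_closed(struct toy_conn *c)
{
	if (c->has_bc_client) {
		c->has_bc_client = false;
		toy_conn_unref(c); /* ref held by back channel client */
	}
	toy_conn_unref(c);         /* ref held by the service side */
}
```

With this ordering the last ref is dropped as part of the xprt
teardown, so the fd would be closed without waiting for the reaper to
expire the client ID.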
Regards,
Gaurav