This sounds like it might be a session refcount bug. The session cleanup is
what's supposed to destroy the back channel, and since this is the first time
this has come up, it seems likely that it's mostly working.
I'd like Frank's opinion on this, since he understands the session code better
than I do. He's on vacation this week, so he'll weigh in when he gets back.
Yeah, that could be a session refcount bug. But it could also just be that
something isn't right in the backchannel code. I haven't dug much into how the
backchannel actually works.
I think turning on STATE debug will show you the session refcounting, and it may
also show some of what is going on with the backchannel.
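For reference, a minimal sketch of how that could look in ganesha.conf. This
assumes the usual LOG/COMPONENTS block and the FULL_DEBUG level, and it is not
a tested config. FULL_DEBUG is very verbose, so it is probably best enabled
only while reproducing the hang.

```
LOG {
    COMPONENTS {
        # Assumption: STATE is the component that logs session refcounting.
        STATE = FULL_DEBUG;
    }
}
```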
Frank
Daniel
On 3/25/21 9:06 AM, gaurav gangalwar wrote:
> Hi,
> I am using an NFSv4.1 mount with nconnect=16 from a Linux client (kernel
> 5.8.6-2.el7.elrepo.x86_64).
> The mount hangs if the client sits idle for some time after running
> read/write I/O.
> In netstat I can see one of the client connections in CLOSE_WAIT on the
> server and FIN_WAIT2 on the client.
>
> On debugging further, I found that if a back channel client has been
> created on the same client xprt, we cannot close the socket fd when the
> client resets the connection, because the rpc clnt still holds a ref.
> The xprt is released only when the reaper thread expires the client ID
> and destroys the clnt.
> In the logs I can see that the client keeps updating the lease, so the
> reaper always finds a valid lease for the client and never expires it.
>
> I am not sure whether this is caused by a client bug in handling the
> multiple connections that nconnect creates; I have not seen this issue
> without nconnect.
>
> In nfs_rpc_create_chan_v41 we create an rpc_client on the same xprt fd
> so that the back channel uses the same connection; this takes a ref on
> the xprt for the rpc client.
> We cannot release this ref until the rpc client is destroyed, so the fd
> stays open even after the client has closed the connection.
> The client keeps updating the lease, so the reaper always finds a valid
> lease.
>
> Update lease from client:
> 25/03/2021 08:55:40 : epoch 605c1d45 : rbt-el7-2 :
> ganesha.nfsd-109993[svc_1630] nfs4_op_sequence :CLIENT ID :F_DBG
> :Don’t use sesson slot 1=0x7fa0c3c9ee38 for DRC
> 25/03/2021 08:55:40 : epoch 605c1d45 : rbt-el7-2 :
> ganesha.nfsd-109993[svc_1630] update_lease :CLIENT ID :F_DBG :Update
> Lease 0x7fa0bf813d00 ClientID={Epoch=0x605c1d45 Counter=0x0000002a}
> CONFIRMED Client={0x7fa0bf834630 name=(23:Linux NFSv4.1 rbt-el7-1)
> refcount=1} t_delta=0 reservations=0 refcount=2
>
> Reaper check:
> 25/03/2021 08:56:22 : epoch 605c1d45 : rbt-el7-2 :
> ganesha.nfsd-109993[reaper] reaper_run :CLIENT ID :DEBUG :Now checking
> NFS4 clients for expiration
> 25/03/2021 08:56:22 : epoch 605c1d45 : rbt-el7-2 :
> ganesha.nfsd-109993[reaper] valid_lease :CLIENT ID :F_DBG :Check Lease
> 0x7fa0bf813d00 ClientID={Epoch=0x605c1d45 Counter=0x0000002a}
> CONFIRMED Client={0x7fa0bf834630 name=(23:Linux NFSv4.1 rbt-el7-1)
> refcount=1} t_delta=1 reservations=0 refcount=2 (Valid=YES 59 seconds
> left)
>
> Can we destroy the rpc client while destroying the svc xprt? These back
> channel clients won't be required once the client has closed the
> connection.
>
> Regards,
> Gaurav
>
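To make the refcount chain described above concrete, here is a toy model in C.
It is only a sketch of the reported behaviour, not Ganesha or ntirpc code:
every name in it (toy_xprt, toy_clnt and their helpers) is hypothetical. It
assumes the transport starts with one reference owned by the server side, that
creating the back channel client takes a second one, and that the fd is closed
only when the count reaches zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for a transport; NOT the real ntirpc SVCXPRT. */
struct toy_xprt {
    int fd;        /* pretend socket descriptor */
    int refcnt;    /* number of holders */
    bool fd_open;  /* stays true until the last reference is dropped */
};

/* Toy stand-in for the back channel RPC client. */
struct toy_clnt {
    struct toy_xprt *xprt;
};

static void toy_xprt_init(struct toy_xprt *x, int fd)
{
    x->fd = fd;
    x->refcnt = 1;      /* reference owned by the server (fore channel) side */
    x->fd_open = true;
}

static void toy_xprt_ref(struct toy_xprt *x)
{
    x->refcnt++;
}

static void toy_xprt_unref(struct toy_xprt *x)
{
    assert(x->refcnt > 0);
    if (--x->refcnt == 0) {
        /* A real implementation would close(x->fd) here. */
        x->fd_open = false;
        printf("fd %d closed\n", x->fd);
    }
}

/* Creating the back channel client pins the transport. */
static void toy_clnt_create(struct toy_clnt *c, struct toy_xprt *x)
{
    toy_xprt_ref(x);
    c->xprt = x;
}

/* Destroying the client is what finally lets the fd go. */
static void toy_clnt_destroy(struct toy_clnt *c)
{
    if (c->xprt != NULL) {
        toy_xprt_unref(c->xprt);
        c->xprt = NULL;
    }
}
```

In this model, dropping the server-side reference on connection reset leaves
refcnt at 1 and the fd open, which would match a socket stuck in CLOSE_WAIT.
Only toy_clnt_destroy() lets it close, and in the report that step waits for a
lease expiry that never comes because the client keeps renewing.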