New subject: nsm_monitor/unmonitor using destroyed svc xprt without ref if statd restart.

Wednesday, 13 February 2019

Using Ganesha 2.7.1
I did this sequence with NFS V3.
1>nlm lock from client
2>Restart statd
3>nlm unlock from client

svc xprt got created for nsm_connect during nlm lock gets destroyed if
statd restart, this happens through svc_rqst_epoll_event.
But we have global nsm_clnt which will still point to destroyed svc xprt.
We have checks on svc xprt flags if it already got destroyed, but this will
not work if that memory get reallocated and we could end up doing memory
corruption.
Here are log snippets.
*xprt destroyed through epoll*:
13/02/2019 03:25:58 : epoch 5c63d058 : centos7 : ganesha.nfsd-33933[svc_13]
rpc :TIRPC :F_DBG :svc_vc_wait: 0x7f5d783f0400 fd 34 recv closed (will set
dead)
13/02/2019 03:25:58 : epoch 5c63d058 : centos7 : ganesha.nfsd-33933[svc_21]
rpc :TIRPC :F_DBG :svc_vc_destroy_task() 0x7f5d783f0400 fd 34 xp_refcnt 0

*nsm unmonitor accessing destroyed xprt:*
13/02/2019 03:26:51 : epoch 5c63d058 : centos7 : ganesha.nfsd-33933[svc_21]
rpc :TIRPC :F_DBG :WARNING! already destroying!() 0x7f5d783f0400 fd -1
xp_refcnt 0 af 2 port 58327 @svc_ioq_write:233
13/02/2019 03:26:54 : epoch 5c63d058 : centos7 : ganesha.nfsd-33933[svc_12]
nsm_unmonitor :NLM :CRIT :Unmonitor ::ffff:10.53.91.67 SM_MON failed: RPC:
Timed out

I am not sure if this is a right way to use nsm rpc client, as its pointing
to svc xprt without taking extra ref.
Is this a ref count issue with nsm rpc client, should we take extra ref for
it?
Or we are should not keep global nsm rpc client, instead do
nsm_connect/disconnect for every MON/UNMON call?

I tried with extra ref fix, it seems to be working,
https://paste.fedoraproject.org/paste/BCCeYi933UMdGht1gvbWow

Regards,
Gaurav

2025

2024

2023

2022

2021

2020

2019

2018

nsm_monitor/unmonitor using destroyed svc xprt without ref if statd restart.