It's still happening with the latest next branch, even with the extra ref fixes in tirpc.
I had to disable PORTMAP in the tirpc code to make statd communication work on
next.
Regards,
Gaurav
On Mon, Feb 18, 2019 at 12:18 PM gaurav gangalwar <
gaurav.gangalwar(a)gmail.com> wrote:
I am occasionally getting a crash if I run I/O in parallel with the scenario
mentioned above.
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007fc1fd6427cb in opr_rbtree_insert (head=0x7fc1615edf08,
node=0x7fc1f2010c30)
at /usr/src/debug/nfs-ganesha-2.7.1/libntirpc/src/rbtree.c:271
#2 0x00007fc1fd63cde4 in clnt_req_setup (cc=0x7fc1f2010c00, timeout=...)
at /usr/src/debug/nfs-ganesha-2.7.1/libntirpc/src/clnt_generic.c:538
#3 0x00000000004a2bce in nsm_unmonitor (host=0x7fc1615e3080) at
/usr/src/debug/nfs-ganesha-2.7.1/Protocols/NLM/nsm.c:219
#4 0x00000000004e6a7f in dec_nsm_client_ref (client=0x7fc1615e3080) at
/usr/src/debug/nfs-ganesha-2.7.1/SAL/nlm_owner.c:857
#5 0x00000000004e73dd in free_nlm_client (client=0x7fc1615dfa40) at
/usr/src/debug/nfs-ganesha-2.7.1/SAL/nlm_owner.c:1039
#6 0x00000000004e7753 in dec_nlm_client_ref (client=0x7fc1615dfa40) at
/usr/src/debug/nfs-ganesha-2.7.1/SAL/nlm_owner.c:1130
#7 0x00000000004e7f36 in free_nlm_owner (owner=0x7fc161616200) at
/usr/src/debug/nfs-ganesha-2.7.1/SAL/nlm_owner.c:1314
#8 0x00000000004c87bb in free_state_owner (owner=0x7fc161616200) at
/usr/src/debug/nfs-ganesha-2.7.1/SAL/state_misc.c:818
#9 0x00000000004c8d56 in dec_state_owner_ref (owner=0x7fc161616200) at
/usr/src/debug/nfs-ganesha-2.7.1/SAL/state_misc.c:968
#10 0x000000000049d906 in nlm4_Unlock (args=0x7fc1f2022f08,
req=0x7fc1f2022800, res=0x7fc160e522c0)
at /usr/src/debug/nfs-ganesha-2.7.1/Protocols/NLM/nlm_Unlock.c:119
#11 0x000000000045cacb in nfs_rpc_process_request (reqdata=0x7fc1f2022800)
at /usr/src/debug/nfs-ganesha-2.7.1/MainNFSD/nfs_worker_thread.c:1329
#12 0x000000000045d399 in nfs_rpc_valid_NLM (req=0x7fc1f2022800)
at /usr/src/debug/nfs-ganesha-2.7.1/MainNFSD/nfs_worker_thread.c:1581
#13 0x00007fc1fd658d9c in svc_vc_decode (req=0x7fc1f2022800) at
/usr/src/debug/nfs-ganesha-2.7.1/libntirpc/src/svc_vc.c:825
#14 0x000000000044fc82 in nfs_rpc_decode_request (xprt=0x7fc1f8453800,
xdrs=0x7fc1f8444c00)
at
/usr/src/debug/nfs-ganesha-2.7.1/MainNFSD/nfs_rpc_dispatcher_thread.c:1341
#15 0x00007fc1fd658cad in svc_vc_recv (xprt=0x7fc1f8453800) at
/usr/src/debug/nfs-ganesha-2.7.1/libntirpc/src/svc_vc.c:798
#16 0x00007fc1fd6553fe in svc_rqst_xprt_task (wpe=0x7fc1f8453a18)
at /usr/src/debug/nfs-ganesha-2.7.1/libntirpc/src/svc_rqst.c:767
#17 0x00007fc1fd655878 in svc_rqst_epoll_events (sr_rec=0x7fc1f84c3b10,
n_events=2)
at /usr/src/debug/nfs-ganesha-2.7.1/libntirpc/src/svc_rqst.c:939
#18 0x00007fc1fd655b0d in svc_rqst_epoll_loop (sr_rec=0x7fc1f84c3b10)
at /usr/src/debug/nfs-ganesha-2.7.1/libntirpc/src/svc_rqst.c:1012
#19 0x00007fc1fd655bc0 in svc_rqst_run_task (wpe=0x7fc1f84c3b10)
at /usr/src/debug/nfs-ganesha-2.7.1/libntirpc/src/svc_rqst.c:1048
#20 0x00007fc1fd65e510 in work_pool_thread (arg=0x7fc1f240e020) at
/usr/src/debug/nfs-ganesha-2.7.1/libntirpc/src/work_pool.c:181
#21 0x00007fc1fbbd1dd5 in start_thread () from /lib64/libpthread.so.0
#22 0x00007fc1fb4d8ead in clone () from /lib64/libc.so.6
(gdb)
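Frame #0 at address 0x0 looks like a call through a function pointer (the
tree's comparator) read from memory that had already been freed and handed
back to the allocator. A minimal illustration of that failure mode, with
made-up names rather than the real ntirpc structures:

/* Illustration only; not Ganesha/ntirpc code, all names are made up. */
#include <stdlib.h>

struct fake_tree {
        /* comparator function pointer, like the rbtree head's */
        int (*cmpf)(const void *, const void *);
};

static int fake_cmp(const void *a, const void *b)
{
        return (a > b) - (a < b);
}

int main(void)
{
        struct fake_tree *head = malloc(sizeof(*head));

        head->cmpf = fake_cmp;

        /* The object owning the tree is destroyed on another thread... */
        free(head);

        /* ...and the chunk may be handed out again and zeroed by a new user. */
        struct fake_tree *reuse = calloc(1, sizeof(*reuse));
        (void)reuse;

        /*
         * A stale cached pointer still reaches the old location; if the
         * allocator recycled the chunk, cmpf can read back as NULL and the
         * call jumps to address 0, the same signature as frame #0 above.
         * This is undefined behaviour, so it will not reproduce every run.
         */
        return head->cmpf(head, head);
}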
Regards,
Gaurav
On Wed, Feb 13, 2019 at 9:06 PM gaurav gangalwar <
gaurav.gangalwar(a)gmail.com> wrote:
> Using Ganesha 2.7.1
> I did this sequence with NFSv3:
> 1> NLM lock from client
> 2> Restart statd
> 3> NLM unlock from client
>
> The svc xprt created for nsm_connect during the NLM lock gets destroyed
> when statd restarts; this happens through svc_rqst_epoll_event.
> But the global nsm_clnt still points to the destroyed svc xprt.
> We do have checks on the svc xprt flags to see whether it has already been
> destroyed, but they cannot help once that memory gets reallocated, and we
> can end up corrupting memory.
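> A rough sketch of the pattern, with simplified made-up types (the real
> ones are ntirpc's SVCXPRT and the client cached by nsm.c):
>
> /* Illustration only; not the real ntirpc/Ganesha structures. */
> #include <stdbool.h>
> #include <stdlib.h>
>
> #define XPRT_FLAG_DESTROYED 0x1
>
> struct fake_xprt {
>         unsigned int flags;
>         /* fd, ops table, tree of outstanding calls, ... */
> };
>
> /* Global cache, like nsm_clnt: a raw pointer set at nsm_connect() time,
>  * with no reference held on the transport. */
> static struct fake_xprt *cached_xprt;
>
> /* epoll/svc_rqst thread: the peer (statd) closed the socket. */
> void destroy_on_epoll(struct fake_xprt *xprt)
> {
>         xprt->flags |= XPRT_FLAG_DESTROYED;
>         free(xprt);          /* the chunk can now be handed out to anyone */
> }
>
> /* Worker thread: a later SM_MON/SM_UNMON reuses the cached client. */
> bool send_unmon(void)
> {
>         struct fake_xprt *xprt = cached_xprt;
>
>         /*
>          * This is the "already destroyed?" style of check.  It reads
>          * freed memory: if the allocator has recycled the chunk, the
>          * flag can look clear and we keep going, writing into a
>          * structure that no longer belongs to us, which is exactly the
>          * corruption (or NULL call) seen above.
>          */
>         if (xprt->flags & XPRT_FLAG_DESTROYED)
>                 return false;
>
>         /* clnt_call()-style path: inserts into xprt's call tree, etc. */
>         return true;
> }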
> Here are log snippets.
> *xprt destroyed through epoll*:
> 13/02/2019 03:25:58 : epoch 5c63d058 : centos7 :
> ganesha.nfsd-33933[svc_13] rpc :TIRPC :F_DBG :svc_vc_wait: 0x7f5d783f0400
> fd 34 recv closed (will set dead)
> 13/02/2019 03:25:58 : epoch 5c63d058 : centos7 :
> ganesha.nfsd-33933[svc_21] rpc :TIRPC :F_DBG :svc_vc_destroy_task()
> 0x7f5d783f0400 fd 34 xp_refcnt 0
>
> *nsm unmonitor accessing destroyed xprt:*
> 13/02/2019 03:26:51 : epoch 5c63d058 : centos7 :
> ganesha.nfsd-33933[svc_21] rpc :TIRPC :F_DBG :WARNING! already
> destroying!() 0x7f5d783f0400 fd -1 xp_refcnt 0 af 2 port 58327
> @svc_ioq_write:233
> 13/02/2019 03:26:54 : epoch 5c63d058 : centos7 :
> ganesha.nfsd-33933[svc_12] nsm_unmonitor :NLM :CRIT :Unmonitor
> ::ffff:10.53.91.67 SM_MON failed: RPC: Timed out
>
>
> I am not sure this is the right way to use the nsm rpc client, since it
> points to the svc xprt without taking an extra ref.
> Is this a refcount issue with the nsm rpc client? Should we take an extra
> ref for it?
> Or should we not keep a global nsm rpc client at all, and instead do
> nsm_connect/disconnect for every MON/UNMON call?
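> If we go with the extra ref, I would expect it to look roughly like the
> sketch below: pin the transport while the client is cached, and drop the
> pin when the cache is torn down. The types and the xprt_ref()/xprt_unref()
> helpers here are hypothetical stand-ins, not the actual ntirpc API; a real
> fix would use ntirpc's own reference-counting primitives:
>
> #include <stddef.h>
>
> /* Fake stand-ins for CLIENT / SVCXPRT; the refcounting is the point. */
> struct fake_client { int unused; };
> struct fake_xprt   { int refcnt; };
>
> /* Hypothetical helpers; real code would use ntirpc's primitives. */
> static void xprt_ref(struct fake_xprt *xprt)   { xprt->refcnt++; }
> static void xprt_unref(struct fake_xprt *xprt) { xprt->refcnt--; /* free at 0 */ }
>
> static struct fake_client *nsm_cached_clnt;
> static struct fake_xprt *nsm_cached_xprt;
>
> /* nsm_connect() time: cache the client and pin its transport. */
> void cache_nsm_client(struct fake_client *clnt, struct fake_xprt *xprt)
> {
>         xprt_ref(xprt);   /* epoll may still mark the xprt dead, but the
>                            * memory stays valid until we drop this ref */
>         nsm_cached_clnt = clnt;
>         nsm_cached_xprt = xprt;
> }
>
> /* nsm_disconnect() time (or per UNMON call if we stop caching globally). */
> void drop_nsm_client(void)
> {
>         if (nsm_cached_xprt != NULL) {
>                 xprt_unref(nsm_cached_xprt);
>                 nsm_cached_xprt = NULL;
>         }
>         nsm_cached_clnt = NULL;
> }
>
> With the pin held, an xprt that epoll marks dead stays allocated, so the
> destroyed-flag check in the client path becomes reliable again.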
>
> I tried an extra-ref fix and it seems to be working:
>
https://paste.fedoraproject.org/paste/BCCeYi933UMdGht1gvbWow
>
> Regards,
> Gaurav
>