Malahal,
Thanks for the reply. Yes, I reverted those changes and retested yesterday, and we still see
the ganesha process consuming nearly 5G of memory over time, and it is still growing. This is
system testing with a lot of FIO runs, plus mounts/unmounts and export reconfiguration.
So the patches mentioned below are harmless. Sorry about that.
We run ganesha in containers on Photon Linux, and many of these tools cannot be run on
Photon.
So we are trying to bring our filesystem up on a Linux box and run ganesha there. We will try the mmleak
tool/valgrind in that environment and see if we can reproduce the issue, but it's not an easy
task.
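Where valgrind and similar tools are unavailable, the growth can still be tracked from /proc alone. A minimal sketch, assuming a Linux /proc filesystem is visible where ganesha runs; the function names are illustrative and not part of Ganesha:

```python
import re
import time


def parse_vmrss_kib(status_text):
    """Return VmRSS in KiB from the text of /proc/<pid>/status, or None."""
    m = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def sample_rss(pid, count, interval_s):
    """Collect (timestamp, rss_kib) pairs for the given pid."""
    samples = []
    for _ in range(count):
        with open(f"/proc/{pid}/status") as f:
            samples.append((time.time(), parse_vmrss_kib(f.read())))
        time.sleep(interval_s)
    return samples


def growth_kib_per_hour(samples):
    """Average RSS growth between the first and last sample.

    Needs at least two samples taken at different times.
    """
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    return (r1 - r0) * 3600.0 / (t1 - t0)
```

For example, sample_rss(pid, 60, 60.0) would take one sample a minute for an hour. Growth that stays positive across mount/unmount cycles would be consistent with a leak, though it does not say where the leak is.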
In the meantime, if you have more information about the connection leak in the backchannel
to clients, please share.
I was reading this thread in the archive as well:
Is this something approved by the community, or is it still just a proposal?
Regards,
Deepthi
From: Malahal Naineni <malahal(a)gmail.com>
Date: Friday, 5 June 2020 at 9:29 AM
To: Deepthi Shivaramu <des(a)vmware.com>
Cc: Daniel Gryniewicz <dang(a)redhat.com>, Soumya Koduri <skoduri(a)redhat.com>,
Frank Filz <ffilzlnx(a)mindspring.com>, "devel(a)lists.nfs-ganesha.org"
<devel(a)lists.nfs-ganesha.org>
Subject: Re: [NFS-Ganesha-Devel] Re: segfault in nfs_rpc_destroy_chan while releasing
xprt
I don't see how the first commit can cause a leak. We reported it a while back and started
using a patch in our code base well before it went upstream! The second patch just fixes a
crash; I don't see how that can leak. If I recall correctly, there is a connection
leak in Ganesha-initiated connections to clients (statd locally and NLM frank/avail
messages from ganesha to clients) in some circumstances. You can instrument your code or
use
to track. ASAN/valgrind should be able to help with leak detection (don't recall if
netstat -anp |grep ganesha can also help).
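If netstat is not available in the container, a similar count can be derived from /proc. A minimal sketch, assuming the standard Linux layout of /proc/<pid>/fd and /proc/net/tcp (state in the 4th column, socket inode in the 10th); the helper names are illustrative, and IPv6 sockets would need the same pass over /proc/net/tcp6:

```python
import os
import re
from collections import Counter

# Subset of kernel TCP state codes as printed in /proc/net/tcp.
TCP_STATES = {"01": "ESTABLISHED", "06": "TIME_WAIT",
              "08": "CLOSE_WAIT", "0A": "LISTEN"}


def socket_inodes(pid):
    """Return the set of socket inode numbers (as strings) held by pid."""
    inodes = set()
    fd_dir = f"/proc/{pid}/fd"
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # fd closed while we were scanning
        m = re.fullmatch(r"socket:\[(\d+)\]", target)
        if m:
            inodes.add(m.group(1))
    return inodes


def count_states(proc_net_tcp_text, inodes):
    """Count TCP sockets per state, restricted to the given inodes."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip header row
        fields = line.split()
        if len(fields) > 9 and fields[9] in inodes:
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts
```

Sampling these counts periodically and watching for a number that only ever grows (for example sockets stuck in CLOSE_WAIT) would be one way to spot leaked connections.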
Regards, Malahal.
On Fri, Jun 5, 2020 at 12:03 AM Deepthi Shivaramu <des@vmware.com> wrote:
Daniel,
I applied the two changes below to libntirpc, on top of our existing libntirpc1.7.0 and
ganesha2.7.2, as suggested in this thread.
But now we are seeing memory leaks in our system testing environment when a large number of
shares are being unmounted.
Ganesha seems to be consuming more than 5G of memory according to top output.
We have yet to analyse the leak, but do we know whether these changes to clnt_vc.c in libntirpc
can cause leaks in the connection close path?
commit 2d13724606d6391c2cc485d2dbd0555cc6c1bcae
VC - RELEASE after DESTROY
Many error paths call DESTROY, which will unlink and drop the ref. This
means that the final RELEASE will free, causing the DESTROY to
use-after-free. Instead, make sure we DESTROY first.
Signed-off-by: Daniel Gryniewicz <dang@redhat.com>
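The ordering issue described in this commit message can be illustrated with a toy reference-count model. This is only a sketch and not libntirpc code: the Xprt class and its release/destroy methods are hypothetical stand-ins for SVC_RELEASE and SVC_DESTROY, and the starting count of two (one ref for the hash table, one for the caller) is an assumption.

```python
class Xprt:
    """Toy refcounted object; 'freed' stands in for free()d memory."""

    def __init__(self):
        self.refcnt = 2      # assumed: hash-table ref + caller ref
        self.linked = True   # still present in the hash table
        self.freed = False

    def release(self):
        if self.freed:
            raise RuntimeError("use-after-free: release on freed xprt")
        self.refcnt -= 1
        if self.refcnt == 0:
            self.freed = True

    def destroy(self):
        if self.freed:
            raise RuntimeError("use-after-free: destroy on freed xprt")
        if self.linked:
            self.linked = False
            self.release()   # drop the hash table's ref


def teardown_release_first(x):
    """Old order: the RELEASE may free, then DESTROY touches freed memory."""
    x.release()
    x.destroy()


def teardown_destroy_first(x):
    """New order: DESTROY while the caller's ref is still held."""
    x.destroy()
    x.release()
```

In this model, if an error path has already called destroy(), teardown_release_first() drops the last ref and then calls destroy() on a freed object, while teardown_destroy_first() is safe in both cases.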
commit c1b95f7519cb3ecbeccdeb69f9d5f534c58383d0
Don't attempt to destroy XPRT if CLNT create was unsuccessful
Currently in clnt_vc_destroy() we call SVC_DESTROY for a XPRT,
but if CLNT (client handle) creation failed then the related
'cx->cx_rec' won't be valid and this will lead to a crash.
Fixed this by calling SVC_DESTROY only when 'cx->cx_rec' is valid.
Signed-off-by: Madhu Thorat <madhu.punjabi@in.ibm.com>
Regards,
Deepthi
On 17/05/20, 5:53 PM, "Deepthi Shivaramu" <des@vmware.com> wrote:
Thanks Daniel, I will try this.
Regards,
Deepthi
On 15/05/20, 6:55 PM, "Daniel Gryniewicz" <dang@redhat.com> wrote:
Try this one:
Thanks for your response Soumya.
The client used was Ubuntu16.04.2 VM.
This is not seen consistently, but we hit it randomly in some failure scenarios,
for NFSv4.0 only.
The scenarios were:
1. The SetClientId_Confirm op fails with some GSS error, and when the client retries
SetClientId_Confirm, it tries deleting the backchannel and hits this.
2. The second was hit randomly on the nfs_client_id_expire path.
One thing I wanted to clarify: was this the whole fix for that panic, or was there
more to it?
>
https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub....
> svc_xprt_lookup - Add extra ref on create
>
> An xprt has a ref for the hash table (that's released by SVC_DESTROY());
> but when it's first created, only 1 ref was taken, so there wasn't a ref
> for the caller.
>
> Add an extra ref for the caller when the xprt is first created.
>
> Signed-off-by: Daniel Gryniewicz <dang@redhat.com>
>next (#155) v3.2
> …
>v1.8.0
> @dang
> dang committed on 19 Oct 2018
> commit ca74cde10ef02a322b8944a6c8639b1318fa34dc
Regards,
Deepthi
On 15/05/20, 1:41 PM, "Soumya Koduri" <skoduri@redhat.com> wrote:
Hi Deepthi,
On 5/15/20 8:16 AM, Deepthi Shivaramu wrote:
> Soumya,
> I see there was a discussion on GitHub about the exact same segfault, and you were debugging this issue:
>
https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub....
>
> Multiple fixes were discussed there, but ultimately I see this fix was checked in:
>
https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub....
>
> The strange part is that I already have that fix in my source and am still hitting this same segfault.
> Also, one correction to my previous mail: we are actually using libntirpc1.7.0 with ganesha2.7.2.
>
> @Soumya, do you know of any other fix related to this problem?
Yes. This issue was fixed a while back and we haven't encountered it
again. Dan may have some insights on it.
Is this hit consistently? Which client is used?
Thanks,
Soumya
>
> Regards,
> Deepthi
>
> On 14/05/20, 5:09 PM, "Deepthi Shivaramu" <des@vmware.com> wrote:
>
> I see this segfault is in nfs_rpc_destroy_chan() and is not specific to setclientid_confirm.
> We are not seeing it with NFSv4.1, but we see it frequently in NFSv4.0 tests.
>
> I saw one more core today with bt:
>
> (gdb) bt
> #0 0x00007ff7b1dde71a in svc_release_it (xprt=0x7ff780001740, flags=0, tag=0x7ff7b1e05fd0 "clnt_vc_destroy", line=462)
>     at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/libntirpc/ntirpc/rpc/svc.h:433
> #1 0x00007ff7b1ddf4fb in clnt_vc_destroy (clnt=0x7ff780001620)
>     at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/libntirpc/src/clnt_vc.c:462
> #2 0x000000000043b4e1 in clnt_release_it (clnt=0x7ff780001620, flags=0, tag=0x55e550 <__func__.21824> "_nfs_rpc_destroy_chan", line=628)
>     at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/libntirpc/ntirpc/rpc/clnt.h:319
> #3 0x000000000043b577 in clnt_destroy_it (clnt=0x7ff780001620, tag=0x55e550 <__func__.21824> "_nfs_rpc_destroy_chan", line=628)
>     at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/libntirpc/ntirpc/rpc/clnt.h:341
> #4 0x000000000043eb97 in _nfs_rpc_destroy_chan (chan=0x7ff7940023a8)
>     at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/MainNFSD/nfs_rpc_callback.c:628
> #5 0x000000000043f800 in nfs_rpc_destroy_chan (chan=0x7ff7940023a8)
>     at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/MainNFSD/nfs_rpc_callback.c:864
> #6 0x00000000004bde35 in nfs_client_id_expire (clientid=0x7ff794002300, make_stale=false)
>     at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/SAL/nfs4_clientid.c:1099
> #7 0x00000000004442bf in reap_hash_table (ht_reap=0xf35f40)
>     at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/MainNFSD/nfs_reaper_thread.c:109
> #8 0x0000000000444a62 in reaper_run (ctx=0xf66ca0)
>     at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/MainNFSD/nfs_reaper_thread.c:232
> #9 0x00000000004fdc38 in fridgethr_start_routine (arg=0xf66ca0)
>     at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/support/fridgethr.c:550
> #10 0x00007ff7b09aa3d4 in start_thread (arg=0x7ff791ffb700) at pthread_create.c:334
> #11 0x00007ff7b02c9ebd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> (gdb) f 0
> #0 0x00007ff7b1dde71a in svc_release_it (xprt=0x7ff780001740, flags=0, tag=0x7ff7b1e05fd0 "clnt_vc_destroy", line=462)
>     at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/libntirpc/ntirpc/rpc/svc.h:433
> 433 in /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/libntirpc/ntirpc/rpc/svc.h
> (gdb) p xprt
> $1 = (SVCXPRT *) 0x7ff780001740
> (gdb) p *$
> $2 = {xp_ops = 0x0, xp_dispatch = {process_cb = 0x0, rendezvous_cb = 0x0}, xp_parent = 0x7ff770004730, xp_tp = 0x6d00000001 <error: Cannot access memory at address 0x6d00000001>,
>   xp_netid = 0x7ff79c00a160 "", xp_p1 = 0x7ff770004750, xp_p2 = 0x0, xp_p3 = 0x0, xp_u1 = 0x3, xp_u2 = 0x0, xp_local = {nb = {maxlen = 0, len = 0, buf = 0x7ff7940018a0}, ss = {
>       ss_family = 0, __ss_align = 0, __ss_padding = '\000' <repeats 111 times>}}, xp_remote = {nb = {maxlen = 4280583506, len = 0, buf = 0x0}, ss = {ss_family = 34467,
>       __ss_align = 1,
>       __ss_padding = "_:P\346ju\200\223\001\000\000\000\001\000\000\000`,\000\200\367\177\000\000\341\376\266^", '\000' <repeats 12 times>, "\061\000\000\000\000\000\000\000\000\061\000\200\367\177\000\000\220]\000\200\367\177\000\000c3-edbe-2fea12000\000\000\000\000\000\000\000\064\001", '\000' <repeats 21 times>}}, xp_lock = {__data = {__lock = -1946148624,
>       __count = 32759, __owner = 0, __nusers = 37, __kind = -1946148624, __spins = 32759, __list = {__prev = 0x7ff77c001530, __next = 0x0}},
>     __size = "\360 \000\214\367\177\000\000\000\000\000\000%\000\000\000\360 \000\214\367\177\000\000\060\025\000|\367\177\000\000\000\000\000\000\000\000\000",
>     __align = 140701182468336}, xp_fd = 0, xp_ifindex = 0, xp_si_type = 3, xp_type = 0, xp_refcnt = -1, xp_flags = 64}
> (gdb) p xprt->xp_ops
> $3 = (struct xp_ops *) 0x0
> (gdb)
>
>
> Regards,
> Deepthi
>
> On 14/05/20, 12:17 PM, "Deepthi Shivaramu" <des@vmware.com> wrote:
>
> Daniel,
> I am seeing this segfault with libntirpc1.8.0 and ganesha2.8.2 in the setclientid_confirm code path.
> Can you please check and let me know if you have seen this issue before, and whether a fix is already available in the latest versions?
>
>
> (gdb) bt
> #0 0x0000000000000000 in ?? ()
> #1 0x00007fd66badf72e in svc_release_it (xprt=0x7fd658002e90, flags=0, tag=0x7fd66bb06fd0 "clnt_vc_destroy", line=462)
>     at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/libntirpc/ntirpc/rpc/svc.h:433
> #2 0x00007fd66bae04fb in clnt_vc_destroy (clnt=0x7fd658002ba0)
>     at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/libntirpc/src/clnt_vc.c:462
> #3 0x000000000043b4e1 in clnt_release_it (clnt=0x7fd658002ba0, flags=0, tag=0x55e550 <__func__.21824> "_nfs_rpc_destroy_chan", line=628)
>     at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/libntirpc/ntirpc/rpc/clnt.h:319
> #4 0x000000000043b577 in clnt_destroy_it (clnt=0x7fd658002ba0, tag=0x55e550 <__func__.21824> "_nfs_rpc_destroy_chan", line=628)
>     at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/libntirpc/ntirpc/rpc/clnt.h:341
> #5 0x000000000043eb97 in _nfs_rpc_destroy_chan (chan=0x7fd64c002648)
>     at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/MainNFSD/nfs_rpc_callback.c:628
> #6 0x000000000043f800 in nfs_rpc_destroy_chan (chan=0x7fd64c002648)
>     at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/MainNFSD/nfs_rpc_callback.c:864
> #7 0x000000000048011c in nfs4_op_setclientid_confirm (op=0x7fd62c001d90, data=0x7fd6607dff70, resp=0x7fd62c002070)
>     at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/Protocols/NFS/nfs4_op_setclientid_confirm.c:382
> #8 0x000000000045b4b1 in nfs4_Compound (arg=0x7fd62c0011a8, req=0x7fd62c000aa0, res=0x7fd62c001f60)
> at
> ....
> .......
> #20 0x00007fd669fcaebd in clone ()
> at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> (gdb) f 1
> #1 0x00007fd66badf72e in svc_release_it (xprt=0x7fd658002e90, flags=0, tag=0x7fd66bb06fd0 "clnt_vc_destroy", line=462)
>     at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/libntirpc/ntirpc/rpc/svc.h:433
> 433 /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/libntirpc/ntirpc/rpc/svc.h: No such file or directory.
> (gdb) p clnt
> No symbol "clnt" in current context.
> (gdb) p xprt
> $10 = (SVCXPRT *) 0x7fd658002e90
> (gdb) p *$
> $11 = {xp_ops = 0x7fd658000e20, xp_dispatch = {process_cb = 0x7fd658000078,
>     rendezvous_cb = 0x7fd658000078}, xp_parent = 0x0, xp_tp = 0x0,
>   xp_netid = 0x0, xp_p1 = 0x0, xp_p2 = 0x0, xp_p3 = 0x0, xp_u1 = 0x0,
>   xp_u2 = 0x0, xp_local = {nb = {maxlen = 483619223, len = 1, buf = 0x2},
>     ss = {ss_family = 0, __ss_align = 0,
>       __ss_padding = "\313)\260k\326\177\000\000\020\320\236b\326\177\000\000\006\000\000\000\034\000\000\000\004\004\005\377\377\377\377\377\000\000\000\000\020\373\364\310\333c\335\363\245\332\362b\324.M\332",
>       '\000' <repeats 59 times>}}, xp_remote = {nb = {maxlen = 0, len = 0, buf = 0x0}, ss = {ss_family = 0,
>       __ss_align = 0, __ss_padding = '\000' <repeats 111 times>}}, xp_lock = {
>     __data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0,
>       __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
>     __size = '\000' <repeats 39 times>, __align = 0}, xp_fd = 0,
>   xp_ifindex = 0, xp_si_type = 0, xp_type = 0, xp_refcnt = -1, xp_flags = 64}
> (gdb) f 6
> #6 0x000000000043f800 in nfs_rpc_destroy_chan (chan=0x7fd64c002648)
>     at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/MainNFSD/nfs_rpc_callback.c:864
> 864 /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/MainNFSD/nfs_rpc_callback.c: No such file or directory.
> (gdb) p chan
> $12 = (rpc_call_channel_t *) 0x7fd64c002648
> (gdb) p *$
> $13 = {type = RPC_CHAN_V40, mtx = {__data = {__lock = 1, __count = 0,
>       __owner = 163, __nusers = 1, __kind = 0, __spins = 0, __list = {
>         __prev = 0x0, __next = 0x0}},
>     __size = "\001\000\000\000\000\000\000\000\243\000\000\000\001",
>     '\000' <repeats 26 times>, __align = 1}, states = 0, source = {clientid = 0x7fd64c0025a0,
>     session = 0x7fd64c0025a0}, last_called = 0, clnt = 0x7fd658002ba0,
>   auth = 0x0, gss_sec = {mech = 0x0, qop = 0, svc = RPCSEC_GSS_SVC_INTEGRITY,
>     cred = 0x0, req_flags = 0}}
> (gdb) p chan->client
> There is no member named client.
> (gdb) p chan->clnt
> $14 = (CLIENT *) 0x7fd658002ba0
> (gdb) p *$
> $15 = {cl_ops = 0x7fd66bd192e0, cl_netid = 0x0, cl_tp = 0x0, cl_u1 = 0x0,
>   cl_u2 = 0x0, cl_lock = {__data = {__lock = 0, __count = 0, __owner = 0,
>       __nusers = 0, __kind = 3, __spins = 0, __list = {__prev = 0x0,
>         __next = 0x0}},
>     __size = '\000' <repeats 16 times>, "\003", '\000' <repeats 22 times>,
>     __align = 0}, cl_error = {ru = {RE_errno = 0, RE_why = AUTH_OK, RE_vers = {
>         low = 0, high = 0}, RE_lb = {s1 = 0, s2 = 0}},
>     re_status = RPC_SUCCESS}, cl_refcnt = 0, cl_flags = 96}
> (gdb)
>
> On 06/05/20, 10:00 PM, "Daniel Gryniewicz" <dang@redhat.com> wrote:
>
> I'm happy to announce the latest stable versions of NTIRPC and Ganesha
> in the 2.8 series. These are NTIRPC 1.8.1 and Ganesha 2.8.4. There are
> >40 bug fixes in these releases.
>
> Daniel
> _______________________________________________
> Devel mailing list -- devel@lists.nfs-ganesha.org
> To unsubscribe send an email to devel-leave@lists.nfs-ganesha.org