I don't see how the first commit can cause a leak. We reported it a while back
and started using a patch in our code base well before it went upstream. The
second patch only fixes a crash, so I don't see how that can leak either. If I recall
correctly, there is a connection leak in Ganesha-initiated connections to
clients (statd locally and NLM frank/avail messages from ganesha to
clients) in some circumstances. You can instrument your code or use
to
track. ASAN/valgrind should be able to help with leak detection (I don't
recall whether netstat -anp | grep ganesha can also help).
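
As a rough illustration of what "instrument your code" could look like, here is a minimal sketch that counts connection setups and teardowns so that a growing difference points at a leak. All names (track_conn_create, track_conn_destroy, conn_outstanding, conn_report) are hypothetical and do not exist in Ganesha or libntirpc; where the hooks would be called from is also an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical counters; not part of Ganesha or libntirpc. */
static atomic_long conn_created;
static atomic_long conn_destroyed;

/* Call wherever an outbound connection (e.g. to statd or an NLM client)
 * is successfully set up. */
static void track_conn_create(void)
{
	atomic_fetch_add(&conn_created, 1);
}

/* Call wherever such a connection is finally torn down. */
static void track_conn_destroy(void)
{
	atomic_fetch_add(&conn_destroyed, 1);
}

/* Number of connections created but not yet destroyed. */
static long conn_outstanding(void)
{
	return atomic_load(&conn_created) - atomic_load(&conn_destroyed);
}

/* Dump the counters, e.g. periodically or on a debug signal. */
static void conn_report(FILE *out)
{
	fprintf(out, "conn created=%ld destroyed=%ld outstanding=%ld\n",
		atomic_load(&conn_created), atomic_load(&conn_destroyed),
		conn_outstanding());
}
```

If the outstanding count keeps climbing while shares are being unmounted, the create and destroy call sites are not balanced on some path.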
Regards, Malahal.
On Fri, Jun 5, 2020 at 12:03 AM Deepthi Shivaramu <des(a)vmware.com> wrote:
Daniel,
I took the below 2 changes to libntirpc on top of our existing
libntirpc1.7.0 and ganesha2.7.2 as per the suggestion in this thread.
But now we are seeing memory leaks in our system testing environment when
a large number of shares are being unmounted.
Ganesha seems to be consuming more than 5 GB of memory as per top output.
We have yet to analyse the leak, but do we know whether these changes in
clnt_vc.c in libntirpc can cause leaks in the connection close path?
commit 2d13724606d6391c2cc485d2dbd0555cc6c1bcae
VC - RELEASE after DESTROY
Many error paths call DESTROY, which will unlink and drop the ref. This
means that the final RELEASE will free, causing the DESTROY to
use-after-free. Instead, make sure we DESTROY first.
Signed-off-by: Daniel Gryniewicz <dang(a)redhat.com>
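
To make the ordering problem in this commit message concrete, here is a minimal sketch of a refcounted object whose destroy step unlinks it and drops the table's reference. The struct and function names (obj, obj_create, obj_destroy, obj_release, obj_teardown) are hypothetical stand-ins, not the libntirpc types or macros, and the exact refcount layout is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted transport; not the real SVCXPRT. */
struct obj {
	int refcnt;	/* one ref for the lookup table, one per holder */
	bool linked;	/* still present in the lookup table? */
};

static int frees;	/* counts frees so the behaviour can be checked */

static struct obj *obj_create(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o == NULL)
		return NULL;
	o->refcnt = 2;	/* table ref + caller ref */
	o->linked = true;
	return o;
}

/* Drop one reference; the last one frees the object. */
static void obj_release(struct obj *o)
{
	if (--o->refcnt == 0) {
		frees++;
		free(o);
	}
}

/* Unlink from the table and drop the table's ref, at most once. */
static void obj_destroy(struct obj *o)
{
	if (o->linked) {
		o->linked = false;
		obj_release(o);
	}
}

/* Safe order: destroy while our own ref still keeps the object alive,
 * then release. Releasing first could free the object (if an error path
 * already destroyed it) and the later destroy would touch freed memory. */
static void obj_teardown(struct obj *o)
{
	obj_destroy(o);
	obj_release(o);
}
```

With this order, a teardown after an error path has already called the destroy step still frees the object exactly once.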
commit c1b95f7519cb3ecbeccdeb69f9d5f534c58383d0
Don't attempt to destroy XPRT if CLNT create was unsuccessful
Currently in clnt_vc_destroy() we call SVC_DESTROY for a XPRT,
but if CLNT (client handle) creation failed then the related
'cx->cx_rec' won't be valid and this will lead to a crash.
Fixed this by calling SVC_DESTROY only when 'cx->cx_rec' is valid.
Signed-off-by: Madhu Thorat <madhu.punjabi(a)in.ibm.com>
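
The guard described in this commit message can be sketched as follows. Only the field name cx_rec comes from the message above; the struct layouts and the functions rec_destroy and ctx_destroy are hypothetical and stand in for the real libntirpc code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the field name cx_rec is from the thread. */
struct rec {
	int destroyed;
};

struct ctx {
	struct rec *cx_rec;	/* NULL if client creation failed early */
};

static int destroy_calls;	/* counts destroy calls for checking */

static void rec_destroy(struct rec *r)
{
	r->destroyed = 1;
	destroy_calls++;
}

/* Tear down a client context whose creation may have failed part-way:
 * only destroy the record if it was actually set up. */
static void ctx_destroy(struct ctx *cx)
{
	if (cx->cx_rec != NULL) {
		rec_destroy(cx->cx_rec);
		cx->cx_rec = NULL;
	}
}
```

A check like this only avoids dereferencing a record that was never created; it does not change how many references are taken or dropped on the success path.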
Regards,
Deepthi
On 17/05/20, 5:53 PM, "Deepthi Shivaramu" <des(a)vmware.com> wrote:
Thanks Daniel, I will try this.
Regards,
Deepthi
On 15/05/20, 6:55 PM, "Daniel Gryniewicz" <dang(a)redhat.com> wrote:
Try this one:
https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub....
Daniel
On 5/15/20 6:55 AM, Deepthi Shivaramu wrote:
> Thanks for your response Soumya.
>
> The client used was an Ubuntu 16.04.2 VM.
> This is not seen consistently, but we are hitting it randomly in some failure scenarios for NFSv4.0 alone.
>
> The scenarios were:
> 1. The SetClientId_Confirm op fails with some GSS error, and when the client retries the SetClientId_Confirm op it tries deleting the backchannel and hits this.
> 2. The second one was randomly on the nfs_client_id_expire path.
>
> One thing I wanted clarification on: was this the fix for that panic, or was there more to the fix?
>
> >
https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub....
>> svc_xprt_lookup - Add extra ref on create
>>
>> An xprt has a ref for the hash table (that's released by SVC_DESTROY());
>> but when it's first created, only 1 ref was taken, so there wasn't a ref
>> for the caller.
>>
>> Add an extra ref for the caller when the xprt is first created.
>>
>> Signed-off-by: Daniel Gryniewicz <dang(a)redhat.com>
> >next (#155) v3.2
>> …
> >v1.8.0
>> @dang
>> dang committed on 19 Oct 2018
>> commit ca74cde10ef02a322b8944a6c8639b1318fa34dc
>
> Regards,
> Deepthi
>
> On 15/05/20, 1:41 PM, "Soumya Koduri" <skoduri(a)redhat.com> wrote:
>
> Hi Deepthi,
>
>
> On 5/15/20 8:16 AM, Deepthi Shivaramu wrote:
> > Soumya,
> > I see there was a discussion on GitHub about the exact same segfault, and you were debugging this issue:
> > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub....
> >
> > There were multiple fixes discussed there, but ultimately I see this fix was checked in:
> > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub....
> >
> > But the strange part is that I already have that fix in my source and am still hitting this same segfault.
> > Also, one correction to my previous mail: we are actually using libntirpc1.7.0 with ganesha2.7.2.
> >
> > @Soumya, do you know of any other fix related to this problem?
>
> Yes. This issue was fixed a while back and we haven't encountered it
> again. Dan may have some insights on it.
>
> Is this consistently hit? What is the client used?
>
> Thanks,
> Soumya
>
> >
> > Regards,
> > Deepthi
> >
> > On 14/05/20, 5:09 PM, "Deepthi Shivaramu" <des(a)vmware.com> wrote:
> >
> > I see this segfault is in nfs_rpc_destroy_chan() and is not specific to setclientid_confirm.
> > We are not seeing it with NFSv4.1 but are seeing it frequently with NFSv4.0 tests.
> >
> > I saw one more core today with bt:
> >
> > (gdb) bt
> > #0 0x00007ff7b1dde71a in svc_release_it (xprt=0x7ff780001740, flags=0, tag=0x7ff7b1e05fd0 "clnt_vc_destroy", line=462)
> > at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/libntirpc/ntirpc/rpc/svc.h:433
> > #1 0x00007ff7b1ddf4fb in clnt_vc_destroy (clnt=0x7ff780001620) at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/libntirpc/src/clnt_vc.c:462
> > #2 0x000000000043b4e1 in clnt_release_it (clnt=0x7ff780001620, flags=0, tag=0x55e550 <__func__.21824> "_nfs_rpc_destroy_chan", line=628)
> > at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/libntirpc/ntirpc/rpc/clnt.h:319
> > #3 0x000000000043b577 in clnt_destroy_it (clnt=0x7ff780001620, tag=0x55e550 <__func__.21824> "_nfs_rpc_destroy_chan", line=628)
> > at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/libntirpc/ntirpc/rpc/clnt.h:341
> > #4 0x000000000043eb97 in _nfs_rpc_destroy_chan (chan=0x7ff7940023a8) at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/MainNFSD/nfs_rpc_callback.c:628
> > #5 0x000000000043f800 in nfs_rpc_destroy_chan (chan=0x7ff7940023a8) at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/MainNFSD/nfs_rpc_callback.c:864
> > #6 0x00000000004bde35 in nfs_client_id_expire (clientid=0x7ff794002300, make_stale=false)
> > at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/SAL/nfs4_clientid.c:1099
> > #7 0x00000000004442bf in reap_hash_table (ht_reap=0xf35f40) at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/MainNFSD/nfs_reaper_thread.c:109
> > #8 0x0000000000444a62 in reaper_run (ctx=0xf66ca0) at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/MainNFSD/nfs_reaper_thread.c:232
> > #9 0x00000000004fdc38 in fridgethr_start_routine (arg=0xf66ca0) at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/support/fridgethr.c:550
> > #10 0x00007ff7b09aa3d4 in start_thread (arg=0x7ff791ffb700) at pthread_create.c:334
> > #11 0x00007ff7b02c9ebd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> > (gdb) f 0
> > #0 0x00007ff7b1dde71a in svc_release_it (xprt=0x7ff780001740, flags=0, tag=0x7ff7b1e05fd0 "clnt_vc_destroy", line=462)
> > at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/libntirpc/ntirpc/rpc/svc.h:433
> > 433 in /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/libntirpc/ntirpc/rpc/svc.h
> > (gdb) p xprt
> > $1 = (SVCXPRT *) 0x7ff780001740
> > (gdb) p *$
> > $2 = {xp_ops = 0x0, xp_dispatch = {process_cb = 0x0, rendezvous_cb = 0x0}, xp_parent = 0x7ff770004730, xp_tp = 0x6d00000001 <error: Cannot access memory at address 0x6d00000001>,
> > xp_netid = 0x7ff79c00a160 "", xp_p1 = 0x7ff770004750, xp_p2 = 0x0, xp_p3 = 0x0, xp_u1 = 0x3, xp_u2 = 0x0, xp_local = {nb = {maxlen = 0, len = 0, buf = 0x7ff7940018a0}, ss = {
> > ss_family = 0, __ss_align = 0, __ss_padding = '\000' <repeats 111 times>}}, xp_remote = {nb = {maxlen = 4280583506, len = 0, buf = 0x0}, ss = {ss_family = 34467,
> > __ss_align = 1,
> > __ss_padding = "_:P\346ju\200\223\001\000\000\000\001\000\000\000`,\000\200\367\177\000\000\341\376\266^", '\000' <repeats 12 times>, "\061\000\000\000\000\000\000\000\000\061\000\200\367\177\000\000\220]\000\200\367\177\000\000c3-edbe-2fea12000\000\000\000\000\000\000\000\064\001", '\000' <repeats 21 times>}}, xp_lock = {__data = {__lock = -1946148624,
> > __count = 32759, __owner = 0, __nusers = 37, __kind = -1946148624, __spins = 32759, __list = {__prev = 0x7ff77c001530, __next = 0x0}},
> > __size = "\360 \000\214\367\177\000\000\000\000\000\000%\000\000\000\360 \000\214\367\177\000\000\060\025\000|\367\177\000\000\000\000\000\000\000\000\000",
> > __align = 140701182468336}, xp_fd = 0, xp_ifindex = 0, xp_si_type = 3, xp_type = 0, xp_refcnt = -1, xp_flags = 64}
> > (gdb) p xprt->xp_ops
> > $3 = (struct xp_ops *) 0x0
> > (gdb)
> >
> >
> > Regards,
> > Deepthi
> >
> > On 14/05/20, 12:17 PM, "Deepthi Shivaramu" <des(a)vmware.com> wrote:
> >
> > Daniel,
> > I am seeing this segfault with libntirpc1.8.0 and ganesha2.8.2 in the setclientid_confirm code path.
> > Can you please check and let me know if you have seen this issue before and whether the fix is already available in the latest versions?
> >
> >
> > (gdb) bt
> > #0 0x0000000000000000 in ?? ()
> > #1 0x00007fd66badf72e in svc_release_it (xprt=0x7fd658002e90, flags=0,
> > tag=0x7fd66bb06fd0 "clnt_vc_destroy", line=462)
> > at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/libntirpc/ntirpc/rpc/svc.h:433
> > #2 0x00007fd66bae04fb in clnt_vc_destroy (clnt=0x7fd658002ba0)
> > at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/libntirpc/src/clnt_vc.c:462
> > #3 0x000000000043b4e1 in clnt_release_it (clnt=0x7fd658002ba0, flags=0,
> > tag=0x55e550 <__func__.21824> "_nfs_rpc_destroy_chan", line=628)
> > at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/libntirpc/ntirpc/rpc/clnt.h:319
> > #4 0x000000000043b577 in clnt_destroy_it (clnt=0x7fd658002ba0,
> > tag=0x55e550 <__func__.21824> "_nfs_rpc_destroy_chan", line=628)
> > at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/libntirpc/ntirpc/rpc/clnt.h:341
> > #5 0x000000000043eb97 in _nfs_rpc_destroy_chan (chan=0x7fd64c002648)
> > at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/MainNFSD/nfs_rpc_callback.c:628
> > #6 0x000000000043f800 in nfs_rpc_destroy_chan (chan=0x7fd64c002648)
> > at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/MainNFSD/nfs_rpc_callback.c:864
> > #7 0x000000000048011c in nfs4_op_setclientid_confirm (op=0x7fd62c001d90,
> > ---Type <return> to continue, or q <return> to quit---
> > data=0x7fd6607dff70, resp=0x7fd62c002070)
> > at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/Protocols/NFS/nfs4_op_setclientid_confirm.c:382
> > #8 0x000000000045b4b1 in nfs4_Compound (arg=0x7fd62c0011a8,
> > req=0x7fd62c000aa0, res=0x7fd62c001f60)
> > at
> > ....
> > .......
> > #20 0x00007fd669fcaebd in clone ()
> > at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> > (gdb) f 1
> > #1 0x00007fd66badf72e in svc_release_it (xprt=0x7fd658002e90, flags=0,
> > tag=0x7fd66bb06fd0 "clnt_vc_destroy", line=462)
> > at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/libntirpc/ntirpc/rpc/svc.h:433
> > 433 /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/libntirpc/ntirpc/rpc/svc.h: No such file or directory.
> > (gdb) p clnt
> > No symbol "clnt" in current context.
> > (gdb) p xprt
> > $10 = (SVCXPRT *) 0x7fd658002e90
> > (gdb) p *$
> > $11 = {xp_ops = 0x7fd658000e20, xp_dispatch = {process_cb = 0x7fd658000078,
> > rendezvous_cb = 0x7fd658000078}, xp_parent = 0x0, xp_tp = 0x0,
> > xp_netid = 0x0, xp_p1 = 0x0, xp_p2 = 0x0, xp_p3 = 0x0, xp_u1 = 0x0,
> > xp_u2 = 0x0, xp_local = {nb = {maxlen = 483619223, len = 1, buf = 0x2},
> > ss = {ss_family = 0, __ss_align = 0,
> > __ss_padding = "\313)\260k\326\177\000\000\020\320\236b\326\177\000\000\006\000\000\000\034\000\000\000\004\004\005\377\377\377\377\377\000\000\000\000\020\373\364\310\333c\335\363\245\332\362b\324.M\332",
> > '\000' <repeats 59 times>}}, xp_remote = {nb = {maxlen = 0, len = 0, buf =
> > 0x0}, ss = {ss_family = 0,
> > __ss_align = 0, __ss_padding = '\000' <repeats 111 times>}}, xp_lock =
> > {
> > __data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0,
> > __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
> > __size = '\000' <repeats 39 times>, __align = 0}, xp_fd = 0,
> > xp_ifindex = 0, xp_si_type = 0, xp_type = 0, xp_refcnt = -1, xp_flags = 64}
> > (gdb) f 6
> > #6 0x000000000043f800 in nfs_rpc_destroy_chan (chan=0x7fd64c002648)
> > at /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/MainNFSD/nfs_rpc_callback.c:864
> > 864 /build/mts/release/bora-16138726/cayman_nfs-ganesha/nfs-ganesha/src/src/MainNFSD/nfs_rpc_callback.c: No such file or directory.
> > (gdb) p chan
> > $12 = (rpc_call_channel_t *) 0x7fd64c002648
> > (gdb) p *$
> > $13 = {type = RPC_CHAN_V40, mtx = {__data = {__lock = 1, __count = 0,
> > __owner = 163, __nusers = 1, __kind = 0, __spins = 0, __list = {
> > __prev = 0x0, __next = 0x0}},
> > __size = "\001\000\000\000\000\000\000\000\243\000\000\000\001",
> > '\000' <repeats 26 times>, __align = 1}, states = 0, source = {clientid =
> > 0x7fd64c0025a0,
> > session = 0x7fd64c0025a0}, last_called = 0, clnt = 0x7fd658002ba0,
> > auth = 0x0, gss_sec = {mech = 0x0, qop = 0, svc = RPCSEC_GSS_SVC_INTEGRITY,
> > cred = 0x0, req_flags = 0}}
> > (gdb) p chan->client
> > There is no member named client.
> > (gdb) p chan->clnt
> > $14 = (CLIENT *) 0x7fd658002ba0
> > (gdb) p *$
> > $15 = {cl_ops = 0x7fd66bd192e0, cl_netid = 0x0, cl_tp = 0x0, cl_u1 = 0x0,
> > cl_u2 = 0x0, cl_lock = {__data = {__lock = 0, __count = 0, __owner = 0,
> > __nusers = 0, __kind = 3, __spins = 0, __list = {__prev = 0x0,
> > __next = 0x0}},
> > __size = '\000' <repeats 16 times>, "\003", '\000'
> > <repeats 22 times>,
> > __align = 0}, cl_error = {ru = {RE_errno = 0, RE_why = AUTH_OK, RE_vers = {
> > low = 0, high = 0}, RE_lb = {s1 = 0, s2 = 0}},
> > re_status = RPC_SUCCESS}, cl_refcnt = 0, cl_flags = 96}
> > (gdb)
> >
> > On 06/05/20, 10:00 PM, "Daniel Gryniewicz" <dang(a)redhat.com> wrote:
> >
> > I'm happy to announce the latest stable versions of NTIRPC and Ganesha
> > in the 2.8 series. These are NTIRPC 1.8.1 and Ganesha 2.8.4. There are
> > more than 40 bug fixes in these releases.
> >
> > Daniel
> >
> > _______________________________________________
> > Devel mailing list -- devel(a)lists.nfs-ganesha.org
> > To unsubscribe send an email to devel-leave(a)lists.nfs-ganesha.org
>
>
>
_______________________________________________
Devel mailing list -- devel(a)lists.nfs-ganesha.org
To unsubscribe send an email to devel-leave(a)lists.nfs-ganesha.org