Ok, I've got a patch that serializes use of a slot so a quickly replayed request will
block until the first instance of the request is completed and there is a response to
replay:
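
For readers following along, here is a minimal sketch of what "serializing use of a slot" means. It is not the Ganesha patch itself: the names Slot, handle, and slow_execute are hypothetical, and a per-slot mutex is just one assumed way to get the blocking behavior described above. The point is only that the replay check and the execution of the original request happen under the same lock, so a quickly replayed request waits until the real response is in the cache.

```python
import threading
import time


class Slot:
    """Hypothetical session slot: last seqid seen plus its cached reply."""

    def __init__(self, seqid, cached_reply):
        self.lock = threading.Lock()
        self.seqid = seqid
        self.cached_reply = cached_reply


def handle(slot, seqid, execute):
    """Process one request on a slot while holding the slot lock."""
    with slot.lock:
        if seqid == slot.seqid:
            # Replay: the original has finished, so the cache is current.
            return slot.cached_reply
        if seqid != slot.seqid + 1:
            raise ValueError("misordered seqid")
        # New request: run it and cache the reply before releasing the lock.
        reply = execute()
        slot.seqid = seqid
        slot.cached_reply = reply
        return reply


calls = []


def slow_execute():
    """Stand-in for executing the compound; slow enough to overlap."""
    calls.append(1)
    time.sleep(0.05)
    return "SEQUENCE PUTROOTFH GETFH GETATTR"


slot = Slot(2, "SEQUENCE PUTROOTFH SECINFO_NO_NAME")
results = []


def worker():
    results.append(handle(slot, 3, slow_execute))


threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In this sketch both copies of the seqid 3 request get the same GETATTR reply, and the compound is executed only once. Holding a lock for the whole request is the simplest form of the idea; a real server may prefer something finer-grained.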
-----Original Message-----
From: Frank Filz [mailto:ffilzlnx@mindspring.com]
Sent: Monday, April 23, 2018 6:58 AM
To: dang(a)redhat.com; tomkcpr(a)mdevsys.com; support(a)lists.nfs-ganesha.org
Cc: nfs-ganesha-support(a)lists.sourceforge.net
Subject: [Nfs-ganesha-support] Re: [Support] ERR 20: Auth Rejected Credentials
(client should begin new session)
> Thanks. I'll try to dig further into this, but for some strange
> reason, a compound with <SEQUENCE, PUTROOTFH, GETFH, GETATTR> is
> responded to with <SEQUENCE, PUTROOTFH, SECINFO_NO_NAME>, which causes
> the session to be destroyed. A quick look doesn't show me any way
> that can happen (nor does a quick read of the spec indicate it's a legal
> response).

If you filter on the session ID and slot 0, you will see four request/response pairs:

196 24.035045 192.168.0.108 192.168.0.80 NFS 270 V4 Call (Reply In 197) RECLAIM_COMPLETE
    GSS Data, Ops(2): SEQUENCE, RECLAIM_COMPLETE
        Operations (count: 2): SEQUENCE, RECLAIM_COMPLETE
            Opcode: SEQUENCE (53)
                seqid: 0x00000001
            Opcode: RECLAIM_COMPLETE (58)

197 24.035695 192.168.0.80 192.168.0.108 NFS 226 V4 Reply (Call In 196) RECLAIM_COMPLETE
    GSS Data, Ops(2): SEQUENCE RECLAIM_COMPLETE
        Operations (count: 2)
            Opcode: SEQUENCE (53)
                seqid: 0x00000001
            Opcode: RECLAIM_COMPLETE (58)

198 24.035842 192.168.0.108 192.168.0.80 NFS 274 V4 Call (Reply In 199) SECINFO_NO_NAME
    GSS Data, Ops(3): SEQUENCE, PUTROOTFH, SECINFO_NO_NAME
        Operations (count: 3): SEQUENCE, PUTROOTFH, SECINFO_NO_NAME
            Opcode: SEQUENCE (53)
                seqid: 0x00000002
            Opcode: PUTROOTFH (24)
            Opcode: SECINFO_NO_NAME (52)

199 24.041264 192.168.0.80 192.168.0.108 NFS 330 V4 Reply (Call In 198) SECINFO_NO_NAME
    GSS Data, Ops(3): SEQUENCE PUTROOTFH SECINFO_NO_NAME
        Operations (count: 3)
            Opcode: SEQUENCE (53)
                seqid: 0x00000002
            Opcode: PUTROOTFH (24)
            Opcode: SECINFO_NO_NAME (52)

536 94.470108 192.168.0.108 192.168.0.80 NFS 226 V4 Call (Reply In 550) PUTROOTFH | GETATTR
    Network File System, Ops(4): SEQUENCE, PUTROOTFH, GETFH, GETATTR
        Operations (count: 4): SEQUENCE, PUTROOTFH, GETFH, GETATTR
            Opcode: SEQUENCE (53)
                seqid: 0x00000003
            Opcode: PUTROOTFH (24)
            Opcode: GETFH (10)
            Opcode: GETATTR (9)

537 94.470249 192.168.0.108 192.168.0.80 NFS 182 V4 Call (Reply In 542) PUTROOTFH | GETATTR
    Network File System, Ops(4): SEQUENCE, PUTROOTFH, GETFH, GETATTR
        Operations (count: 4): SEQUENCE, PUTROOTFH, GETFH, GETATTR
            Opcode: SEQUENCE (53)
                seqid: 0x00000003
            Opcode: PUTROOTFH (24)
            Opcode: GETFH (10)
            Opcode: GETATTR (9)

542 94.471061 192.168.0.80 192.168.0.108 NFS 262 V4 Reply (Call In 537) SECINFO_NO_NAME
    Network File System, Ops(3): SEQUENCE PUTROOTFH SECINFO_NO_NAME
        Operations (count: 3)
            Opcode: SEQUENCE (53)
                seqid: 0x00000002
            Opcode: PUTROOTFH (24)
            Opcode: SECINFO_NO_NAME (52)

550 94.471464 192.168.0.80 192.168.0.108 NFS 474 V4 Reply (Call In 536) PUTROOTFH | GETATTR
    Network File System, Ops(4): SEQUENCE PUTROOTFH GETFH GETATTR
        Operations (count: 4)
            Opcode: SEQUENCE (53)
                seqid: 0x00000003
            Opcode: PUTROOTFH (24)
            Opcode: GETFH (10)
            Opcode: GETATTR (9)

Notice we get two copies of SEQUENCE, PUTROOTFH, GETFH, GETATTR at seqid 3
in quick succession (frames 536 and 537), and then two responses: one looks
correct (frame 550), and the other (frame 542) is a replay of the response to
SEQUENCE, PUTROOTFH, SECINFO_NO_NAME at seqid 2.

I think what is happening is that Ganesha starts processing both seqid 3
requests in parallel. One of them is identified as a replay and answered as
such. Unfortunately, the one being processed as the original request has not
actually finished, so what is in the slot's cache for the replay is still the
seqid 2 response...
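
To make the suspected ordering concrete, here is a small deterministic sketch of it. This is my reading of the trace, not Ganesha code: Slot, begin_request, and finish_request are hypothetical names, and the assumption is that the slot's seqid is advanced when a new request starts, while the cached reply is only replaced once that request finishes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    """Hypothetical session slot: last seqid seen plus its cached reply."""
    seqid: int
    cached_reply: str


def begin_request(slot: Slot, seqid: int) -> Optional[str]:
    """Return the cached reply for a replay, or None for a new request."""
    if seqid == slot.seqid:
        return slot.cached_reply   # treated as a replay
    if seqid == slot.seqid + 1:
        slot.seqid = seqid         # seqid advances before any reply exists
        return None
    raise ValueError("misordered seqid")


def finish_request(slot: Slot, reply: str) -> None:
    """Store the reply of the request that just completed."""
    slot.cached_reply = reply


# State after frame 199: seqid 2 answered with SECINFO_NO_NAME.
slot = Slot(seqid=2, cached_reply="SEQUENCE PUTROOTFH SECINFO_NO_NAME")

first = begin_request(slot, 3)    # frame 536: new request, starts executing
second = begin_request(slot, 3)   # frame 537: looks like a replay already
finish_request(slot, "SEQUENCE PUTROOTFH GETFH GETATTR")  # 536 completes

print(first)    # None: the original is executed normally
print(second)   # the stale seqid 2 reply, as seen in frame 542
```

The second call is classified as a replay while the cache still holds the seqid 2 response, which matches what frame 542 shows on the wire.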
Frank
> Frank, any ideas?
>
> Daniel
>
> On 04/22/2018 12:39 AM, TomK wrote:
> > On 4/20/2018 2:31 PM, Daniel Gryniewicz wrote:
> > Thanks very much Daniel.
> >
> > Attaching the tcpdump.
> >
> > What I used:
> >
> > 1) tcpdump -w nfs-trace.dat -s 0
> > 2) tcpdump -r nfs-trace.dat -nnvvveXXS
> >
> > Cheers,
> > Tom
> >
> >> Hi, Tom.
> >>
> >> Sorry I missed this when it came in; the new list was not being
> >> filtered properly.
> >>
> >> Rejected Creds is only returned in a few places. It can mean that
> >> the requested Auth type is completely unknown. This seems
> >> unlikely, as Ganesha should understand all auth types used by the
> >> Linux kernel client.
> >>
> >> It can mean that AUTH_SHORT was used (which Ganesha doesn't
> >> support...).
> >>
> >> It can mean that AUTH_GSS was used, but the control procedure is
> >> unknown. Again, unlikely.
> >>
> >> It can mean that AUTH_GSS was used with GSS_INIT, but the security
> >> context was invalid or incorrect.
> >>
> >> Unfortunately, tcpdump didn't print enough for us to know. Could
> >> you get a raw packet capture that we could load into wireshark? Or
> >> load it yourself, and get a detailed trace of the failed packet?
> >>
> >> Thanks,
> >> Daniel
> >>
> >> On 04/19/2018 03:33 AM, TomK wrote:
> >>> Hey All,
> >>>
> >>> I have an external NFS cluster serviced by a VIP. The clients run
> >>> autofs configured via IPA to provide NFS home directories to clients.
> >>>
> >>> However, I'm running into an issue on one of the clients and wondering
> >>> if anyone has seen this message from a tcpdump of a simple mount
> >>> session that's preventing the mount:
> >>>
> >>> psql02: mount nfs-c01:/n /m
> >>>
> >>> Yields this message
> >>>
> >>> ERR 20: Auth Rejected Credentials (client should begin new
> >>> session)
> >>>
> >>> and the mount attempt never exits and never mounts /m. nfs-c01
> >>> is a VIP that's serviced by HAproxy / keepalived. nfs-c01 does,
> >>> however, have a record in IPA Server, both forward and reverse. Using
> >>> one of the underlying hosts that service nfs-c01 works, and the
> >>> mount succeeds. All VM hosts are clones of the same template.
> >>>
> >>> I have autofs running as part of this IPA client setup and applied
> >>> the following fix as well:
> >>>
> >>> https://access.redhat.com/solutions/3261981
> >>>
> >>> /m is a test mount folder I'm using on this client to troubleshoot
> >>> the autofs mounting issue. So autofs is also running on the same
> >>> hosts where I'm trying this mount from.
> >>>
> >>> Trying to trace the exact source of this error and not quite sure
> >>> where to look further.
> >>>
> >>> idmipa01/02 are the IPA servers. (192.168.0.44/45 respectively)
> >>> psql01/02 are the problem VM's. (192.168.0.108/124 )
> >>> nfs01/02 are the NFS hosts. (192.168.0.131/119 )
> >>> nfs-c01 192.168.0.80
> >>>
> >>> This works fine on the other two VM hosts without any issue but I
> >>> just can't find any difference comparing all the configs and so
> >>> looking for suggestions to bounce off of.
> >>>
> >>
> >
> >
> _______________________________________________
> Support mailing list -- support(a)lists.nfs-ganesha.org To unsubscribe
> send an email to support-leave(a)lists.nfs-ganesha.org