This gives me several things to try! I will report back.
THANK YOU SO MUCH for taking the time to look at packet traces, which I
assume may not be the most exciting thing you've done this week.
If I can find something useful to test in the lab here by the end of the
day, I think I can still get dedicated time on the 2592-node ARM
cluster to try performance runs this weekend. So we have a ray of hope
here. After this month, that machine goes into a mode where I can't even
request dedicated time; this may be practically my last chance.
I was thinking a workaround (not sure if I have the talent to do it)
might be to return an empty list instead of an error to get me through
this weekend, and then try to dig into why this behavior differs
between x86_64 and aarch64, which is probably a Linux kernel community
discussion.
Let me see where I get with your suggestions, and research some
client-side mount options relating to this. Maybe something noacl-like.
More soon!
On Wed, Aug 14, 2019 at 08:58:20AM -0400, Daniel Gryniewicz wrote:
On 8/14/19 12:54 AM, Soumya Koduri wrote:
>
>
> On 8/14/19 2:35 AM, Erik Jacobson wrote:
> > > The only thing I can see is that it tries to open /dev/tty and fails.
> > > However, that's very early in the process, and it continues for a long
> > > time after that, so I'm not sure that is causing it to fail. It never
> > > gets to trying to lookup /home/erikj, it just stops at /home.
> > >
> > > I'm a bit stumped at this point. Maybe a packet trace of a working run
> > > of the same thing would be helpful for comparison?
> >
> > Hello - Meetings (... bleh)
> >
> > I re-ran the test using rhel76 - same image as before, same test as
> > before, same node as before.
> >
> > I saw nothing in the gNFS nfs log (but I didn't enable anything verbose
> > there).
> >
> > It worked fine to do the "su - erikj" test as we expected with gNFS.
> >
> > I have attached a capture to this email. I hope it helps.
> >
> > I'm starting to look through the output myself, but I won't be much use
> > at spotting something, I'm afraid.
> >
> > Let me know if you see something in this working case that could be a
> > clue! Thank you !!!!
>
> One thing I noticed from this gNFS pkt trace is that after LOOKUP on
> "/home", the client immediately sent a "GETACL" call. Maybe archlinux
> needs ACL support for non-root users (just a guess), and as NFS-Ganesha
> does not support the NFSACL program (it sent a "program not supported"
> error to the NFSACL NULL call), it gave the "Operation not supported"
> error.
>
> NFS-Ganesha supports only NFSv4.x ACLs. Could you re-enable ACLs (global
> and export level) in the ganesha config and try using NFSv4.x protocol?
>
Possible, although the successful pcap has NFSACL calls before the lookup of
/home, so that may not be the case. But you're right that the next thing in
the successful pcap is an NFSACL, and the ACCESS call only comes after that,
and the ACCESS call never occurs on the failed pcap.
Note that the NFSACL calls in the gNFS case always return empty ACLs, so the
ACLs aren't being used for anything.
Daniel
Erik Jacobson
Software Engineer
erik.jacobson(a)hpe.com
+1 612 851 0550 Office
Eagan, MN
hpe.com