I pasted some info below about the tests I've done so far, in case it's
interesting.
My NFSv4 question is as follows.
I chroot in like before with the simple test case.
(If noacl is set on the client side, there is a long pause; otherwise
it is fast.)
I get:
sh-4.2# su - erikj
su: cannot set groups: Operation not permitted
Which made me curious, so I did this:
sh-4.2# ls -l /etc/securetty
-rw------- 1 4294967294 4294967294 221 Jun 21 2018 /etc/securetty
Then outside the chroot, I checked too:
bash-4.2# ls -l /a/etc/securetty
-rw------- 1 4294967294 4294967294 221 Jun 21 2018 /a/etc/securetty
Here is what it is on the nfs server side:
# ls -l /opt/clmgr/image/images_ro_nfs/rhel76-aarch64-newkernel/etc/securetty
-rw------- 1 root root 221 Jun 21 2018 /opt/clmgr/image/images_ro_nfs/rhel76-aarch64-newkernel/etc/securetty
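As a side note on the numbers in the client-side listings: 4294967294 is simply -2 reinterpreted as an unsigned 32-bit integer, and -2 is commonly used as the anonymous/"nobody" ID in NFS setups. That suggests the owner is not being mapped on the client, rather than the file really being owned by that ID. A minimal sketch of the arithmetic follows; the helper name to_u32 is made up for illustration, and it assumes a shell with 64-bit arithmetic such as bash:

```shell
# to_u32 is a hypothetical helper: reinterpret a signed value as an
# unsigned 32-bit integer by masking off everything above the low 32 bits.
to_u32() {
    echo $(( $1 & 0xFFFFFFFF ))
}

# -2 viewed as an unsigned 32-bit value; prints 4294967294,
# matching the uid/gid shown by ls on the client.
to_u32 -2
```

This only explains where the number comes from; it does not say which side (server config or client ID mapping) is producing the unmapped owner.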
I'm not sure exactly what Manage_Gids does, but setting it made su fail
with "permission denied" on libraries.
I'm also not sure exactly what this one does, but I tried (in NFS_CORE_PARAM):
Only_Numeric_Owners = TRUE
I also tried, for fun:
Allow_Numeric_Owners = FALSE;
The behavior didn't change in either case.
Squash is 'none'.
I didn't set any anonymous uid stuff.
So my NFS v4 experiment is having some issues.
I'm sorry to come off so needy here. Thanks again for the help so far.
Here is the current test case from miniroot (fat initrd):
umount /a
umount /root_ro_nfs
umount /rootfs.rw
#####mount -o ro,noatime,nocto,actimeo=3600,lookupcache=all,nolock,tcp,vers=3 172.23.255.249:/cm_shared/image/images_ro_nfs/rhel76-aarch64-newkernel /root_ro_nfs
mount -o ro,nolock 172.23.255.249:/cm_shared/image/images_ro_nfs/rhel76-aarch64-newkernel /root_ro_nfs
mount -t tmpfs -o mpol=interleave tmpfs /rootfs.rw
mkdir /rootfs.rw/upperdir
mkdir /rootfs.rw/work
mount -t overlay overlay -o lowerdir=/root_ro_nfs,upperdir=/rootfs.rw/upperdir,workdir=/rootfs.rw/work /a
Some other cases tried:
- client side vers=3 "noacl" mount option doesn't change the behavior.
- no "noacl" is available for tmpfs or overlay.
- nfs v4.1 on the client - noacl in mount command - leaving ACLs disabled on
the server: there is a behavior change, but it still fails:
sh-4.2# su - erikj
su: cannot set groups: Operation not permitted
- Same as above, but removing 'noacl' from nfsv4 mount line -- same
behavior.
- nfsv41 client, ganesha ACLs enabled (export and global)
* su - erikj takes a long long time
* then fails with
su: cannot set groups: Operation not permitted
- like above, but adding 'noacl' on the client - the long pause doesn't
happen (although on another trial it did), but it still ends up with:
su: cannot set groups: Operation not permitted
On Wed, Aug 14, 2019 at 08:52:57AM -0500, Erik Jacobson wrote:
This gives me several things to try! I will report back.
THANK YOU SO MUCH for taking the time to look at packet traces, which I
assume may not be the most exciting thing you've done this week.
If I can find something useful to test in the lab here by the end of the
day, I think I can still get a dedicated time on the 2592 node ARM
cluster to try performance runs this weekend. So we have a ray of hope
here. That machine goes into a mode where I can't even request
dedicated time after this month; this may be practically my last chance.
I was thinking a workaround -- not sure if I have the talent to do it --
might be to return an empty list instead of an error to get me through
this weekend, and then try to dig into why this behavior is different
with x86_64 vs aarch64, which is probably a linux kernel community
discussion.
Let me see where I get with your suggestions, and research some client
side mount options relating to this. Maybe noacl-like things
client-side.
More soon!
On Wed, Aug 14, 2019 at 08:58:20AM -0400, Daniel Gryniewicz wrote:
> On 8/14/19 12:54 AM, Soumya Koduri wrote:
> >
> >
> > On 8/14/19 2:35 AM, Erik Jacobson wrote:
> > > > The only thing I can see is that it tries to open /dev/tty and fails.
> > > > However, that's very early in the process, and it continues for
> > > > a long time after that, so I'm not sure that is causing it to fail.
> > > > It never gets to trying to lookup /home/erikj, it just stops at /home.
> > > >
> > > > I'm a bit stumped at this point. Maybe a packet trace of a
> > > > working run of the same thing would be helpful for comparison?
> > >
> > > Hello - Meetings (... bleh)
> > >
> > > I re-ran the test using rhel76 - same image as before, same test as
> > > before same node as before.
> > >
> > > I saw nothing in the gNFS nfs log (but I didn't enable anything verbose
> > > there).
> > >
> > > It worked fine to do the "su - erikj" test as we expected with gNFS.
> > >
> > > I have attached a capture to this email. I hope it helps.
> > >
> > > I'm starting to look through the output myself but I won't be much
> > > use at spotting something I'm afraid.
> > >
> > > Let me know if you see something in this working case that could be a
> > > clue! Thank you !!!!
> >
> > One thing I noticed from this gNFS pkt trace is that after LOOKUP on
> > "/home", client immediately sent "GETACL" call. Maybe archlinux needs
> > ACL support for non-root users (just a guess) and as NFS-Ganesha does
> > not support NFSACL program (it sent "program not supported" error to
> > NFSACL NULL call), it gave "Operation not supported" error.
> >
> > NFS-Ganesha supports only NFSv4.x ACLs. Could you re-enable ACLs (global
> > and export level) in the ganesha config and try using NFSv4.x protocol?
> >
>
> Possible, although the successful pcap has NFSACL calls before the lookup of
> /home, so that may not be the case. But you're right that the next thing in
> the successful pcap is an NFSACL, and the ACCESS call only comes after that,
> and the ACCESS call never occurs on the failed pcap.
>
> Note that the NFSACL calls in the gnfs case always return empty ACLs, so the
> ACLs aren't being used for anything.
>
> Daniel
Erik Jacobson
Software Engineer
erik.jacobson(a)hpe.com
+1 612 851 0550 Office
Eagan, MN
hpe.com