On 8/14/19 8:38 PM, Erik Jacobson wrote:
> I pasted some info below about the tests I've done so far, in case it's
> interesting.
> My NFSv4 question is as follows.
> I chroot in like before with the simple test case
> (if noacl is set on the client side there is a long pause; otherwise it's
> fast), and I get:

To start with, could you please collect a packet trace again (of both NFS and
Gluster traffic) and check the logs (ganesha.log, ganesha-gfapi.log, brick
logs) for errors?
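A minimal capture sketch for the server side, under stated assumptions: the port list (2049 for NFS, 111 for rpcbind, 24007 for glusterd, 49152-60999 as the brick range), the output path and the capture_filter helper name are all hypothetical and should be adjusted to the actual setup. By default it only prints the tcpdump command; it runs the capture when DRY_RUN=0.

```shell
#!/bin/sh
# Sketch only: capture NFS and Gluster traffic on the Ganesha server while
# reproducing "su - erikj" on the client. All ports below are assumptions.
#   2049         NFS (the Linux client normally sends NFSACL over this port too)
#   111          rpcbind/portmapper
#   24007        glusterd management
#   49152-60999  assumed default Gluster brick port range

# Hypothetical helper: build the tcpdump (BPF) capture filter.
capture_filter() {
    printf '%s' "port 2049 or port 111 or port 24007 or portrange 49152-60999"
}

# Hypothetical output path; override with OUT=/some/file.pcap.
OUT=${OUT:-/tmp/ganesha-su-test.pcap}

if [ "${DRY_RUN:-1}" = 1 ]; then
    # Default: print the command so it can be reviewed before running it.
    printf "tcpdump -i any -s 0 -w %s '%s'\n" "$OUT" "$(capture_filter)"
else
    # -i any: all interfaces; -s 0: whole packets; -w: raw pcap for Wireshark.
    # Stop with Ctrl-C once the failing su has been reproduced.
    tcpdump -i any -s 0 -w "$OUT" "$(capture_filter)"
fi
```

Afterwards, a display filter like the following should isolate the NFSACL calls and replies for comparison with the gNFS capture (assuming the installed tshark includes Wireshark's nfsacl dissector):

    tshark -r /tmp/ganesha-su-test.pcap -Y nfsacl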
Thanks,
Soumya
>
> sh-4.2# su - erikj
> su: cannot set groups: Operation not permitted
>
> Which made me curious, so I did this:
>
> sh-4.2# ls -l /etc/securetty
> -rw------- 1 4294967294 4294967294 221 Jun 21 2018 /etc/securetty
>
> Then outside the chroot, I checked too:
> bash-4.2# ls -l /a/etc/securetty
> -rw------- 1 4294967294 4294967294 221 Jun 21 2018 /a/etc/securetty
>
>
> Here is what it is on the nfs server side:
> # ls -l /opt/clmgr/image/images_ro_nfs/rhel76-aarch64-newkernel/etc/securetty
> -rw------- 1 root root 221 Jun 21 2018 /opt/clmgr/image/images_ro_nfs/rhel76-aarch64-newkernel/etc/securetty
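For what it's worth, the 4294967294 owner seen on the client is simply -2 stored in an unsigned 32-bit uid, which is consistent with the client showing a fallback (anonymous or unmapped) id instead of root. The sketch below only checks that arithmetic; reading the value as an anonymous/unmapped id is an assumption, not something confirmed from these logs.

```shell
#!/bin/sh
# Show that 4294967294 is -2 reinterpreted as an unsigned 32-bit uid.
# (Assumption: it is a fallback "anonymous/unmapped" id, not a real owner.)
# bash and dash do shell arithmetic in at least 64 bits, so 1 << 32 is safe.
uid_u32=$(( (1 << 32) - 2 ))
printf 'decimal: %s\n' "$uid_u32"
printf 'hex:     0x%X\n' "$uid_u32"
# Subtracting 2^32 recovers the signed value.
printf 'signed:  %s\n' "$(( uid_u32 - (1 << 32) ))"
```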
>
>
> I'm not sure exactly what Manage_Gids does, but with it set, su
> resulted in permission denied errors for libraries.
>
> Not sure exactly what it does, but I tried (in NFS_CORE_PARAM):
> Only_Numeric_Owners = TRUE
>
> I also tried, for fun:
> Allow_Numeric_Owners = FALSE;
>
> And it didn't change in either case.
>
> Squash is 'none'.
>
> I didn't set any anonymous uid stuff.
>
>
> So my NFS v4 experiment is having some issues.
>
> I'm sorry to come off so needy here. Thanks again for the help so far.
>
>
> Here is the current test case from miniroot (fat initrd):
> umount /a
> umount /root_ro_nfs
> umount /rootfs.rw
> #####mount -o ro,noatime,nocto,actimeo=3600,lookupcache=all,nolock,tcp,vers=3 172.23.255.249:/cm_shared/image/images_ro_nfs/rhel76-aarch64-newkernel /root_ro_nfs
> mount -o ro,nolock 172.23.255.249:/cm_shared/image/images_ro_nfs/rhel76-aarch64-newkernel /root_ro_nfs
> mount -t tmpfs -o mpol=interleave tmpfs /rootfs.rw
> mkdir /rootfs.rw/upperdir
> mkdir /rootfs.rw/work
> mount -t overlay overlay -o lowerdir=/root_ro_nfs,upperdir=/rootfs.rw/upperdir,workdir=/rootfs.rw/work /a
>
>
>
> Some other cases tried:
>
> - client side vers=3 "noacl" mount option doesn't change the behavior.
> - no "noacl" is available for tmpfs or overlay.
> - NFSv4.1 on the client, 'noacl' in the mount command, ACLs left disabled
> on the server: the behavior changes, but it still fails:
> sh-4.2# su - erikj
> su: cannot set groups: Operation not permitted
> - Same as above, but with 'noacl' removed from the NFSv4 mount line: same
> behavior.
> - NFSv4.1 client, Ganesha ACLs enabled (export and global):
> * su - erikj takes a long, long time
> * then fails with
> su: cannot set groups: Operation not permitted
> - Like above, but adding 'noacl' on the client: the long pause doesn't
> happen (although on another trial it did), but it still ends up with:
> su: cannot set groups: Operation not permitted
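To rule out an option being silently ignored, it may help to confirm what each trial actually mounted with. The hypothetical mount_opts helper below prints the filesystem type and option string that /proc/mounts reports for a mount point; it assumes a Linux client and that the kernel lists the NFS version and noacl in that string, as it normally does for NFS mounts.

```shell
#!/bin/sh
# Hypothetical helper: print "<fstype> <options>" for a mount point.
# $1: mount point; $2: mount table to read (defaults to /proc/mounts).
mount_opts() {
    awk -v mp="$1" '$2 == mp { print $3, $4 }' "${2:-/proc/mounts}"
}

# Usage on the client, e.g. after each trial:
#   mount_opts /root_ro_nfs   # the NFS version and noacl should show up here
#   mount_opts /a             # the overlay mount on top of it
```

On systems with nfs-utils installed, nfsstat -m reports similar per-mount information for NFS mounts.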
>
>
> On Wed, Aug 14, 2019 at 08:52:57AM -0500, Erik Jacobson wrote:
>> This gives me several things to try! I will report back.
>>
>> THANK YOU SO MUCH for taking the time to look at packet traces, which I
>> assume may not be the most exciting thing you've done this week.
>>
>> If I can find something useful to test in the lab here by the end of the
>> day, I think I can still get dedicated time on the 2592-node ARM
>> cluster to try performance runs this weekend. So we have a ray of hope
>> here. That machine goes into a mode where I can't even request
>> dedicated time after this month; this may be practically my last chance.
>>
>> I was thinking a workaround (not sure if I have the talent to do it)
>> might be to return an empty list instead of an error to get me through
>> this weekend, and then try to dig into why this behaves differently
>> on x86_64 vs aarch64, which is probably a Linux kernel community
>> discussion.
>>
>> Let me see where I get with your suggestions, and research some client
>> side mount options relating to this. Maybe noacl-like things
>> client-side.
>>
>> More soon!
>>
>> On Wed, Aug 14, 2019 at 08:58:20AM -0400, Daniel Gryniewicz wrote:
>>> On 8/14/19 12:54 AM, Soumya Koduri wrote:
>>>>
>>>>
>>>> On 8/14/19 2:35 AM, Erik Jacobson wrote:
>>>>>> The only thing I can see is that it tries to open /dev/tty and fails.
>>>>>> However, that's very early in the process, and it continues for a long
>>>>>> time after that, so I'm not sure that is causing it to fail. It never
>>>>>> gets to trying to lookup /home/erikj, it just stops at /home.
>>>>>>
>>>>>> I'm a bit stumped at this point. Maybe a packet trace of a
>>>>>> working run of
>>>>>> the same thing would be helpful for comparison?
>>>>>
>>>>> Hello - Meetings (... bleh)
>>>>>
>>>>> I re-ran the test using rhel76 - same image as before, same test as
>>>>> before same node as before.
>>>>>
>>>>> I see nothing in the gNFS nfs log (but I didn't enable anything verbose
>>>>> there).
>>>>>
>>>>> The "su - erikj" test worked fine with gNFS, as we expected.
>>>>>
>>>>> I have attached a capture to this email. I hope it helps.
>>>>>
>>>>> I'm starting to look through the output myself, but I'm afraid I won't
>>>>> be much use at spotting anything.
>>>>>
>>>>> Let me know if you see something in this working case that could be a
>>>>> clue! Thank you!!!!
>>>>
>>>> One thing I noticed from this gNFS pkt trace is that after the LOOKUP on
>>>> "/home", the client immediately sent a "GETACL" call. Maybe aarch64 Linux
>>>> needs ACL support for non-root users (just a guess), and since NFS-Ganesha
>>>> does not support the NFSACL program (it returned a "program not supported"
>>>> error to the NFSACL NULL call), that led to the "Operation not supported"
>>>> error.
>>>>
>>>> NFS-Ganesha supports only NFSv4.x ACLs. Could you re-enable ACLs (global
>>>> and export level) in the ganesha config and try using the NFSv4.x protocol?
>>>>
>>>
>>> Possible, although the successful pcap has NFSACL calls before the lookup of
>>> /home, so that may not be the case. But you're right that the next thing in
>>> the successful pcap is an NFSACL call, that the ACCESS call only comes after
>>> that, and that the ACCESS call never occurs in the failed pcap.
>>>
>>> Note that the NFSACL calls in the gnfs case always return empty ACLs, so the
>>> ACLs aren't being used for anything.
>>>
>>> Daniel
>>
>>
>> Erik Jacobson
>> Software Engineer
>>
>> erik.jacobson(a)hpe.com
>> +1 612 851 0550 Office
>>
>> Eagan, MN
>>
>> hpe.com
>
> _______________________________________________
> Devel mailing list -- devel(a)lists.nfs-ganesha.org
>