First, just a quick thanks for all you do. I'm new to Ganesha.
I am using Gluster and Gluster NFS as an NFS-boot solution. I've done a
lot of testing with 2,500 aarch64 nodes spread across 9 x86_64 NFS servers for
booting (we get about 10-minute boot times with NFS roots on Gluster with
Gluster NFS). We're using CTDB for a level of "HA" and it works great.
We've lost nodes and haven't even noticed until we checked
alerting/logs.
We didn't start with Ganesha because I was on a deadline and it was
crashing on us under heavy boot load (daemons would disappear over
time). I didn't have time to do proper debugging and reporting, and the
customer had to move on at that time, too. It wouldn't have been fair to
you guys to post a report back then.
Since then, I've learned a lot more about Gluster and system tuning,
etc.
Our customer likes the boot times, but their job launch times are taking
too long. Compared to kernel NFS (and full-TMPFS, no-network-FS) roots,
Gluster NFS takes about 2 minutes longer to launch a job. They have a
large LD_LIBRARY_PATH and tons of libraries being loaded by all nodes at
the same time. Some Gluster profiling shows that caching seems to be
working, so we are guessing the issue is metadata load and the related
latency.
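For reference, the profiling I'm talking about is Gluster's built-in
volume profiler; the runs looked roughly like this (volume name
cm_shared from our setup, exact invocations may have differed a bit):

    # turn on per-brick FOP counters and latency stats for the volume
    gluster volume profile cm_shared start
    # ... reproduce a job launch from the client nodes ...
    # dump the stats; the LOOKUP/STAT/ACCESS latencies are what we watch
    gluster volume profile cm_shared info
    # turn it back off when done
    gluster volume profile cm_shared stop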
Because Gluster NFS is being deprecated anyway, and because we want to
improve the job launch time, I have come back to trying Ganesha. I
noticed it has 'MDCACHE' and we wanted to try that out. Maybe we can
get our 2 minutes back.
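I haven't settled on values yet, but something like the MDCACHE block
below is what I was planning to experiment with. Parameter names are
from the 2.8 docs as I read them, and the numbers are placeholders
rather than recommendations:

    MDCACHE {
        # ceiling on cached metadata entries; the idea is to keep every
        # shared library and directory touched at job launch in cache
        # (placeholder value)
        Entries_HWMark = 500000;
        # dirents fetched per readdir chunk (placeholder value)
        Dir_Chunk = 1024;
    }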
The NFS solution is unique in that it combines a read-only NFS root with
TMPFS using overlayfs on each node. This gives the client nodes a true
"writable" feel even though they are based on a read-only NFS root.
Kernel NFS exports (from standard ext4) and Gluster NFS exports both
work in this configuration for both architectures.
Ganesha is working OK for x86_64 nodes - normal bootup, no problem.
However, Ganesha for aarch64 client nodes is causing us trouble. A typical
RHEL 7.6 bootup has many errors. Among the services that never start are
"sshd" and "login". Sort of a bummer! :) The console shows
[FAILED] Failed to start Login Service.
over and over.
At this time, I'm using our systems in the lab - a tiny version of the
big one - to debug the above. If I can solve the above, I will be able
to test on the ARM supercomputer a week from today. It will be my last
chance to run on the system personally.
My goal for the dedicated time next week, if I can resolve the above, is
to improve job launch times. If there are stability problems, I want to
know the right way to report them to you before we revert to Gluster NFS,
should that become necessary. To get to that point, I need to solve the
boot issue above.
Gluster info:
gluster-4.1.6
* My own build of it
Volume type: 3x3 (3 replicas for 3 total subvolumes, 9 total x86_64 servers)
Gluster server OS: RHEL 7.6
Server architecture: x86_64
Ganesha info:
Ganesha 2.8.2
* I did a personal build of this for testing, with gluster 4.1.6
The Ganesha daemons run on the Gluster servers themselves.
NFS Client info:
RHEL 7.6 aarch64 (fails)
RHEL 7.6 x86_64 (works)
NFS Client environment:
- UEFI PXE to retrieve grub2
- grub2 loads kernel and initrd with TFTP
- initrd loads a special "miniroot" with a vendor script
- miniroot is an "expanded initrd" that has our toolchain
  * I can enable a rescue mode where I shell out just before
    SWITCHROOT if that helps
NFS / Overlay mount commands done by client:
mount -n -o ro,noatime,nocto,actimeo=3600,lookupcache=all,nolock,tcp,vers=3 \
    172.23.255.249:/cm_shared/image/images_ro_nfs/rhel7.6 /root_ro_nfs
mount -t tmpfs -o mpol=interleave tmpfs /rootfs.rw
mount -t overlay overlay \
    -o lowerdir=/root_ro_nfs,upperdir=/rootfs.rw/upperdir,workdir=/rootfs.rw/work /a
(then later the magic to make the above work with SWITCHROOT).
- I don't think it's a squash issue - from the miniroot in rescue mode,
I can poke around the RO NFS + TMPFS overlay and see the contents of
root-only readable files.
- If I chroot from rescue mode into the root, sshd starts by hand, so I
think the issue is related to systemd.
- Not being able to log in after switchroot makes it harder, but I'll
start researching a way to get in, perhaps starting systemd by hand or
booting with extra debugging (see the sketch below).
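The plan for that is the stock systemd debug knobs on the kernel command
line from grub2 (assuming the nodes expose a virtual console for the
debug shell; the logging options help either way), then poking at the
failed units from there:

    # kernel command line additions
    systemd.log_level=debug systemd.log_target=console systemd.debug-shell=1

    # from the debug shell (tty9) once the failures appear
    systemctl --failed
    systemctl status systemd-logind sshd
    journalctl -b -u systemd-logind -u sshd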
Any advice appreciated. Sorry this got so long.
PS: Ganesha configuration:
NFS_CORE_PARAM {
    Clustered = TRUE;
}

EXPORT
{
    # Export Id (mandatory, each EXPORT must have a unique Export_Id)
    Export_Id = 10;

    # Exported path (mandatory)
    Path = "/cm_shared";

    # Pseudo path (required for NFSv4)
    Pseudo = "/cm_shared";

    # Required for access (default is None)
    # Could use CLIENT blocks instead
    Access_Type = RW;

    # Allow root access
    Squash = none;
    #Squash = No_Root_Squash;

    # To enable/disable ACL
    Disable_ACL = TRUE;

    # Security flavor supported
    SecType = "sys";

    # Exporting FSAL
    FSAL {
        Name = "GLUSTER";
        Hostname = "127.0.0.1"; # IP of one of the nodes in the trusted pool
        Volume = "cm_shared";
    }
}
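For anyone reproducing this, a quick way to sanity-check the export from
a client is plain NFSv3 tooling, nothing Ganesha-specific (the /mnt
mount point here is just an example):

    # confirm Ganesha is registered with rpcbind and advertising the export
    rpcinfo -p 172.23.255.249
    showmount -e 172.23.255.249
    # hand-mount it the same way the miniroot does
    mount -o ro,nolock,tcp,vers=3 172.23.255.249:/cm_shared /mnt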