Hi Daniel,
I have checked this further and discovered the server was using an fd limit
of 0.99 * 4096 (~4055).
I noticed that the systemd service file performs two post-startup tasks:
1. prlimit to set the pid nofile limit to 2076180 (this works, and is
verified in /proc/{pid}/limits)
2. dbus command to tell ganesha to internally set the fd limit. This is
verified when cache_inode_lru logging is set to info+:
ganesha.nfsd-9102[dbus_heartbeat] init_fds_limit :INODE LRU :INFO :Setting
the system-imposed limit on FDs to 1048576.
This second post-startup task doesn't seem to take effect 100% of the time. For
example, if I remove the sleep portion, ganesha does not apply the limit
internally, though the systemd service reports it as a success. The actual
task I'm talking about is this one:
ExecStartPost=-/bin/bash -c "/usr/bin/sleep 2 && /bin/dbus-send --system
--dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin
org.ganesha.nfsd.admin.init_fds_limit"
So the internal ganesha fd limit being set correctly seems to rely on that
dbus command from the systemd unit. Is that correct?
Further, the sleep seems to be in there deliberately to allow ganesha to
reach a certain point before sending the dbus command. Can this vary
depending on circumstances (e.g. slow server, many clients reconnecting,
etc)? We are wondering if we need to tune the amount of sleep to ensure
ganesha does the internal setting of the limit.
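If the sleep is just papering over a startup race, one alternative to tuning a fixed delay is to retry the call until ganesha actually replies. A sketch follows; the dbus-send destination, path, and method are copied from the unit file above, but the retry helper, the attempt/delay numbers, and the --print-reply flag are my additions, not anything ganesha ships. --print-reply makes dbus-send wait for a method return, so it exits non-zero while ganesha is not yet registered on the bus (without it, dbus-send exits 0 as soon as the message is queued, which may be why the unit reports success even when the limit isn't applied):

```shell
#!/bin/sh
# Retry a command until it succeeds or we run out of attempts, instead of
# relying on a single fixed sleep before a one-shot call.
retry() {                 # retry <max_attempts> <delay_seconds> <cmd...>
  max=$1; delay=$2; shift 2
  n=1
  while [ "$n" -le "$max" ]; do
    "$@" && return 0      # stop as soon as the command succeeds
    sleep "$delay"
    n=$((n + 1))
  done
  return 1                # gave up after max attempts
}

# Against a live server, e.g. from an ExecStartPost= script:
# retry 10 1 /bin/dbus-send --system --print-reply \
#     --dest=org.ganesha.nfsd --type=method_call \
#     /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.init_fds_limit
```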
Is there a different/better way to do this via ganesha config rather than
relying on the post startup dbus command? Or is there a better way to check
the internal setting rather than looking for the log entry printed by the
inode_lru logging component?
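For what it's worth, the percentage itself appears to be configurable in ganesha.conf; my understanding (worth verifying against the ganesha config man pages for 2.7, since the block and parameter names here are from memory, not this thread) is something like:

```
CACHEINODE {
    # FD_Limit_Percent: percentage of the process hard limit that ganesha
    # uses as its internal FD cap (default 99). Note this only changes the
    # percentage; it doesn't obviously address *when* init_fds_limit runs.
    FD_Limit_Percent = 90;
}
```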
Rafael
On Thu, 25 Jul 2019 at 23:25, Daniel Gryniewicz <dang(a)redhat.com> wrote:
Ganesha sets its internal hard limit to FD_Limit_Percent of the
system
hard limit. By default, this is 99%, so it should be 2076180 for you.
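That arithmetic checks out against the numbers in this thread (integer math, so the fraction is truncated):

```shell
# Internal cap = FD_Limit_Percent of the process hard limit.
hard_limit=2097152   # "Max open files" hard limit from /proc/1681/limits
fd_percent=99        # ganesha's stated default
echo $(( hard_limit * fd_percent / 100 ))   # prints 2076180
```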
In the past, when issues like this have hit, there has been a leak in
the internal FD accounting that Ganesha uses to track this. There are no
known issues with this, but it's not impossible that there's a bug here.
Can you attach to the process with gdb, and print open_fd_count?
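The attach-and-print step can be done non-interactively with gdb's batch mode. The symbol name open_fd_count is taken from this mail; the helper function below is just a convenience of mine, not a ganesha tool, and the nfs-ganesha debuginfo package must be installed for gdb to resolve the symbol:

```shell
#!/bin/sh
# Build the gdb batch invocation that prints open_fd_count for a given pid.
gdb_fd_count_cmd() {
  printf 'gdb -p %s -batch -ex "print open_fd_count"' "$1"
}

# Against the live server (assumes a single ganesha.nfsd instance):
# eval "$(gdb_fd_count_cmd "$(pidof ganesha.nfsd)")"
```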
Frank:
We should probably be logging open_fd_count in those messages...
Daniel
On 7/25/19 5:59 AM, Gin Tan wrote:
> We are trying to figure out the hard limit for the FD, does nfs ganesha
> impose a limit?
>
> At the moment we are seeing these errors:
> 25/07/2019 19:27:59 : epoch 5d393d8c : nas2 :
> ganesha.nfsd-1681[cache_lru] lru_run :INODE LRU :WARN :Futility count
> exceeded. Client load is opening FDs faster than the LRU thread can
> close them.
> 25/07/2019 19:28:16 : epoch 5d393d8c : nas2 :
> ganesha.nfsd-1681[cache_lru] lru_run :INODE LRU :WARN :Futility count
> exceeded. Client load is opening FDs faster than the LRU thread can
> close them.
> 25/07/2019 19:28:34 : epoch 5d393d8c : nas2 : ganesha.nfsd-1681[svc_54]
> mdcache_lru_fds_available :INODE LRU :CRIT :FD Hard Limit Exceeded,
> waking LRU thread.
> 25/07/2019 19:29:02 : epoch 5d393d8c : nas : ganesha.nfsd-1681[svc_64]
> mdcache_lru_fds_available :INODE LRU :CRIT :FD Hard Limit Exceeded,
> waking LRU thread.
>
> The system limit:
> $ cat /proc/sys/fs/nr_open
> 2097152
>
> And the limit for nfs-ganesha process:
>
> $ cat /proc/1681/limits
> Limit                     Soft Limit           Hard Limit           Units
> Max cpu time              unlimited            unlimited            seconds
> Max file size             unlimited            unlimited            bytes
> Max data size             unlimited            unlimited            bytes
> Max stack size            8388608              unlimited            bytes
> Max core file size        0                    unlimited            bytes
> Max resident set          unlimited            unlimited            bytes
> Max processes             385977               385977               processes
> Max open files            2097152              2097152              files
> Max locked memory         65536                65536                bytes
> Max address space         unlimited            unlimited            bytes
> Max file locks            unlimited            unlimited            locks
> Max pending signals       385977               385977               signals
> Max msgqueue size         819200               819200               bytes
> Max nice priority         0                    0
> Max realtime priority     0                    0
> Max realtime timeout      unlimited            unlimited            us
>
> And the number of open files is
>
> # ls /proc/1681/fd | wc -w
> 12739
>
> I don't see why we are hitting the FD limit when the FD count is only 12739.
>
> It is impacting the NFS clients right now, file creation is fine but
> can't open an existing file to write.
>
> I'm using VFS FSAL, and the software versions are:
> nfs-ganesha-2.7.5-1.el7.x86_64
> nfs-ganesha-vfs-2.7.5-1.el7.x86_64
>
> Thanks.
>
> Gin
>
>
> _______________________________________________
> Devel mailing list -- devel(a)lists.nfs-ganesha.org
> To unsubscribe send an email to devel-leave(a)lists.nfs-ganesha.org
>
--
*Rafael Lopez*
Research Devops Engineer
Monash University eResearch Centre
T: +61 3 9905 9118
M: +61 (0)427682670
E: rafael.lopez(a)monash.edu