Hi Daniel,
I have checked this further and discovered that the server was using an fd limit of 0.99*4096.
I noticed that the systemd service file performs two post-startup tasks (how we verify each is shown below):
1. prlimit to set the pid nofile limit to 2076180 (this works, and is verified in /proc/{pid}/limits)
2. dbus command to tell ganesha to internally set the fd limit. This is verified when cache_inode_lru logging is set to info+:
ganesha.nfsd-9102[dbus_heartbeat] init_fds_limit :INODE LRU :INFO :Setting the system-imposed limit on FDs to 1048576.
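For reference, this is how we check each of the two after a restart (the log path is our setup's default and may differ elsewhere):

  # 1. kernel-side nofile limit set by prlimit
  grep 'Max open files' /proc/$(pidof ganesha.nfsd)/limits

  # 2. ganesha's internal limit, via the INODE LRU INFO message quoted above
  #    (requires the cache_inode_lru component at INFO or higher)
  grep 'system-imposed limit on FDs' /var/log/ganesha/ganesha.log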
This second post-startup task doesn't seem to take effect 100% of the time. For example, if I remove the sleep portion, ganesha does not apply the limit internally, even though the systemd unit still reports success. The actual task I'm talking about is this one:
ExecStartPost=-/bin/bash -c "/usr/bin/sleep 2 && /bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.init_fds_limit"
So the internal ganesha fd limit being set correctly seems to rely on that dbus command from the systemd unit. Is that correct?
Further, the sleep seems to be there deliberately, to give ganesha time to reach a certain point before the dbus command is sent. Can the time needed vary depending on circumstances (e.g. a slow server, many clients reconnecting)? We are wondering whether we need to tune the amount of sleep to ensure ganesha applies the limit internally.
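If tuning the sleep is the only option, one thing we are considering (an untested sketch of our own, not from the shipped unit file; the script path is made up) is to replace the fixed sleep with a bounded retry, so a slow start only delays the call instead of missing it:

  #!/bin/bash
  # Hypothetical helper, e.g. /usr/local/sbin/ganesha-init-fds-limit,
  # to be run from ExecStartPost in place of "sleep 2 && dbus-send ...".
  # Retry the init_fds_limit call until ganesha acknowledges it on the
  # bus, for up to ~30 seconds, rather than relying on a fixed 2s sleep.
  for _ in $(seq 1 30); do
      if /bin/dbus-send --system --print-reply \
             --dest=org.ganesha.nfsd --type=method_call \
             /org/ganesha/nfsd/admin \
             org.ganesha.nfsd.admin.init_fds_limit >/dev/null 2>&1; then
          exit 0              # call was delivered and answered
      fi
      /usr/bin/sleep 1        # ganesha not ready yet (or call failed); retry
  done
  echo "init_fds_limit was never acknowledged by ganesha" >&2
  exit 1

(--print-reply makes dbus-send wait for ganesha's answer, so it should exit non-zero while the name isn't on the bus yet, which is what the loop keys off.)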
Is there a different/better way to do this via the ganesha config rather than relying on the post-startup dbus command? Or is there a better way to check the internal setting than looking for the log entry printed by the INODE LRU logging component?
Rafael
Ganesha sets its internal hard limit to FD_Limit_Percent of the system
hard limit. By default this is 99%, so it should be 2076180 (0.99 * 2097152) for you.
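For reference (from memory, so check it against the 2.7 config docs), that percentage is a tunable in the CACHEINODE block of ganesha.conf, along the lines of:

  CACHEINODE {
      # Ganesha's internal hard limit on open FDs, as a percentage of
      # the system hard limit it detects at startup.  Default: 99.
      FD_Limit_Percent = 99;
  }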
In the past, when issues like this have hit, the cause has been a leak in
the internal FD accounting that Ganesha uses to track this. There are no
known issues with that at the moment, but it's not impossible that there's a bug here.
Can you attach to the process with gdb, and print open_fd_count?
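Something along these lines should do it (assuming the matching debuginfo package is installed; gdb pauses the process briefly while attached):

  # one-shot, non-interactive dump of the counter mentioned above
  gdb -batch -p $(pidof ganesha.nfsd) -ex 'print open_fd_count'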
Frank:
We should probably be logging open_fd_count in those messages...
Daniel
On 7/25/19 5:59 AM, Gin Tan wrote:
> We are trying to figure out the hard limit for the FDs. Does nfs-ganesha
> impose a limit?
>
> At the moment we are seeing these errors:
> 25/07/2019 19:27:59 : epoch 5d393d8c : nas2 :
> ganesha.nfsd-1681[cache_lru] lru_run :INODE LRU :WARN :Futility count
> exceeded. Client load is opening FDs faster than the LRU thread can
> close them.
> 25/07/2019 19:28:16 : epoch 5d393d8c : nas2 :
> ganesha.nfsd-1681[cache_lru] lru_run :INODE LRU :WARN :Futility count
> exceeded. Client load is opening FDs faster than the LRU thread can
> close them.
> 25/07/2019 19:28:34 : epoch 5d393d8c : nas2 : ganesha.nfsd-1681[svc_54]
> mdcache_lru_fds_available :INODE LRU :CRIT :FD Hard Limit Exceeded,
> waking LRU thread.
> 25/07/2019 19:29:02 : epoch 5d393d8c : nas : ganesha.nfsd-1681[svc_64]
> mdcache_lru_fds_available :INODE LRU :CRIT :FD Hard Limit Exceeded,
> waking LRU thread.
>
> The system limit:
> $ cat /proc/sys/fs/nr_open
> 2097152
>
> And the limit for nfs-ganesha process:
>
> $ cat /proc/1681/limits
> Limit                     Soft Limit           Hard Limit           Units
> Max cpu time              unlimited            unlimited            seconds
> Max file size             unlimited            unlimited            bytes
> Max data size             unlimited            unlimited            bytes
> Max stack size            8388608              unlimited            bytes
> Max core file size        0                    unlimited            bytes
> Max resident set          unlimited            unlimited            bytes
> Max processes             385977               385977               processes
> Max open files            2097152              2097152              files
> Max locked memory         65536                65536                bytes
> Max address space         unlimited            unlimited            bytes
> Max file locks            unlimited            unlimited            locks
> Max pending signals       385977               385977               signals
> Max msgqueue size         819200               819200               bytes
> Max nice priority         0                    0
> Max realtime priority     0                    0
> Max realtime timeout      unlimited            unlimited            us
>
> And the number of open files is
>
> # ls /proc/1681/fd | wc -w
> 12739
>
> I don't see why we are hitting the FD limit as we only have 12739 FD count.
>
> It is impacting the NFS clients right now: file creation works, but
> clients can't open an existing file for writing.
>
> I'm using VFS FSAL, and the software versions are:
> nfs-ganesha-2.7.5-1.el7.x86_64
> nfs-ganesha-vfs-2.7.5-1.el7.x86_64
>
> Thanks.
>
> Gin