As I reported here https://github.com/nfs-ganesha/ntirpc/pull/137, the Linux kernel reserves send memory per socket. So if you have two NFS clients, one on wifi that can only do 54 Mbps and the other on a wired link that can do 10 Gbps, an NFS server with a single sender thread will spend most of its time waiting in writev() on the slow client.
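
A minimal sketch of the problem pattern (names and structures here are invented for illustration, not taken from ganesha or ntirpc): one sender thread walks a reply queue with blocking writev(), so a slow client whose send buffer is full stalls every reply queued behind it.

    /* Hypothetical illustration of the head-of-line blocking described
     * above; struct and function names are made up. */
    #include <stddef.h>
    #include <sys/uio.h>

    struct reply {
        int           sock;     /* per-client TCP connection */
        struct iovec *iov;      /* RPC reply fragments */
        int           iovcnt;
        struct reply *next;
    };

    void sender_loop(struct reply *queue)
    {
        for (struct reply *r = queue; r != NULL; r = r->next) {
            /* Each socket has its own kernel send buffer (SO_SNDBUF).
             * When the 54 Mbps wifi client's buffer is full, this
             * blocking writev() sleeps until that client drains it;
             * the reply for the 10 Gbps client behind it in the queue
             * simply waits. */
            (void)writev(r->sock, r->iov, r->iovcnt);
        }
    }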

This issue is very easy to reproduce, and my xp_ifindex patch (not the timeout one) helped in the following case:

We have an NFS server that can do 40 Gbps, but the connected NFS clients can only do 10 Gbps. With the plain ganesha 2.5 code, reads from a single active client did max out that client's connection (throughput was close to 10 Gbps), but with 2 clients the total throughput was still only 10 Gbps. After the fix, each client got close to 10 Gbps.
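
For illustration only, here is a rough sketch of the general remedy (not the actual xp_ifindex patch; helper names are invented): make the socket non-blocking and re-arm EPOLLOUT when its send buffer fills, so the sender can move on to other clients instead of sleeping on the slow one.

    #include <errno.h>
    #include <fcntl.h>
    #include <sys/epoll.h>
    #include <sys/types.h>
    #include <sys/uio.h>

    /* Invented helper; switches a connection to non-blocking I/O. */
    int setup_nonblocking(int sock)
    {
        return fcntl(sock, F_SETFL, fcntl(sock, F_GETFL, 0) | O_NONBLOCK);
    }

    /* Invented helper; tries to send one reply without blocking. */
    int send_reply_nb(int epfd, int sock, struct iovec *iov, int iovcnt)
    {
        ssize_t n = writev(sock, iov, iovcnt);

        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            /* Send buffer full: ask epoll to wake us when it drains and
             * service the next connection meanwhile.  A real
             * implementation must also remember any partially written
             * iovecs. */
            struct epoll_event ev = {
                .events = EPOLLOUT | EPOLLONESHOT,
                .data.fd = sock,
            };
            return epoll_ctl(epfd, EPOLL_CTL_MOD, sock, &ev);
        }
        return (n < 0) ? -1 : 0;
    }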

Regards, Malahal.

On Wed, Sep 12, 2018 at 5:06 PM, William Allen Simpson <william.allen.simpson@gmail.com> wrote:
[Going back to your Original Post]

On 9/11/18 12:58 PM, Kropelin, Adam wrote:
* Observing on the wire, I see that the Linux NFS client is submitting 16 or more 1 MB READ RPCs at once. If I prevent that behavior by adding 'iflag=direct' to the dd command, scalability is suddenly back where it should be. Something about having a lot of read I/O in flight seems to matter here.

Remember, an interface handles one packet at a time.  16 parallel read
requests will give you improvements when the FSAL is storage I/O
bound.  But you also indicate they are all reading the same data, so it
should be coming from cache.  So you're going to be network I/O bound,
and 16 parallel requests won't help.


* I grabbed several core dumps of ganesha during a period when 8 clients were hitting it. Every single thread was idle (typically pthread_cond_wait'ing for some work) except for one RPC worker, which was in writev(). This was true repeatedly throughout the test. It is as if a single RPC worker thread is doing all of the network I/O to every client.

Since V2.6, all the input RPC paths are async.  This shows that all
the data has been fetched, and you're waiting for the data to be
sent out over one interface.  It only takes one worker to do that.
In fact, one worker is optimal.
