[Going back to your Original Post]
On 9/11/18 12:58 PM, Kropelin, Adam wrote:
> * By observation on the wire I see that the Linux NFS client is
> submitting 16 or more 1 MB READ RPCs at once. If I prevent that
> behavior by adding 'iflag=direct' to the dd command, suddenly
> scalability is back where it should be. Something about having a
> lot of read i/o in flight seems to matter here.
Remember, an interface handles one packet at a time. 16 parallel read
requests will give you an improvement when the FSAL is storage-I/O
bound. But you also indicate the clients are all reading the same data,
so it should be served from cache. That means you're network-I/O bound,
and 16 parallel requests won't help.
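As a rough illustration (the numbers are assumed, say a single 10 GbE
NIC): the wire tops out around 1.2 GB/s, so sixteen 1 MB reads served
from cache still drain at roughly 1.2 GB/s combined, no better than one
stream; the same sixteen requests against a storage-bound FSAL can
overlap device latency, and there the parallelism does pay off.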
> * I grabbed several core dumps of ganesha during a period where 8
> clients were hitting it. Every single thread is idle (typically
> pthread_cond_wait'ing for some work) except for one rpc worker
> which is in writev. This is true repeatedly throughout the test.
> It is as if somehow a single rpc worker thread is doing all of the
> network i/o to every client.
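(As an aside, you can capture the same per-thread picture live, without
saving a full core; this is plain gdb, nothing Ganesha-specific, and the
process name may differ on your install:

    gdb -batch -p $(pidof ganesha.nfsd) -ex "thread apply all bt"
)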
Since V2.6, all of the input RPC paths are async. What you're seeing
means the data has already been fetched, and you're simply waiting for
it to be sent out over one interface. It only takes one worker to do
that; in fact, one worker is optimal.
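To make the single-writer point concrete, here is a minimal sketch (not
the Ganesha source; the queue and the names are made up for
illustration) of one thread draining a queue of completed replies with
writev(). Everything past the writev() is serialized by the NIC anyway,
so a second writer thread would add contention, not bandwidth:

    /* Minimal single-writer sketch, NOT the Ganesha source: completed
     * replies are queued and one thread pushes them with writev().
     * Struct and function names here are illustrative only. */
    #include <sys/uio.h>
    #include <pthread.h>
    #include <stddef.h>

    struct reply {
        int           fd;         /* client socket */
        struct iovec  iov[2];     /* e.g. RPC header + READ payload */
        struct reply *next;
    };

    static struct reply *queue_head;   /* FIFO of ready replies */
    static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  qcond = PTHREAD_COND_INITIALIZER;

    static void *writer_thread(void *arg)
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&qlock);
            while (queue_head == NULL)
                pthread_cond_wait(&qcond, &qlock);  /* idle, like the
                                                       workers in the cores */
            struct reply *r = queue_head;
            queue_head = r->next;
            pthread_mutex_unlock(&qlock);

            /* One writev() per reply; the wire serializes the bytes from
             * here on, so more writer threads would not add throughput.
             * A real implementation would handle short writes and errors,
             * and recycle r afterwards. */
            (void)writev(r->fd, r->iov, 2);
        }
        return NULL;
    }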