1) Did you test with the one line patch that I posted?
2) What client distro are you using? What kernel version?
3) What server distro are you using? What kernel version?
4) What does your mount look like?
On 9/17/18 12:40 PM, Kropelin, Adam wrote:
See attached for the changes I'm testing with.
Your patch turns an async threading system into synchronous per-fd
processing. This really shouldn't be visible to the library caller,
so it doesn't belong in the SVCXPRT.
You can do whatever you want, but I already added an IOQ in the
rpc_dplx_rec for receiving. That would be a better place.
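
To make the shape concrete, here is a minimal sketch (the field and
type names are made up for illustration, not the real ntirpc
definitions): the per-fd receive queue lives inside the internal
duplex record, so nothing new leaks into the caller-visible SVCXPRT.

#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical entry for one received request on a given fd. */
struct recv_entry {
    TAILQ_ENTRY(recv_entry) link;
    void *buf;
    size_t len;
};

/* Illustrative stand-in for the internal duplex record: the receive
 * IOQ hangs off this private structure, not the public SVCXPRT, so
 * library callers never see it. */
struct rpc_dplx_rec_sketch {
    int fd;                              /* the transport's socket */
    pthread_mutex_t ioq_lock;            /* serializes queue access */
    TAILQ_HEAD(, recv_entry) recv_ioq;   /* per-fd receive queue */
};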
And none of this fixes the underlying problem. 1GB sent in 1MB chunks,
all processed in parallel, is going to generate 100,000 TCP segments,
and Linux by default caps the outstanding segments at 1,000 or 10,000
per interface (depending on the queue type). Multiply by the number of
callers. Basically, you have piggy callers.
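
For scale, a quick back-of-envelope (assumed numbers: a 1 GiB
transfer, a ~1448-byte MSS, no GSO aggregation; offloads change the
exact count but not the conclusion, and at a plain MSS the 100,000
figure above is actually conservative):

#include <stdio.h>

int main(void)
{
    /* Assumed figures for a rough estimate, not measurements. */
    const long long transfer = 1LL << 30;   /* 1 GiB transfer */
    const long long chunk    = 1LL << 20;   /* 1 MiB RPC chunks */
    const long long mss      = 1448;        /* typical Ethernet MSS */

    long long chunks   = transfer / chunk;  /* 1024 parallel chunks */
    long long segments = transfer / mss;    /* ~741,000 wire segments */

    printf("%lld chunks -> ~%lld segments, against a 1,000-10,000 "
           "segment device queue\n", chunks, segments);
    return 0;
}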
All your patch does is randomize which fd is serviced (because we
cannot control which thread the scheduler will select), and it leaves
an awful lot of threads waiting.
Maybe we should go back to the many-threads model and abandon
async. I've said and written repeatedly that a better use of my
time would have been the zero-copy interfaces needed for RDMA
(my original reason for participating in this project).
We'd have seen an immediate 35% improvement in throughput, instead of
this meager 5-6% over 4 years.