>> If they've changed it, that would be a horrible
>> waste of memory.  Most sockets aren't active at the same time.

I've never worked in kernel networking, but the send buffer seems to start small and then grow based on TCP autotuning?
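
As a rough illustration (a sketch only, not ganesha code), the buffer the kernel has actually granted a socket can be read back with getsockopt(SO_SNDBUF).  With autotuning, net.ipv4.tcp_wmem gives min/default/max: the socket starts near the default and can grow toward the max while it stays busy, unless SO_SNDBUF is set explicitly (which disables send autotuning for that socket):

/* Sketch: print the kernel's current send-buffer size for a connected
 * socket.  Not ganesha code; fd is assumed to be a TCP socket. */
#include <stdio.h>
#include <sys/socket.h>

int print_sndbuf(int fd)
{
    int sndbuf = 0;
    socklen_t len = sizeof(sndbuf);

    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len) < 0) {
        perror("getsockopt(SO_SNDBUF)");
        return -1;
    }
    /* Linux reports roughly double the usable space, to cover its own
     * bookkeeping overhead. */
    printf("fd %d: SO_SNDBUF = %d bytes\n", fd, sndbuf);
    return 0;
}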

>> When I originally designed this code 3 years ago, I'd deliberately
>> serialized the per socket transactions so this wouldn't happen.

NFS COMMITs/WRITEs take a lot more time than other commands. The same is true for a READDIR on a very large directory. Clients send multiple requests at the same time, and it makes sense to process them in parallel. I don't know if we get any benefit from parallelizing the COMPOUND itself, though.

>> we have a limited number of cores.

Believe it or not, a few of our customers are running with 16K NB_WORKER threads!  I was surprised as well. They experimented and found that value works better than 1K or 4K, etc. They have 200 GB of RAM and 50-100 cores. Also, in some cases, these are dedicated systems just for NFS.
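
For reference, that is just the size of the worker thread pool in ganesha.conf; if I remember the block name right, something like this (16384 here only to show their setting):

NFS_CORE_PARAM
{
        # Size of the request worker thread pool
        Nb_Worker = 16384;
}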


On Fri, Sep 14, 2018 at 1:11 PM, William Allen Simpson <william.allen.simpson@gmail.com> wrote:
On 9/13/18 3:09 PM, Malahal Naineni wrote:
As I reported here https://github.com/nfs-ganesha/ntirpc/pull/137, Linux kernel reserves send memory per socket.

Well, that's not how it used to work, and that doesn't match the
documentation.  If they've changed it, that would be a horrible
waste of memory.  Most sockets aren't active at the same time.

But then, I've not contributed to the kernel TCP stack since 2009.
I'm currently planning on attending Bake-a-thon next week.  We
should look into this!


So if you have two NFS clients, one on wifi that can only do 54 Mbps and the other on a wire that can do 10 Gbps, your entire NFS server with a single sender thread will spend most of its time waiting in writev() due to the slow client.

Again, that would only happen after running out of buffer space on the
slow connection.  If you're asking for more data than the buffer
space holds, or queuing concurrent responses on the same socket that
exceed the buffer space, the single thread will be stalled by the
slowest connection.
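
A rough sketch of what I mean (not the actual ntirpc code): with a
non-blocking socket, a full send buffer shows up as EAGAIN/EWOULDBLOCK
rather than a blocked thread, so a shared sender has to park that
response and move on instead of waiting for the slow client to drain
its buffer.

/* Sketch only: try to send, but never block on a slow peer. */
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Returns bytes written, 0 if the send buffer is full (try again when
 * poll/epoll says the socket is writable), or -1 on a real error. */
ssize_t try_send(int fd, const struct iovec *iov, int iovcnt)
{
    ssize_t n = writev(fd, iov, iovcnt);

    if (n < 0) {
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;   /* buffer full; park this response */
        return -1;      /* genuine error */
    }
    return n;           /* may be partial; caller advances the iovec */
}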

When I originally designed this code 3 years ago, I'd deliberately
serialized the per socket transactions so this wouldn't happen.

But a certain someone wanted multiple requests on the same socket to be
concurrent, though that would mean some responses might finish out of
order.  (I've never thought that was a good idea.)

Recently our maintainer proposed that the NFS COMPOUND itself also be
parallelized.  (I've never thought that was a good idea either.)

As I've said and written over and over, we have a limited number of
cores.  Running a lot of threads means more overhead, and reduces
cache coherency.