On 9/11/18 4:45 PM, Kropelin, Adam wrote:
> I suspect it also means a single out-to-lunch client could stall *all*
> I/O on the interface, which is another behavior I've been seeing
> recently. (Due to clients rebooting or otherwise going AWOL without
> unmounting or closing the TCP connection.)

This is true. Once the kernel I/O buffers are all full because a TCP
client has stopped ACKing the data in them, no other connection can send
over that interface. That's just a fact of any kernel.
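The following is a rough illustration, not Ganesha code. It assumes Linux/POSIX sockets over loopback, and the helper names `loopback_pair` and `fill_until_stall` are hypothetical. When the receiving side stops reading, the kernel eventually refuses more data on that connection: a non-blocking write gets `EAGAIN`, and a blocking writer would simply sleep at that point. The sketch only shows one connection's buffers filling up. Any wider stall depends on the server thread that is stuck in that write.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a connected TCP pair over loopback. The
 * accepted end (*peer) plays the client that has gone out to lunch;
 * nothing ever reads from it. Returns 0 on success, -1 on failure. */
static int loopback_pair(int *writer, int *peer)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                  /* let the kernel pick a port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) {
        close(lfd);
        return -1;
    }

    *writer = socket(AF_INET, SOCK_STREAM, 0);
    if (*writer < 0) {
        close(lfd);
        return -1;
    }
    if (connect(*writer, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(*writer);
        close(lfd);
        return -1;
    }
    *peer = accept(lfd, NULL, NULL);
    close(lfd);
    return *peer < 0 ? -1 : 0;
}

/* Write into the socket in non-blocking mode until the kernel stops
 * accepting data. Returns how many bytes were queued first; errno is
 * left as EAGAIN/EWOULDBLOCK. A blocking write would hang here. */
static size_t fill_until_stall(int fd)
{
    static char chunk[64 * 1024];
    size_t total = 0;

    assert(fd >= 0);
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);
    for (;;) {
        ssize_t n = write(fd, chunk, sizeof chunk);
        if (n > 0) {
            total += (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        break;  /* peer's receive buffer and our send buffer are full */
    }
    return total;
}
```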
Thus the real problem is the client asking for megabytes of data in the
faint hope that this will somehow be faster, and then crashing.
This has been a known problem for decades. So the TCPM WG developed
the TCP User Timeout option [RFC5482].
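On Linux, the local side of a user timeout can be set per socket with the `TCP_USER_TIMEOUT` socket option (see tcp(7)). The helper below is a minimal sketch and its name is hypothetical. The option caps how long transmitted data may stay unacknowledged before the kernel aborts the connection. As far as I know, Linux implements only this local timeout and does not send the RFC 5482 option to the peer on the wire.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: abort the connection if sent data stays
 * unacknowledged for more than timeout_ms milliseconds.
 * Returns 0 on success, -1 with errno set on failure. */
static int set_tcp_user_timeout(int fd, int timeout_ms)
{
    assert(timeout_ms >= 0);   /* 0 means "use the system default" */
#ifdef TCP_USER_TIMEOUT
    return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                      &timeout_ms, sizeof timeout_ms);
#else
    (void)fd;
    errno = ENOPROTOOPT;       /* platform lacks TCP_USER_TIMEOUT */
    return -1;
#endif
}
```

A server could call this on each accepted socket, so that a client that vanishes without closing the connection is eventually dropped by the kernel.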
Malahal had a patch some time ago to time out the client by another
means, without depending upon the option. Didn't that go in?
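I don't know what Malahal's patch actually does. One generic way to time out a stalled client without the TCP option is an application-level idle timeout on writes, sketched below. This is a hypothetical sketch and not the patch: `write_with_idle_timeout` is an illustrative name, and `MSG_DONTWAIT` is assumed to be available (it is on Linux and the BSDs). The helper gives up with `ETIMEDOUT` if the socket stays unwritable for `idle_ms` milliseconds.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send all of buf, but fail with ETIMEDOUT if the
 * peer accepts nothing for idle_ms milliseconds. The timer restarts
 * whenever the socket becomes writable again, so this is an idle
 * timeout, not a deadline on the whole transfer. Returns 0 or -1. */
static int write_with_idle_timeout(int fd, const char *buf, size_t len,
                                   int idle_ms)
{
    assert(idle_ms >= 0);      /* a negative poll timeout never expires */
    while (len > 0) {
        ssize_t n = send(fd, buf, len, MSG_DONTWAIT | MSG_NOSIGNAL);
        if (n > 0) {
            buf += n;
            len -= (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                 /* real error, e.g. EPIPE */

        /* Kernel buffers are full: wait for room, but not forever. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int rc = poll(&pfd, 1, idle_ms);
        if (rc == 0) {
            errno = ETIMEDOUT;         /* client looks out to lunch */
            return -1;
        }
        if (rc < 0 && errno != EINTR)
            return -1;
    }
    return 0;
}
```

The server would then drop the connection on `ETIMEDOUT` instead of leaving a thread parked in a blocking write.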

> Non-blocking I/O would be the answer here, but without that... throw
> some more threads at it, I guess?

Since V2.3 (before my time), we've been using I/O vector (iovec)
zero-copy. POSIX allows either iov or async, but not both in the same
call.
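For readers unfamiliar with the iov path, here is a minimal sketch of a gathered write, assuming POSIX `writev()`. The function name `send_record` is hypothetical. The 4-byte marker follows the style of ONC RPC record marking over TCP (RFC 5531), but this is not Ganesha's actual send path. The header and payload are passed as separate iovec entries, so neither is copied into a combined buffer first. The async alternative, POSIX `aio_write()`, takes a single `aio_buf` per request, which is the either/or constraint mentioned above.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one record (marker + payload) in a single
 * system call. The kernel gathers both pieces from the iovec array.
 * Like write(), writev() may return a short count on a socket; a real
 * caller would loop on partial writes. */
static ssize_t send_record(int fd, const void *payload, size_t len)
{
    assert(len < 0x80000000u);  /* length must fit in the low 31 bits */

    /* High bit = last fragment, low 31 bits = fragment length. */
    uint32_t marker = htonl(0x80000000u | (uint32_t)len);
    struct iovec iov[2] = {
        { .iov_base = &marker,         .iov_len = sizeof marker },
        { .iov_base = (void *)payload, .iov_len = len },
    };
    return writev(fd, iov, 2);
}
```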
More threads won't help. It's a stall at the kernel level. In fact,
one thread per interface proved to be fastest, as that minimizes
locking conflicts and system calls (and improves CPU cache coherency).
_______________________________________________
Devel mailing list -- devel@lists.nfs-ganesha.org
To unsubscribe send an email to devel-leave@lists.nfs-ganesha.org