We definitely need to fix this. It's on the list.
Daniel
On 09/13/2018 03:52 PM, Imam Toufique wrote:
> We are about to start using Ganesha and are just beginning to put
> things together. If this is right, shouldn't there be a fix or, failing
> that, a workaround? :-)
>
> thanks.
>
> On Thu, Sep 13, 2018 at 12:21 PM Malahal Naineni <malahal@gmail.com> wrote:
>
> >> Once the kernel I-O buffers are all full because a TCP client has
> >> stopped Ack'ing them, no other connection can send over that
> >> interface. That's just a fact of any kernel.
>
> I would say it is a poor design if that is really true, and I am
> pretty sure Linux doesn't have that issue based on what we saw in
> the field. Ganesha was stalled in its writev() call but all other
> networking stuff was working just fine.
> >> More threads won't help. It's a stall at the kernel level.
>
> We have proof that more threads helped!
>
> On Wed, Sep 12, 2018 at 4:42 PM, William Allen Simpson
> <william.allen.simpson@gmail.com> wrote:
>
> On 9/11/18 4:45 PM, Kropelin, Adam wrote:
>
> I suspect it also means a single out-to-lunch client could
> stall *all* i/o on the interface, which is another behavior
> I've been seeing recently. (Due to clients rebooting or
> otherwise going awol without umounting or closing the tcp
> connection.)
>
> This is true. Once the kernel I-O buffers are all full because a TCP
> client has stopped Ack'ing them, no other connection can send over
> that interface. That's just a fact of any kernel.
>
> Thus the real problem is the client asking for megabytes of data in
> the faint hope that it will somehow be faster -- then crashing.
>
> This has been a known problem for decades. So the TCPM WG developed
> the TCP User Timeout option [RFC5482].
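For readers who have not met this mechanism: Linux offers a related per-socket setting, `TCP_USER_TIMEOUT`, which bounds how long sent data may stay unacknowledged before the kernel drops the connection. The following is only a minimal sketch in Python, not Ganesha code. The helper name `set_user_timeout` and the 30-second value in the usage note are assumptions made for illustration, and the sketch sets the local timeout only; it makes no claim about how RFC 5482 negotiation with the peer is handled.

```python
import socket


def set_user_timeout(sock: socket.socket, timeout_ms: int) -> bool:
    """Ask the kernel to abort the connection if transmitted data stays
    unacknowledged for longer than timeout_ms milliseconds.

    Returns False when this platform or Python build does not expose
    TCP_USER_TIMEOUT (the option is Linux-specific).
    """
    option = getattr(socket, "TCP_USER_TIMEOUT", None)
    if option is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, option, timeout_ms)
    except OSError:
        # The kernel rejected the option; report it instead of raising.
        return False
    return True
```

A server would call something like `set_user_timeout(conn, 30000)` on each accepted connection, so that a client that has gone away stops holding buffered data after roughly 30 seconds.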
>
> Malahal had a patch some time ago to time out the client using
> another means, without depending upon the option. Didn't that go in?
>
>
> Non-blocking I/O would be the answer here, but without
> that...throw some more threads at it, I guess?
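To make the non-blocking suggestion concrete, here is a small sketch in Python, not Ganesha's actual send path. With the socket in non-blocking mode, a full send buffer produces an immediate error instead of parking the thread in the write call. The helper names `send_nonblocking` and `fill_until_stalled` and the 64 KiB chunk size are hypothetical, chosen only for this example.

```python
import socket


def send_nonblocking(sock: socket.socket, data: bytes) -> int:
    """Try to send data without ever blocking the calling thread.

    Returns the number of bytes the kernel accepted, or 0 when the
    socket's send buffer is already full.
    """
    sock.setblocking(False)
    try:
        return sock.send(data)
    except BlockingIOError:
        # Buffer full (the peer is not reading); the caller keeps control.
        return 0


def fill_until_stalled(sock: socket.socket, chunk: bytes = b"x" * 65536) -> int:
    """Queue chunks until the kernel refuses more; return the bytes queued."""
    total = 0
    while True:
        sent = send_nonblocking(sock, chunk)
        if sent == 0:
            return total
        total += sent
```

In a real server the zero return would be paired with a readiness notification (for example the standard `selectors` module) so the thread can serve other connections and retry once the socket is writable again.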
>
> Since V2.3 (before my time), we've been using IO vector zero-copy.
> POSIX allows either iov or async, but not both in the same call.
>
> More threads won't help. It's a stall at the kernel level. In fact,
> one thread per interface proved to be fastest, as that minimizes
> locking conflicts and system calls (and improves CPU cache
> coherency).
>
> _______________________________________________
> Devel mailing list -- devel@lists.nfs-ganesha.org
> <mailto:devel@lists.nfs-ganesha.org>
> To unsubscribe send an email to
> devel-leave@lists.nfs-ganesha.org
> <mailto:devel-leave@lists.nfs-ganesha.org>
>
>
> --
> Regards,
> */Imam Toufique/*
> /*213-700-5485*/
>