On Fri, Apr 16, 2021 at 6:05 AM <lokendrarathour(a)gmail.com> wrote:
As checked, we do not see any issue with ARP forwarding.
What other reason do you see for this five-minute I/O stall before it resumes?
Is there some constant in nfs-ganesha that forcibly holds the
MDS/request for a fixed period (5 min) before transferring it to the other
node?
There isn't anything that holds requests.
To achieve HA, the failover/handover time should be very short.
Indeed. And that's exactly what we see in the glusterfs+ganesha HA solution
— it's instantaneous.
The next thing is to collect wireshark/tcpdumps on each of the servers and
on the client that is experiencing the five minute delay. Start the
tcpdumps, then trigger a failover and let the tcpdumps run for a few
minutes before stopping them.
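A minimal sketch of how such a capture could be started on each machine, assuming NFS traffic uses the standard port 2049. The helper name, the interface "any", and the output path are illustrative assumptions; adjust them to your setup.

```python
import subprocess

def tcpdump_command(interface, outfile, port=2049):
    """Build a tcpdump argument list that writes full packets to a pcap file.

    -i selects the capture interface, -s 0 captures whole packets rather
    than truncating them, and -w writes raw packets to outfile.
    Port 2049 is the standard NFS port (an assumption about this setup).
    """
    return ["tcpdump", "-i", interface, "-s", "0", "-w", outfile,
            "port", str(port)]

if __name__ == "__main__":
    # Hypothetical path; run as root on every server and on the client.
    cmd = tcpdump_command("any", "/tmp/nfs-failover.pcap")
    proc = subprocess.Popen(cmd)
    try:
        proc.wait()          # trigger the failover while this runs
    except KeyboardInterrupt:
        proc.terminate()     # stop with Ctrl-C after a few minutes
        proc.wait()
```

Start it on all machines before triggering the failover so that every capture covers the whole five-minute window.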
Check that every NFS request sent by the client arrives on the correct
server, and that every reply sent by a server is received by the
client.
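One way to do that comparison is to match RPC transaction IDs (XIDs) between the captures. The sketch below assumes you have already exported the XIDs from each capture into plain lists (for example with tshark's rpc.xid field); the function name and the sample values are hypothetical.

```python
def missing_xids(sent, received):
    """Return XIDs that appear in `sent` but never in `received`.

    Order of first appearance is kept, and each XID is reported once
    even if the sender retransmitted it.
    """
    seen = set(received)
    missing = []
    for xid in sent:
        if xid not in seen:
            missing.append(xid)
            seen.add(xid)  # avoid reporting retransmissions again
    return missing

if __name__ == "__main__":
    # Made-up sample data: calls seen leaving the client versus calls
    # seen arriving on the server that should own the address.
    client_calls = [0x1A, 0x1B, 0x1C, 0x1C]
    server_calls = [0x1A, 0x1B]
    print([hex(x) for x in missing_xids(client_calls, server_calls)])
```

Run it in both directions: client calls against server calls to find requests that never arrived, and server replies against client replies to find replies that were lost.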
Post the tcpdump files somewhere and one of us will try to make some time
to look at them if you don't find anything.
--
Kaleb