On Fri, Apr 16, 2021 at 6:05 AM <lokendrarathour@gmail.com> wrote:

As checked we do not see any issue in the ARP Forwarding.
What other reason do you see for this five-minute stall before I/O resumes?
Is there some constant in nfs-ganesha that forcibly holds the MDS/request for a fixed period (5 min) before it can be transferred to the other node?

There isn't anything that holds requests.

To achieve HA, the failover/handover time should be very short.

Indeed. And that's exactly what we see in the glusterfs+ganesha HA solution — it's instantaneous.

The next thing is to collect wireshark/tcpdump captures on each of the servers and on the client that is experiencing the five-minute delay. Start the tcpdumps, then trigger a failover and let the tcpdumps run for a few minutes before stopping them.
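As a rough sketch of the capture step, something like the following could build the same tcpdump command line for each host so the captures are comparable. The host labels and output file names are made up for illustration, and I'm assuming NFS is on the default port 2049 and that capturing on the "any" pseudo-interface is acceptable on your systems; adjust to match your setup.

```python
import shlex


def build_capture_command(host_label, interface="any", nfs_port=2049):
    """Return a tcpdump argv list for capturing NFS traffic on one host.

    host_label is a hypothetical name used only to build the output file
    name; nfs_port=2049 assumes the default NFS port.
    """
    outfile = f"nfs-failover-{host_label}.pcap"  # hypothetical file name
    return [
        "tcpdump",
        "-i", interface,   # capture interface ("any" = all interfaces on Linux)
        "-s", "0",         # full packets, so NFS/RPC headers are not truncated
        "-w", outfile,     # write raw packets to a file for wireshark
        "port", str(nfs_port),
    ]


if __name__ == "__main__":
    # Print one command per host; run each on its host (as root) before
    # triggering the failover, and stop it with Ctrl-C a few minutes later.
    for label in ("server1", "server2", "client"):
        print(shlex.join(build_capture_command(label)))
```

Starting all of the captures before the failover matters more than the exact filter; if in doubt, drop the port filter and capture everything.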

Check that every NFS request sent by the client arrives on the correct server, and that every server reply that is sent is received on the client.
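If going through the captures by hand gets tedious, the comparison can be sketched roughly as below. This assumes you have already pulled the RPC transaction IDs (XIDs) of the calls and replies out of each capture, for example with wireshark's rpc.xid field; the function name, its arguments, and the server names are all hypothetical, and this is not anything ganesha provides.

```python
def check_delivery(client_calls, calls_by_server, expected_server,
                   replies_sent, client_replies):
    """Compare RPC XIDs seen in the client and server captures.

    client_calls:    XIDs of calls seen leaving the client
    calls_by_server: dict of server name -> XIDs of calls seen arriving there
    expected_server: name of the server that should be handling the calls
    replies_sent:    XIDs of replies seen leaving the servers
    client_replies:  XIDs of replies seen arriving on the client
    """
    sent = set(client_calls)
    arrived_ok = set(calls_by_server.get(expected_server, ()))

    # Collect everything that showed up on a server other than the expected one.
    arrived_elsewhere = set()
    for name, xids in calls_by_server.items():
        if name != expected_server:
            arrived_elsewhere |= set(xids)

    return {
        # Calls that reached some server, but never the expected one.
        "misrouted_calls": sorted((sent & arrived_elsewhere) - arrived_ok),
        # Calls that never showed up on any server.
        "lost_calls": sorted(sent - arrived_ok - arrived_elsewhere),
        # Replies a server sent that the client never saw.
        "lost_replies": sorted(set(replies_sent) - set(client_replies)),
    }


if __name__ == "__main__":
    # Made-up XIDs purely to show the shape of the result.
    print(check_delivery(
        client_calls=[1, 2, 3, 4],
        calls_by_server={"server1": [1, 2], "server2": [3]},
        expected_server="server1",
        replies_sent=[1, 2],
        client_replies=[1],
    ))
```

Retransmitted calls reuse the same XID, so using sets here hides retransmissions; a long run of retransmits for one XID in the client capture is itself worth noting.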

Post the tcpdump files somewhere and one of us will try to make some time to look at them if you don't find anything.

Kaleb