My organization uses NFS-Ganesha on a large, multi-headed storage system. It has been
performant and stable for some time, but recently multiple CentOS 6 and CentOS 7 clients
that use autofs to mount filesystems from this server have begun to exhibit long mount
delays.

The client-side debug logs appear to show that the delays result from timeouts of some of
autofs's pre-mount probes of the server heads. Specifically, requests logged as

    automount[17485]: get_nfs_info: called with host nfs.my.org(10.220.8.68) proto 6 version 0x20

(for example) have responses logged in the same second, but requests logged as

    automount[17485]: get_nfs_info: called with host nfs.my.org(10.220.8.68) proto 17 version 0x20

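have no responses logged at all. Since protocol 6 is TCP and protocol 17 is UDP, it is the
UDP probes that go unanswered. Three seconds later, each of these unanswered probes is
followed by a "proto 6" probe of the next address to which the server hostname resolves.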
The three-second timeout would not be too bad if it happened only once, but autofs probes
every server head in series, so in our case the timeouts add up to a lengthy delay.
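
To reproduce the symptom independently of autofs, one could send an ONC RPC NULL call (procedure 0) for the NFS program (100003) to a server head over UDP and over TCP and see which transport answers. The Python sketch below does roughly what `rpcinfo -u` and `rpcinfo -t` do; it is not autofs's own probe code. It assumes the standard NFS port 2049 and NFSv3 (on the assumption that autofs's `version 0x20` flag denotes NFSv3), and the function names are hypothetical.

```python
import socket
import struct

NFS_PROGRAM = 100003  # ONC RPC program number for NFS
NFS_VERSION = 3       # assumption: autofs's "version 0x20" corresponds to NFSv3
NFS_PORT = 2049       # assumption: the server uses the standard NFS port


def build_null_call(xid: int, prog: int = NFS_PROGRAM, vers: int = NFS_VERSION) -> bytes:
    """Encode an ONC RPC CALL for procedure 0 (NULL) with AUTH_NONE credentials."""
    return struct.pack(
        ">10I",
        xid,
        0,     # msg_type = CALL
        2,     # RPC protocol version 2
        prog,
        vers,
        0,     # procedure 0 = NULL
        0, 0,  # credential: AUTH_NONE flavor, zero-length body
        0, 0,  # verifier: AUTH_NONE flavor, zero-length body
    )


def frame_for_tcp(msg: bytes) -> bytes:
    """Prefix a TCP record mark: last-fragment bit set, plus the fragment length."""
    return struct.pack(">I", 0x80000000 | len(msg)) + msg


def is_accepted_reply(data: bytes, xid: int) -> bool:
    """True if data is an accepted RPC REPLY with accept_stat SUCCESS for this xid."""
    if len(data) < 20:
        return False
    rxid, mtype, reply_stat, _flavor, vlen = struct.unpack(">5I", data[:20])
    if rxid != xid or mtype != 1 or reply_stat != 0:  # 1 = REPLY, 0 = MSG_ACCEPTED
        return False
    off = 20 + ((vlen + 3) & ~3)  # skip the verifier body, padded to 4 bytes
    if len(data) < off + 4:
        return False
    (accept_stat,) = struct.unpack(">I", data[off:off + 4])
    return accept_stat == 0  # 0 = SUCCESS


def probe_udp(host: str, timeout: float = 3.0, xid: int = 0x1234) -> bool:
    """NULL call over UDP; the 3 s default mirrors the delay seen in the logs."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        try:
            s.sendto(build_null_call(xid), (host, NFS_PORT))
            data, _addr = s.recvfrom(4096)
        except OSError:  # covers timeouts and name-resolution failures
            return False
    return is_accepted_reply(data, xid)


def probe_tcp(host: str, timeout: float = 3.0, xid: int = 0x5678) -> bool:
    """NULL call over TCP using RPC record marking."""
    try:
        with socket.create_connection((host, NFS_PORT), timeout=timeout) as s:
            s.sendall(frame_for_tcp(build_null_call(xid)))
            data = s.recv(4096)
    except OSError:
        return False
    return is_accepted_reply(data[4:], xid)  # drop the 4-byte record mark
```

For example, `probe_udp("10.220.8.68")` returning `False` while `probe_tcp("10.220.8.68")` returns `True` would match the log pattern described above, and running both against each head address would show whether every head behaves the same way.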

Client software and configuration have not changed since well before the troublesome
behavior began. We cannot correlate the change in behavior with any _manual_ server
configuration change, but a routine software update may have been applied during the
relevant time frame.

We're looking for an explanation of the observed behavior and, especially, a solution.
Does Ganesha have a configuration parameter that bears on this issue? Has Ganesha's
behavior changed in a way that would explain this?