NFS Ganesha Active/Passive HA Failover Issue

by lokendrarathour@gmail.com

Hello,
In our Ceph cluster (version 15.2.7) we are trying to use NFS in HA mode and are facing the issues described below:
"Active/Passive HA NFS Cluster"
We are using an Active/Passive HA configuration for the NFS server with Corosync/Pacemaker:
        1. The configuration is done and we are able to perform failover, but when the active node is tested with a power-off, the following is observed (see the Pacemaker sketch after this list):
             1.1: I/O operations get stuck until the node is powered back on, even though the handover from the active node to the standby node happens immediately after power-off. All existing requests hang.
             1.2: From another client, checking the heartbeat of the mount point is also stuck for the same duration.
             1.3: From a new client, creating a new mount to the same subvolume works fine.
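For reference, our Pacemaker resource layout is roughly equivalent to the sketch below. The resource names, netmask, and monitor intervals are illustrative, not our exact commands:

# Floating IP that clients mount against (10.0.4.14, see the mount command at the end).
pcs resource create nfs_vip ocf:heartbeat:IPaddr2 \
    ip=10.0.4.14 cidr_netmask=24 op monitor interval=10s
# NFS-Ganesha managed as a systemd resource.
pcs resource create nfs_server systemd:nfs-ganesha op monitor interval=10s
# Keep the floating IP with the Ganesha service, and start Ganesha before the IP.
pcs constraint colocation add nfs_vip with nfs_server INFINITY
pcs constraint order nfs_server then nfs_vip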
Issues/Concern:
I/O operations should resume right after the failover happens, but we are not able to achieve this. Can anyone please help with any known configuration, solution, or workaround at the NFS-Ganesha level to achieve a healthy NFS HA mode?
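One way to see whether the surviving node is holding the cluster in grace during the stall is to dump the shared grace database used by the rados_cluster recovery backend, using the ganesha-rados-grace tool that ships with NFS-Ganesha (pool and namespace below match the RADOS_KV section of our ganesha.conf):

# Dump the current/recovery epochs and each nodeid's enforcing/need-grace flags.
ganesha-rados-grace --cephconf /etc/ceph/ceph.conf \
    --userid admin --pool nfs_ganesha --ns ganesha-grace dump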
Just a note: mount points using Ceph's native FS driver work fine in the same shutdown/power-off scenarios.
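For comparison, the native CephFS mount we tested looks roughly like the following; the monitor address is illustrative (mount.ceph can usually pick up the admin keyring from /etc/ceph, otherwise secret= or secretfile= is needed):

# Kernel CephFS mount of the same subvolume path; 10.0.4.11 stands in for a monitor address.
sudo mount -t ceph 10.0.4.11:6789:/volumes/hns/conf/bb21b7c7-c663-40e9-ad11-a61441e6f77f \
    /mnt/cephconf -o name=admin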
Ceph version: 15.2.7
NFS-Ganesha version: 3.3
Ganesha Conf:
## NFS Node 1:
     [ansible@cephnode2 ~]$ cat /etc/ganesha/ganesha.conf
# Please do not change this file directly since it is managed by Ansible and will be overwritten
NFS_Core_Param
{
        Enable_NLM = false;
        Enable_RQUOTA = false;
        Protocols = 3,4;
}
EXPORT_DEFAULTS {
        Attr_Expiration_Time = 0;
}
CACHEINODE {
        Dir_Chunk = 0;
        NParts = 1;
        Cache_Size = 1;
}
RADOS_URLS {
        ceph_conf = '/etc/ceph/ceph.conf';
        userid = "admin";
        watch_url = "rados://nfs_ganesha/ganesha-export/conf-cephnode2";
}
NFSv4 {
        RecoveryBackend = 'rados_cluster';
        Lease_Lifetime = 10;
        Grace_Period = 20;
}
RADOS_KV {
        ceph_conf = '/etc/ceph/ceph.conf';
        userid = "admin";
        pool = "nfs_ganesha";
        namespace = "ganesha-grace";
        nodeid = "cephnode2";
}
%url rados://nfs_ganesha/ganesha-export/conf-cephnode2
LOG {
        Facility {
                name = FILE;
                destination = "/var/log/ganesha/ganesha.log";
                enable = active;
        }
}
EXPORT
{
        Export_id=20235;
        Path = "/volumes/hns/conf/bb21b7c7-c663-40e9-ad11-a61441e6f77f";
        Pseudo = /conf;
        Access_Type = RW;
        Protocols = 3,4;
        Transports = TCP;
        SecType = sys,krb5,krb5i,krb5p;
        Squash = No_Root_Squash;
        Attr_Expiration_Time = 0;
        FSAL {
                Name = CEPH;
                User_Id = "admin";
        }
}
EXPORT
{
        Export_id=20236;
        Path = "/volumes/hns/opr/138304ca-a70d-4962-9754-b572bce196b6";
        Pseudo = /opr;
        Access_Type = RW;
        Protocols = 3,4;
        Transports = TCP;
        SecType = sys,krb5,krb5i,krb5p;
        Squash = No_Root_Squash;
        Attr_Expiration_Time = 0;
        FSAL {
                Name = CEPH;
                User_Id = "admin";
        }
}
## NFS Node 2:
[ansible@cephnode3 ~]$ cat /etc/ganesha/ganesha.conf
# Please do not change this file directly since it is managed by Ansible and will be overwritten
NFS_Core_Param
{
        Enable_NLM = false;
        Enable_RQUOTA = false;
        Protocols = 3,4;
}
EXPORT_DEFAULTS {
        Attr_Expiration_Time = 0;
}
CACHEINODE {
        Dir_Chunk = 0;
        NParts = 1;
        Cache_Size = 1;
}
RADOS_URLS {
        ceph_conf = '/etc/ceph/ceph.conf';
        userid = "admin";
        watch_url = "rados://nfs_ganesha/ganesha-export/conf-cephnode3";
}
NFSv4 {
        RecoveryBackend = 'rados_cluster';
        Lease_Lifetime = 10;
        Grace_Period = 20;
}
RADOS_KV {
        ceph_conf = '/etc/ceph/ceph.conf';
        userid = "admin";
        pool = "nfs_ganesha";
        namespace = "ganesha-grace";
        nodeid = "cephnode3";
}
%url rados://nfs_ganesha/ganesha-export/conf-cephnode3
LOG {
        Facility {
                name = FILE;
                destination = "/var/log/ganesha/ganesha.log";
                enable = active;
        }
}
EXPORT
{
        Export_id=20235;
        Path = "/volumes/hns/conf/bb21b7c7-c663-40e9-ad11-a61441e6f77f";
        Pseudo = /conf;
        Access_Type = RW;
        Protocols = 3,4;
        Transports = TCP;
        SecType = sys,krb5,krb5i,krb5p;
        Squash = No_Root_Squash;
        Attr_Expiration_Time = 0;
        FSAL {
                Name = CEPH;
                User_Id = "admin";
        }
}
EXPORT
{
        Export_id=20236;
        Path = "/volumes/hns/opr/138304ca-a70d-4962-9754-b572bce196b6";
        Pseudo = /opr;
        Access_Type = RW;
        Protocols = 3,4;
        Transports = TCP;
        SecType = sys,krb5,krb5i,krb5p;
        Squash = No_Root_Squash;
        Attr_Expiration_Time = 0;
        FSAL {
                Name = CEPH;
                User_Id = "admin";
        }
}
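One thing we are double-checking with rados_cluster: every Ganesha nodeid has to be enrolled in the shared grace database, otherwise cluster-wide grace handling does not behave as expected. A sketch of how to enroll and verify both of our nodeids (same pool/namespace as in RADOS_KV):

# Enroll both nodeids (add), then confirm membership and flags (dump).
ganesha-rados-grace --cephconf /etc/ceph/ceph.conf --pool nfs_ganesha --ns ganesha-grace add cephnode2 cephnode3
ganesha-rados-grace --cephconf /etc/ceph/ceph.conf --pool nfs_ganesha --ns ganesha-grace dump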
## Mount command at the client side:
sudo mount -t nfs -o nfsvers=4.1,proto=tcp 10.0.4.14:/conf /mnt/nfsconf
where 10.0.4.14 is the floating IP.
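To measure the stall from a client, we run a simple heartbeat loop against the mount and look at the timestamp gap across the failover (the file name is arbitrary):

# Append a timestamp every second; the gap in the log shows how long I/O was blocked.
while true; do date >> /mnt/nfsconf/heartbeat.log; sleep 1; done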