Hello,
In our environment (Ceph cluster version 15.2.7) we are trying to use NFS in HA mode and are facing the issues described below:
"Active/Passive HA NFS Cluster"
When using an Active/Passive HA configuration for the NFS server with Corosync/Pacemaker:
1. The configuration is done and we are able to perform fail-over, but when the active node is tested with a power-off, the following is observed:
   1.1: I/O operations get stuck until the node is powered back on, even though the handover from the active node to the standby node happens immediately after the power-off. All in-flight requests remain stuck.
   1.2: From another client, checking the "heartbeat" of the mount point (see the quick check sketched below this list) is also stuck for the same duration.
   1.3: From a new client, creating a new mount to the same subvolume works fine.
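For reference, the "heartbeat" check in 1.2 is simply a timed metadata operation against the NFS mount from a second client; a minimal sketch (mount path as in the client mount command at the end of this mail, the 5-second timeout is an arbitrary choice):

# Probe the NFS mount point from another client; if stat does not return
# within 5 seconds (SIGKILL after a further 1s), treat the mount as stuck.
if timeout -k 1 5 stat /mnt/nfsconf > /dev/null 2>&1; then
    echo "mount responsive"
else
    echo "mount stuck"
fi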
Issues/Concern:
I/O operations should resume right after the failover happens, but we are not able to achieve this. Can anyone please point us to any known configuration/solution/workaround at the NFS-Ganesha level that would get us to a healthy NFS HA mode?
Just a note:
Mount points using Ceph's native CephFS driver work fine in the same shutdown/power-off scenarios.
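For comparison, the native mounts in those tests were plain kernel CephFS mounts along these lines (the monitor address, mount point and secret file below are only examples, not our actual values):

# Kernel CephFS mount of the same subvolume used for the comparison test;
# 10.0.4.11 is a placeholder MON address and the secretfile path is an example.
sudo mount -t ceph 10.0.4.11:6789:/volumes/hns/conf/bb21b7c7-c663-40e9-ad11-a61441e6f77f /mnt/cephconf \
    -o name=admin,secretfile=/etc/ceph/admin.secret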
Ceph version: 15.2.7
NFS-Ganesha: 3.3
Ganesha Conf:
- NFS Node 1:
[ansible@cephnode2 ~]$ cat /etc/ganesha/ganesha.conf
# Please do not change this file directly since it is managed by Ansible and will be overwritten
NFS_Core_Param
{
    Enable_NLM = false;
    Enable_RQUOTA = false;
    Protocols = 3,4;
}
EXPORT_DEFAULTS {
    Attr_Expiration_Time = 0;
}
CACHEINODE {
    Dir_Chunk = 0;
    NParts = 1;
    Cache_Size = 1;
}
RADOS_URLS {
    ceph_conf = '/etc/ceph/ceph.conf';
    userid = "admin";
    watch_url = "rados://nfs_ganesha/ganesha-export/conf-cephnode2";
}
NFSv4 {
    RecoveryBackend = 'rados_cluster';
    Lease_Lifetime = 10;
    Grace_Period = 20;
}
RADOS_KV {
    ceph_conf = '/etc/ceph/ceph.conf';
    userid = "admin";
    pool = "nfs_ganesha";
    namespace = "ganesha-grace";
    nodeid = "cephnode2";
}
%url rados://nfs_ganesha/ganesha-export/conf-cephnode2
LOG {
    Facility {
        name = FILE;
        destination = "/var/log/ganesha/ganesha.log";
        enable = active;
    }
}
EXPORT
{
    Export_id = 20235;
    Path = "/volumes/hns/conf/bb21b7c7-c663-40e9-ad11-a61441e6f77f";
    Pseudo = /conf;
    Access_Type = RW;
    Protocols = 3,4;
    Transports = TCP;
    SecType = sys,krb5,krb5i,krb5p;
    Squash = No_Root_Squash;
    Attr_Expiration_Time = 0;
    FSAL {
        Name = CEPH;
        User_Id = "admin";
    }
}
EXPORT
{
    Export_id = 20236;
    Path = "/volumes/hns/opr/138304ca-a70d-4962-9754-b572bce196b6";
    Pseudo = /opr;
    Access_Type = RW;
    Protocols = 3,4;
    Transports = TCP;
    SecType = sys,krb5,krb5i,krb5p;
    Squash = No_Root_Squash;
    Attr_Expiration_Time = 0;
    FSAL {
        Name = CEPH;
        User_Id = "admin";
    }
}
- NFS Node 2:
[ansible@cephnode3 ~]$ cat /etc/ganesha/ganesha.conf
# Please do not change this file directly since it is managed by Ansible and will be overwritten
NFS_Core_Param
{
    Enable_NLM = false;
    Enable_RQUOTA = false;
    Protocols = 3,4;
}
EXPORT_DEFAULTS {
    Attr_Expiration_Time = 0;
}
CACHEINODE {
    Dir_Chunk = 0;
    NParts = 1;
    Cache_Size = 1;
}
RADOS_URLS {
    ceph_conf = '/etc/ceph/ceph.conf';
    userid = "admin";
    watch_url = "rados://nfs_ganesha/ganesha-export/conf-cephnode3";
}
NFSv4 {
    RecoveryBackend = 'rados_cluster';
    Lease_Lifetime = 10;
    Grace_Period = 20;
}
RADOS_KV {
    ceph_conf = '/etc/ceph/ceph.conf';
    userid = "admin";
    pool = "nfs_ganesha";
    namespace = "ganesha-grace";
    nodeid = "cephnode3";
}
%url rados://nfs_ganesha/ganesha-export/conf-cephnode3
LOG {
    Facility {
        name = FILE;
        destination = "/var/log/ganesha/ganesha.log";
        enable = active;
    }
}
EXPORT
{
    Export_id = 20235;
    Path = "/volumes/hns/conf/bb21b7c7-c663-40e9-ad11-a61441e6f77f";
    Pseudo = /conf;
    Access_Type = RW;
    Protocols = 3,4;
    Transports = TCP;
    SecType = sys,krb5,krb5i,krb5p;
    Squash = No_Root_Squash;
    Attr_Expiration_Time = 0;
    FSAL {
        Name = CEPH;
        User_Id = "admin";
    }
}
EXPORT
{
    Export_id = 20236;
    Path = "/volumes/hns/opr/138304ca-a70d-4962-9754-b572bce196b6";
    Pseudo = /opr;
    Access_Type = RW;
    Protocols = 3,4;
    Transports = TCP;
    SecType = sys,krb5,krb5i,krb5p;
    Squash = No_Root_Squash;
    Attr_Expiration_Time = 0;
    FSAL {
        Name = CEPH;
        User_Id = "admin";
    }
}
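For reference, the shared grace database configured in the RADOS_KV blocks above can be inspected during the failover tests with the ganesha-rados-grace tool (from the nfs-ganesha-rados-grace package); a minimal sketch using the pool/namespace from the configs, run from a node with the admin keyring:

# Dump the rados_cluster grace DB: shows the current/recovery epochs and the
# per-node NEED/ENFORCING flags for cephnode2 and cephnode3.
ganesha-rados-grace --userid admin \
    --cephconf /etc/ceph/ceph.conf \
    --pool nfs_ganesha \
    --ns ganesha-grace \
    dump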
## Mount command at the client side:
sudo mount -t nfs -o nfsvers=4.1,proto=tcp 10.0.4.14:/conf /mnt/nfsconf
where 10.0.4.14 is the floating IP.
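For completeness, the floating IP itself is managed by Pacemaker; the resource is roughly equivalent to the following sketch (the resource name, netmask and monitor interval are illustrative, not our exact values):

# Illustrative Pacemaker/pcs definition of the floating IP resource;
# "nfs_vip", cidr_netmask and the monitor interval are example values.
pcs resource create nfs_vip ocf:heartbeat:IPaddr2 \
    ip=10.0.4.14 cidr_netmask=24 \
    op monitor interval=10s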