Hello,
In our environment (Ceph cluster version 15.2.7) we are trying to use NFS in
HA mode and are facing the following issues:
"Active/Passive HA NFS Cluster"
When using an Active/Passive HA configuration for the NFS server with Corosync/Pacemaker
(a sketch of the resource layout we mean follows the Issues/Concern section below):
1. The configuration is done and we are able to perform failover, but when the
active node is tested with a power-off, the following behaviour is observed:
    1.1: I/O operations get stuck until the node is powered back on, even though
the handover from the active node to the standby node happens immediately once
the node is powered off. All existing requests remain stuck.
    1.2: From another client, checking the heartbeat of the mount point (probed
as in the sketch after this list) is also stuck for the same duration.
    1.3: From a new client, creating a new mount to the same subvolume works
fine.
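The heartbeat check in 1.2 is nothing more than a periodic stat of the mount
point with a timeout, roughly like this (path and intervals are illustrative):

# Probe the NFS mount every 2s; a hung server shows up as a timeout
# instead of blocking the shell indefinitely.
while true; do
    if timeout 5 stat /mnt/nfsconf > /dev/null 2>&1; then
        echo "$(date +%T) mount responsive"
    else
        echo "$(date +%T) mount stuck"
    fi
    sleep 2
done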
Issues/Concern:
I/O operations should resume right after the failover happens. We are not able
to achieve this state. Can anyone please suggest any known
configuration/solution/work-around at the NFS-Ganesha level that would give us
a healthy NFS HA mode?
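For context, the Active/Passive layout we mean is roughly the following pcs
sketch (a minimal sketch only; resource names, netmask, and monitor intervals
are illustrative, not copied from our cluster):

# Floating IP that clients mount against (10.0.4.14 below).
pcs resource create nfs_vip ocf:heartbeat:IPaddr2 \
    ip=10.0.4.14 cidr_netmask=24 op monitor interval=10s
# NFS-Ganesha managed as a systemd resource.
pcs resource create nfs_ganesha systemd:nfs-ganesha op monitor interval=30s
# Keep the daemon on the same node as the VIP, and start the VIP first.
pcs constraint colocation add nfs_ganesha with nfs_vip INFINITY
pcs constraint order nfs_vip then nfs_ganesha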
Just a note:
mount points using Ceph's native CephFS driver work fine in the same
shutdown/power-off scenarios.
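For comparison, the native mount we tested was of roughly this shape (the
monitor address and secret file path are illustrative; only the subvolume path
is taken from our exports below):

sudo mount -t ceph 10.0.4.10:6789:/volumes/hns/conf/bb21b7c7-c663-40e9-ad11-a61441e6f77f \
    /mnt/cephconf -o name=admin,secretfile=/etc/ceph/admin.secret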
Ceph version: 15.2.7
NFS-Ganesha: 3.3
Ganesha Conf:
- NFS Node 1:
[ansible@cephnode2 ~]$ cat /etc/ganesha/ganesha.conf
# Please do not change this file directly since it is managed by Ansible and will be overwritten
NFS_Core_Param
{
    Enable_NLM = false;
    Enable_RQUOTA = false;
    Protocols = 3,4;
}

EXPORT_DEFAULTS {
    Attr_Expiration_Time = 0;
}

CACHEINODE {
    Dir_Chunk = 0;
    NParts = 1;
    Cache_Size = 1;
}

RADOS_URLS {
    ceph_conf = '/etc/ceph/ceph.conf';
    userid = "admin";
    watch_url = "rados://nfs_ganesha/ganesha-export/conf-cephnode2";
}
NFSv4 {
    RecoveryBackend = 'rados_cluster'; # active/active
    RecoveryBackend = 'rados_ng';      # active/passive
    Lease_Lifetime = 10;
    Grace_Period = 20;
}

RADOS_KV {
    ceph_conf = '/etc/ceph/ceph.conf';
    userid = "admin";
    pool = "nfs_ganesha";
    namespace = "ganesha-grace";
    nodeid = "cephnode2";
}
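For what it's worth, when the rados_cluster backend is the one in effect, the
shared grace database in that pool/namespace can be inspected from any node
with the stock ganesha-rados-grace tool (standard usage; the pool/namespace
are simply the ones from the RADOS_KV block above):

# Dump the current/recovery epochs and each node's grace flags.
ganesha-rados-grace --userid admin --pool nfs_ganesha --ns ganesha-grace dump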
%url rados://nfs_ganesha/ganesha-export/conf-cephnode2
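The object behind that URL is plain config text stored in the nfs_ganesha pool
under the ganesha-export namespace, so it can be listed and fetched with the
stock rados tool (the local file name below is illustrative):

# List the export objects in the namespace.
rados -p nfs_ganesha -N ganesha-export ls
# Fetch this node's export definition into a local file for inspection.
rados -p nfs_ganesha -N ganesha-export get conf-cephnode2 /tmp/conf-cephnode2.conf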
>
> LOG {
> Facility {
> name = FILE;
> destination = "/var/log/ganesha/ganesha.log";
> enable = active;
> }
>
>
> }
EXPORT
{
    Export_id = 20235;
    Path = "/volumes/hns/conf/bb21b7c7-c663-40e9-ad11-a61441e6f77f";
    Pseudo = /conf;
    Access_Type = RW;
    Protocols = 3,4;
    Transports = TCP;
    SecType = sys,krb5,krb5i,krb5p;
    Squash = No_Root_Squash;
    Attr_Expiration_Time = 0;
    FSAL {
        Name = CEPH;
        User_Id = "admin";
    }
}
EXPORT
{
    Export_id = 20236;
    Path = "/volumes/hns/opr/138304ca-a70d-4962-9754-b572bce196b6";
    Pseudo = /opr;
    Access_Type = RW;
    Protocols = 3,4;
    Transports = TCP;
    SecType = sys,krb5,krb5i,krb5p;
    Squash = No_Root_Squash;
    Attr_Expiration_Time = 0;
    FSAL {
        Name = CEPH;
        User_Id = "admin";
    }
}
- NFS Node 2:
[ansible@cephnode3 ~]$ cat /etc/ganesha/ganesha.conf
# Please do not change this file directly since it is managed by Ansible and will be overwritten

NFS_Core_Param
{
    Enable_NLM = false;
    Enable_RQUOTA = false;
    Protocols = 3,4;
}

EXPORT_DEFAULTS {
    Attr_Expiration_Time = 0;
}

CACHEINODE {
    Dir_Chunk = 0;
    NParts = 1;
    Cache_Size = 1;
}

RADOS_URLS {
    ceph_conf = '/etc/ceph/ceph.conf';
    userid = "admin";
    watch_url = "rados://nfs_ganesha/ganesha-export/conf-cephnode3";
}

NFSv4 {
    RecoveryBackend = 'rados_cluster'; # active/active
    RecoveryBackend = 'rados_ng';      # active/passive
    Lease_Lifetime = 10;
    Grace_Period = 20;
}

RADOS_KV {
    ceph_conf = '/etc/ceph/ceph.conf';
    userid = "admin";
    pool = "nfs_ganesha";
    namespace = "ganesha-grace";
    nodeid = "cephnode3";
}

%url rados://nfs_ganesha/ganesha-export/conf-cephnode3

LOG {
    Facility {
        name = FILE;
        destination = "/var/log/ganesha/ganesha.log";
        enable = active;
    }
}

EXPORT
{
    Export_id = 20235;
    Path = "/volumes/hns/conf/bb21b7c7-c663-40e9-ad11-a61441e6f77f";
    Pseudo = /conf;
    Access_Type = RW;
    Protocols = 3,4;
    Transports = TCP;
    SecType = sys,krb5,krb5i,krb5p;
    Squash = No_Root_Squash;
    Attr_Expiration_Time = 0;
    FSAL {
        Name = CEPH;
        User_Id = "admin";
    }
}
EXPORT
{
    Export_id = 20236;
    Path = "/volumes/hns/opr/138304ca-a70d-4962-9754-b572bce196b6";
    Pseudo = /opr;
    Access_Type = RW;
    Protocols = 3,4;
    Transports = TCP;
    SecType = sys,krb5,krb5i,krb5p;
    Squash = No_Root_Squash;
    Attr_Expiration_Time = 0;
    FSAL {
        Name = CEPH;
        User_Id = "admin";
    }
}
## Mount command on the client side:
sudo mount -t nfs -o nfsvers=4.1,proto=tcp 10.0.4.14:/conf /mnt/nfsconf
where 10.0.4.14 is the floating IP.
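When reproducing the power-off test, it helps to confirm where the floating IP
actually lands afterwards (standard iproute2/pcs commands, run on the
surviving nodes):

# Does this node currently hold the VIP?
ip -br addr show | grep 10.0.4.14
# Pacemaker's view of the resource placement.
pcs status resources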