New subject: NFSv3 active-passive cluster with FSAL CephFS and keepalived - Stale file handle after restart and also failover

Friday, 22 January 2021

Hi,

I configured five unrestriced LXC containers on Proxmox and also set lxc.seccomp.profile
to empty.

This is my configuration:
root@nfsshares-a:~# egrep -v '^[[:blank:]]*#|^[[:blank:]]*$'
/etc/ganesha/ganesha.conf
NFS_CORE_PARAM
{
        Enable_NLM = false;
        Enable_RQUOTA = false;
        Protocols = 3,4;
        mount_path_pseudo = true;
}
NFS_KRB5
{
        Active_krb5 = false;
}
NFSv4
{
        RecoveryBackend = rados_ng;
        Minor_Versions =  1,2;
}
MDCACHE {
        Dir_Chunk = 0;
}
EXPORT_DEFAULTS {
        SecType = "sys";
        Squash = No_Root_Squash;
        Attr_Expiration_Time = 0;
}
CEPH
{
}
RADOS_KV
{
        UserId = "ganesharecov";
}
RADOS_URLS
{
        UserId = "ganeshaurls";
        watch_url = "rados://nfs-ganesha/ganesha-namespace/conf-nfsshares";
}
%url    rados://nfs-ganesha/ganesha-namespace/conf-nfsshares
root@nfsshares-a:~#

And this is an example export. The other exports just have different clients for the other
paths.
root@proxmox07:~# cat /tmp/export-1
EXPORT {
    FSAL {
        secret_access_key = "AQDm7eFfJWvhIBAAuI++d0SNkdylNkMx5t/N8w==";
        user_id = "ganesha";
        name = "CEPH";
        filesystem = "cephfs";
    }

    CLIENT {
        clients = 10.20.56.254;
        squash = "no_root_squash";
        access_type = "RW";
    }

    pseudo = "/";
    squash = "no_root_squash";
    access_type = "NONE";
    path = "/";
    export_id = 1;
    transports = "TCP";
    protocols = 3, 4;
}

Mounting filesystems works fine, but as soon as I restart the nfs-ganesha daemon I get a
stale file handle on the client.

Looking at 
root@nfsshares-a:~# ganesha-rados-grace --userid ganesharecov --pool nfs-ganesha --ns
namespace-ganesha dump
cur=1 rec=0
======================================================
nfsshares-a      E
nfsshares-b      E
nfsshares-c      E
nfsshares-d      E
nfsshares-e      E
root@nfsshares-a:~#

The nodes never loose the E state.

These are the permissions on ceph:
root@proxmox07:~# cat /etc/pve/priv/ceph.client.ganesha*.keyring
[client.ganesha]
        key = <removed>
        caps mds = "allow r path=/, allow rw path=/vol"
        caps mon = "allow r"
        caps osd = "allow class-read object_prefix rbd_children, allow rw
pool=cephfs_data, allow rw pool=nfs-ganesha"
[client.ganesharecov]
        key = <removed>
        caps mon = "allow r"
        caps osd = "allow class-read object_prefix rbd_children, allow rw
pool=nfs-ganesha"
[client.ganeshaurls]
        key = <removed>
        caps mon = "allow r"
        caps osd = "allow class-read object_prefix rbd_children, allow rw
pool=nfs-ganesha"
root@proxmox07:~#

Which service account is used to remove the E state on ceph? Do I have a permission
problem?

Best regards
Rainer

Log from a restart:
----
21/01/2021 10:48:08 : epoch 60095be8 : nfsshares-a : ganesha.nfsd-239[main] main :MAIN
:EVENT :ganesha.nfsd Starting: Ganesha Version 3.4
21/01/2021 10:48:08 : epoch 60095be8 : nfsshares-a : ganesha.nfsd-246[main]
nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file successfully parsed
21/01/2021 10:48:08 : epoch 60095be8 : nfsshares-a : ganesha.nfsd-246[main]
init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper.
21/01/2021 10:48:08 : epoch 60095be8 : nfsshares-a : ganesha.nfsd-246[main]
init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized.
21/01/2021 10:48:08 : epoch 60095be8 : nfsshares-a : ganesha.nfsd-246[main] rados_ng_init
:CLIENT ID :EVENT :Rados kv store init done
21/01/2021 10:48:08 : epoch 60095be8 : nfsshares-a : ganesha.nfsd-246[main]
nfs_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 90
21/01/2021 10:48:08 : epoch 60095be8 : nfsshares-a : ganesha.nfsd-246[main]
nfs_lift_grace_locked :STATE :EVENT :NFS Server Now NOT IN GRACE
21/01/2021 10:48:22 : epoch 60095be8 : nfsshares-a : ganesha.nfsd-246[main] lower_my_caps
:NFS STARTUP :EVENT :CAP_SYS_RESOURCE was successfully removed for proper quota management
in FSAL
21/01/2021 10:48:22 : epoch 60095be8 : nfsshares-a : ganesha.nfsd-246[main] lower_my_caps
:NFS STARTUP :EVENT :currenty set capabilities are: =
cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read+ep
21/01/2021 10:48:22 : epoch 60095be8 : nfsshares-a : ganesha.nfsd-246[main]
nfs_Init_admin_thread :NFS CB :EVENT :Admin thread initialized
21/01/2021 10:48:22 : epoch 60095be8 : nfsshares-a : ganesha.nfsd-246[main]
nfs_rpc_cb_init_ccache :NFS STARTUP :WARN :gssd_refresh_krb5_machine_credential failed
(-1765328160:22)
21/01/2021 10:48:22 : epoch 60095be8 : nfsshares-a : ganesha.nfsd-246[main]
nfs_Start_threads :THREAD :EVENT :Starting delayed executor.
21/01/2021 10:48:22 : epoch 60095be8 : nfsshares-a : ganesha.nfsd-246[main]
nfs_Start_threads :THREAD :EVENT :gsh_dbusthread was started successfully
21/01/2021 10:48:22 : epoch 60095be8 : nfsshares-a : ganesha.nfsd-246[main]
nfs_Start_threads :THREAD :EVENT :admin thread was started successfully
21/01/2021 10:48:22 : epoch 60095be8 : nfsshares-a : ganesha.nfsd-246[main]
nfs_Start_threads :THREAD :EVENT :reaper thread was started successfully
21/01/2021 10:48:22 : epoch 60095be8 : nfsshares-a : ganesha.nfsd-246[main]
nfs_Start_threads :THREAD :EVENT :General fridge was started successfully
21/01/2021 10:48:22 : epoch 60095be8 : nfsshares-a : ganesha.nfsd-246[main] nfs_start :NFS
STARTUP :EVENT :-------------------------------------------------
21/01/2021 10:48:22 : epoch 60095be8 : nfsshares-a : ganesha.nfsd-246[main] nfs_start :NFS
STARTUP :EVENT :             NFS SERVER INITIALIZED
21/01/2021 10:48:22 : epoch 60095be8 : nfsshares-a : ganesha.nfsd-246[main] nfs_start :NFS
STARTUP :EVENT :-------------------------------------------------
22/01/2021 19:47:22 : epoch 600b2bca : nfsshares-a : ganesha.nfsd-236[main] main :MAIN
:EVENT :ganesha.nfsd Starting: Ganesha Version 3.4
22/01/2021 19:47:22 : epoch 600b2bca : nfsshares-a : ganesha.nfsd-249[main]
nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file successfully parsed
22/01/2021 19:47:22 : epoch 600b2bca : nfsshares-a : ganesha.nfsd-249[main]
init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper.
22/01/2021 19:47:22 : epoch 600b2bca : nfsshares-a : ganesha.nfsd-249[main]
init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized.
22/01/2021 19:47:22 : epoch 600b2bca : nfsshares-a : ganesha.nfsd-249[main] rados_ng_init
:CLIENT ID :EVENT :Rados kv store init done
22/01/2021 19:47:22 : epoch 600b2bca : nfsshares-a : ganesha.nfsd-249[main]
nfs_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 90
22/01/2021 19:47:22 : epoch 600b2bca : nfsshares-a : ganesha.nfsd-249[main]
nfs_lift_grace_locked :STATE :EVENT :NFS Server Now NOT IN GRACE
22/01/2021 19:47:49 : epoch 600b2bca : nfsshares-a : ganesha.nfsd-249[main] lower_my_caps
:NFS STARTUP :EVENT :CAP_SYS_RESOURCE was successfully removed for proper quota management
in FSAL
22/01/2021 19:47:49 : epoch 600b2bca : nfsshares-a : ganesha.nfsd-249[main] lower_my_caps
:NFS STARTUP :EVENT :currenty set capabilities are: =
cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read+ep
22/01/2021 19:47:49 : epoch 600b2bca : nfsshares-a : ganesha.nfsd-249[main]
nfs_Init_admin_thread :NFS CB :EVENT :Admin thread initialized
22/01/2021 19:47:49 : epoch 600b2bca : nfsshares-a : ganesha.nfsd-249[main]
nfs_rpc_cb_init_ccache :NFS STARTUP :WARN :gssd_refresh_krb5_machine_credential failed
(-1765328160:22)
22/01/2021 19:47:49 : epoch 600b2bca : nfsshares-a : ganesha.nfsd-249[main]
nfs_Start_threads :THREAD :EVENT :Starting delayed executor.
22/01/2021 19:47:49 : epoch 600b2bca : nfsshares-a : ganesha.nfsd-249[main]
nfs_Start_threads :THREAD :EVENT :gsh_dbusthread was started successfully
22/01/2021 19:47:49 : epoch 600b2bca : nfsshares-a : ganesha.nfsd-249[main]
nfs_Start_threads :THREAD :EVENT :admin thread was started successfully
22/01/2021 19:47:49 : epoch 600b2bca : nfsshares-a : ganesha.nfsd-249[main]
nfs_Start_threads :THREAD :EVENT :reaper thread was started successfully
22/01/2021 19:47:49 : epoch 600b2bca : nfsshares-a : ganesha.nfsd-249[main]
nfs_Start_threads :THREAD :EVENT :General fridge was started successfully
22/01/2021 19:47:49 : epoch 600b2bca : nfsshares-a : ganesha.nfsd-249[main] nfs_start :NFS
STARTUP :EVENT :-------------------------------------------------
22/01/2021 19:47:49 : epoch 600b2bca : nfsshares-a : ganesha.nfsd-249[main] nfs_start :NFS
STARTUP :EVENT :             NFS SERVER INITIALIZED
22/01/2021 19:47:49 : epoch 600b2bca : nfsshares-a : ganesha.nfsd-249[main] nfs_start :NFS
STARTUP :EVENT :-------------------------------------------------
root@nfsshares-a:~#

2025

2024

2023

2022

2021

2020

2019

2018

NFSv3 active-passive cluster with FSAL CephFS and keepalived - Stale file handle after restart and also failover