Is there any config that needs to be set to prevent this issue, or are there any logs we
should provide for further debugging? Any assistance is appreciated.
To help illustrate our implementation, we have created the following Miro board:
https://miro.com/app/board/o9J_lE0WOvs=/
We make a large change to the NFS filesystem from one of the NFS clients by deleting and
adding 8,000+ files of no particular size.
During this bulk file update, we see missing or corrupt files appear on one of the NFS
clients. The problem file(s) differ each time these operations run. All other NFS clients
show the files correctly.
We have been able to consistently replicate this issue over NFS 3 as well as NFS 4.2.
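For anyone who wants to check for the same symptom, here is a minimal sketch of how the
missing or corrupt files could be detected, assuming the same export is mounted on each
client. It is not the exact procedure we use: the function names and the checksum approach
are illustrative only. The idea is to build a manifest of SHA-256 digests on the client that
made the change, then compare that manifest against what another client sees.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def compare_manifest(expected, root):
    """Return (missing, corrupt) relative paths as seen from this mount.

    A file that cannot be opened or read (any OSError) is counted as missing;
    a file that reads fine but has a different digest is counted as corrupt.
    """
    missing, corrupt = [], []
    for rel_path, digest in sorted(expected.items()):
        full = os.path.join(root, rel_path)
        try:
            actual = sha256_of(full)
        except OSError:
            missing.append(rel_path)
            continue
        if actual != digest:
            corrupt.append(rel_path)
    return missing, corrupt
```

The manifest would be built on the client that performed the bulk update and then passed
(for example as a JSON file) to compare_manifest on each of the other clients, with root
pointing at that client's mount point.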
The only thing that seems to get logged when enabling rpcdebug for nfs is the pattern below
for the problem file(s), where the permission line seems to use an ID of '1' instead of
the ID associated with the given file, as it does for all the other files that
successfully go through this process. (We are not sure if this log is relevant, but it
is displayed whenever we see the error.)
NFS: nfs_update_inode(0:104/11627611431409627202 fh_crc=0x4d950748 ct=1 info=0x27e7f)
NFS: nfs_lookup_revalidate_done(<MISSING_FILE_HERE>) is valid
NFS: dentry_delete(<MISSING_FILE_HERE>, 48084c)
NFS: permission(0:104/1), mask=0x1, res=0
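To make it easier to spot this pattern in a large debug log, here is a rough sketch of a
filter. It assumes the debug lines have been saved to a text file and that every permission
line has the same shape as the one in the excerpt above. The regular expression and the
function name are our own, and whether an ID of 1 is actually abnormal is still only our
guess.

```python
import re

# Assumed line shape, taken from the excerpt above:
#   NFS: permission(<dev>/<id>), mask=<hex>, res=<int>
PERMISSION_RE = re.compile(
    r"NFS: permission\((?P<dev>\d+:\d+)/(?P<file_id>\d+)\), "
    r"mask=(?P<mask>0x[0-9a-fA-F]+), res=(?P<res>-?\d+)"
)


def find_suspect_permission_lines(lines, suspect_id=1):
    """Return (line_number, dev, text) for permission lines using suspect_id.

    suspect_id defaults to 1 because that is the value seen next to the
    problem files; this is a hypothesis, not a confirmed error marker.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = PERMISSION_RE.search(line)
        if match and int(match.group("file_id")) == suspect_id:
            hits.append((number, match.group("dev"), line.strip()))
    return hits
```

It can be run over an open file object or any list of lines, and the returned line numbers
can then be matched against the surrounding nfs_lookup_revalidate_done and dentry_delete
entries to see which file name was involved.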
Note: Gluster configuration has up-call enabled.
Gluster version: glusterfs 8.2
Ganesha version: NFS-Ganesha Release = V3.3