Is there any config that needs to be set to prevent this issue, or are there any logs we should provide for further debugging? Any assistance is appreciated.
To help illustrate our implementation, we have created the following Miro board:
https://miro.com/app/board/o9J_lE0WOvs=/
Is that Gluster GeoRep between GlusterA/NetworkA and GlusterB/NetworkB?
At the risk of stating the obvious, GeoRep is asynchronous replication. Even over a very low-latency connection, there are no guarantees about when data written on one cluster will land on the replica. It will land eventually, though.
We make a large change to the NFS filesystem from one of the NFS clients by deleting and adding 8,000+ files of no particular size.
During this bulk file update, we see missing or corrupt files appear on one of the NFS clients. The problem file(s) differ each time these operations happen. All other NFS clients show the files correctly.
We have been able to consistently reproduce this issue over NFSv3 as well as NFSv4.2.
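For anyone trying to reproduce this, the bulk update and the per-client check could be scripted roughly as below. This is only a sketch under assumptions, not the script we actually run: the function names, the `file_NNNNN.bin` naming, and the 1 KiB default size are made up for illustration. `bulk_replace` would be run on the client making the change, and `verify` on each of the other clients against the same export.

```python
import hashlib
import os


def sha256_of(path):
    """Return the hex SHA-256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bulk_replace(directory, count, size=1024):
    """Delete every regular file in directory, then write `count` new ones.

    Returns a manifest mapping file name -> expected SHA-256 digest.
    File names and sizes are hypothetical, chosen only for this sketch.
    """
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            os.remove(path)
    manifest = {}
    for index in range(count):
        name = f"file_{index:05d}.bin"
        data = os.urandom(size)
        with open(os.path.join(directory, name), "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to the server before moving on
        manifest[name] = hashlib.sha256(data).hexdigest()
    return manifest


def verify(directory, manifest):
    """Return (missing, corrupt) lists of file names as seen from this client."""
    missing, corrupt = [], []
    for name, expected in sorted(manifest.items()):
        try:
            actual = sha256_of(os.path.join(directory, name))
        except FileNotFoundError:
            missing.append(name)
            continue
        if actual != expected:
            corrupt.append(name)
    return missing, corrupt
```

The manifest would need to reach the other clients by some path other than the NFS export under test (for example, dumped to JSON and copied over), so that the check itself does not depend on the mount being investigated.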
The only thing that gets logged when we enable rpcdebug for nfs follows the pattern below for the problem file(s): the permission line shows an int of '1' instead of the ID associated with the given file, unlike all the other files that go through this process successfully. (We are not sure whether this log is relevant, but it is displayed whenever we see the error.)
NFS: nfs_update_inode(0:104/11627611431409627202 fh_crc=0x4d950748 ct=1 info=0x27e7f)
NFS: nfs_lookup_revalidate_done(<MISSING_FILE_HERE>) is valid
NFS: dentry_delete(<MISSING_FILE_HERE>, 48084c)
NFS: permission(0:104/1), mask=0x1, res=0
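To check whether this pattern really lines up with the failures, the captured kernel log could be scanned for such permission lines. The following is a rough sketch: the regular expression is written against the exact line format shown above, it assumes the value after the slash is the inode number, and the function name is made up for illustration.

```python
import re

# Matches lines such as: NFS: permission(0:104/1), mask=0x1, res=0
PERMISSION_RE = re.compile(
    r"NFS: permission\((?P<dev>\d+:\d+)/(?P<fileid>\d+)\), "
    r"mask=(?P<mask>0x[0-9a-fA-F]+), res=(?P<res>-?\d+)"
)


def suspicious_permission_lines(lines, suspect_fileid=1):
    """Yield (line_number, device, mask, res) for permission checks on suspect_fileid."""
    for number, line in enumerate(lines, start=1):
        match = PERMISSION_RE.search(line)
        if match and int(match.group("fileid")) == suspect_fileid:
            yield (
                number,
                match.group("dev"),
                match.group("mask"),
                int(match.group("res")),
            )
```

Running it over a log captured during a clean bulk update as well as a failing one would show whether the '1' appears only alongside the problem files or in both cases.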
Note: Gluster configuration has up-call enabled.
Gluster version: glusterfs 8.2
Ganesha version: NFS-Ganesha Release = V3.3