I am running nfs-ganesha with my custom FSAL on macOS. My FSAL implements a custom VFS layer, with its own caching.
We recently upgraded from version 4.0.6 to 5.5. Since then we have been seeing a lot of instability, and we think these are regressions in nfs-ganesha itself, because our FSAL barely changed during the upgrade.
For example, with a debug build I see:
Assertion failed: (!entry->fh_hk.inavl), function _mdcache_lru_unref, file mdcache_lru.c, line 1971.
One thing worth noting: we really do not want MDCACHE to do any caching at all. We have this in our config:
MDCACHE {
    # Disable readdir caching (does not affect client, only server)
    Dir_Chunk = 0;
}
EXPORT {
    ...
    # Disable attribute-caching (does not affect client, only server)
    Attr_Expiration_Time = 0;
    ...
}
With TSAN:
==================
WARNING: ThreadSanitizer: data race (pid=82454)
Read of size 4 at 0x00010d701f60 by thread T52:
#0 _mdcache_lru_unref <null>:164731984 (srcfsd_darwin_dev:arm64+0x1009ec81c) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#1 mdcache_put_ref <null>:164731984 (srcfsd_darwin_dev:arm64+0x1009d904c) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#2 compound_data_Free <null>:164731984 (srcfsd_darwin_dev:arm64+0x1008f60d4) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#3 nfs4_Compound <null>:164731984 (srcfsd_darwin_dev:arm64+0x1008f5d60) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#4 nfs_rpc_process_request <null>:164731984 (srcfsd_darwin_dev:arm64+0x1008de0c0) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#5 nfs_rpc_valid_NFS <null>:164731984 (srcfsd_darwin_dev:arm64+0x1008deb60) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#6 svc_vc_decode <null>:164731984 (srcfsd_darwin_dev:arm64+0x100a2d010) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#7 svc_request <null>:164731984 (srcfsd_darwin_dev:arm64+0x100a293e4) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#8 svc_vc_recv <null>:164731984 (srcfsd_darwin_dev:arm64+0x100a2c758) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#9 svc_rqst_xprt_task_recv <null>:164731984 (srcfsd_darwin_dev:arm64+0x100a29288) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#10 svc_rqst_epoll_loop <null>:164731984 (srcfsd_darwin_dev:arm64+0x100a26f40) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#11 work_pool_thread <null>:164731984 (srcfsd_darwin_dev:arm64+0x100a306e0) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
Previous write of size 4 at 0x00010d701f60 by thread T56 (mutexes: read M0, write M1):
#0 _mdcache_lru_ref <null>:164731984 (srcfsd_darwin_dev:arm64+0x1009efcc8) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#1 mdcache_find_keyed_reason <null>:164731984 (srcfsd_darwin_dev:arm64+0x1009e1dd8) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#2 mdcache_locate_host <null>:164731984 (srcfsd_darwin_dev:arm64+0x1009e3040) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#3 mdcache_create_handle <null>:164731984 (srcfsd_darwin_dev:arm64+0x1009dd818) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#4 nfs4_op_putfh <null>:164731984 (srcfsd_darwin_dev:arm64+0x10091d450) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#5 process_one_op <null>:164731984 (srcfsd_darwin_dev:arm64+0x1008f4908) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#6 nfs4_Compound <null>:164731984 (srcfsd_darwin_dev:arm64+0x1008f5c24) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#7 nfs_rpc_process_request <null>:164731984 (srcfsd_darwin_dev:arm64+0x1008de0c0) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#8 nfs_rpc_valid_NFS <null>:164731984 (srcfsd_darwin_dev:arm64+0x1008deb60) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#9 svc_vc_decode <null>:164731984 (srcfsd_darwin_dev:arm64+0x100a2d010) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#10 svc_request <null>:164731984 (srcfsd_darwin_dev:arm64+0x100a293e4) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#11 svc_vc_recv <null>:164731984 (srcfsd_darwin_dev:arm64+0x100a2c758) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#12 svc_rqst_xprt_task_recv <null>:164731984 (srcfsd_darwin_dev:arm64+0x100a29288) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#13 svc_rqst_epoll_loop <null>:164731984 (srcfsd_darwin_dev:arm64+0x100a26f40) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#14 work_pool_thread <null>:164731984 (srcfsd_darwin_dev:arm64+0x100a306e0) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
Location is heap block of size 1736 at 0x00010d701c00 allocated by main thread:
#0 calloc <null>:174073376 (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x6094c) (BuildId: 981013a59ee23029b2ed90b76951327532000000200000000100000000000b00)
#1 mdcache_lru_get <null>:164725696 (srcfsd_darwin_dev:arm64+0x1009eede4) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#2 mdcache_new_entry <null>:164725696 (srcfsd_darwin_dev:arm64+0x1009e0700) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#3 mdcache_lookup_path <null>:164725696 (srcfsd_darwin_dev:arm64+0x1009dd5cc) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#4 init_export_root <null>:164725696 (srcfsd_darwin_dev:arm64+0x100999afc) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#5 init_export_cb <null>:164725696 (srcfsd_darwin_dev:arm64+0x1009995a4) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#6 foreach_gsh_export <null>:164725696 (srcfsd_darwin_dev:arm64+0x100994544) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#7 exports_pkginit <null>:164725696 (srcfsd_darwin_dev:arm64+0x100999534) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#8 nfs_start <null>:164725696 (srcfsd_darwin_dev:arm64+0x1008cd00c) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#9 nfs_libmain2 <null>:164725696 (srcfsd_darwin_dev:arm64+0x1008cf168) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#10 devtools_vfs::NfsMain(devtools_vfs::NfsMainOptions&) <null>:164725696 (srcfsd_darwin_dev:arm64+0x100809370) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#11 main <null>:164725696 (srcfsd_darwin_dev:arm64+0x10015c9f8) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
Mutex M0 (0x00010d827db8) created at:
#0 pthread_rwlock_init <null>:174073376 (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x31798) (BuildId: 981013a59ee23029b2ed90b76951327532000000200000000100000000000b00)
#1 cih_pkginit <null>:164725696 (srcfsd_darwin_dev:arm64+0x1009de328) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#2 mdcache_pkginit <null>:164725696 (srcfsd_darwin_dev:arm64+0x1009f2fc0) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#3 init_server_pkgs <null>:164725696 (srcfsd_darwin_dev:arm64+0x1008cc984) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#4 nfs_libmain2 <null>:164725696 (srcfsd_darwin_dev:arm64+0x1008cef00) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#5 devtools_vfs::NfsMain(devtools_vfs::NfsMainOptions&) <null>:164725696 (srcfsd_darwin_dev:arm64+0x100809370) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#6 main <null>:164725696 (srcfsd_darwin_dev:arm64+0x10015c9f8) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
Mutex M1 (0x000104bab008) created at:
#0 pthread_mutex_init <null>:174073376 (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x31354) (BuildId: 981013a59ee23029b2ed90b76951327532000000200000000100000000000b00)
#1 mdcache_lru_pkginit <null>:164731984 (srcfsd_darwin_dev:arm64+0x1009ed250) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#2 mdcache_pkginit <null>:164731984 (srcfsd_darwin_dev:arm64+0x1009f2f78) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#3 init_server_pkgs <null>:164731984 (srcfsd_darwin_dev:arm64+0x1008cc984) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#4 nfs_libmain2 <null>:164731984 (srcfsd_darwin_dev:arm64+0x1008cef00) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#5 devtools_vfs::NfsMain(devtools_vfs::NfsMainOptions&) <null>:164731984 (srcfsd_darwin_dev:arm64+0x100809370) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
#6 main <null>:164731984 (srcfsd_darwin_dev:arm64+0x10015c9f8) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
Thread T52 (tid=1939586, running) created by thread T41 at:
#0 pthread_create <null>:174073376 (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x2fd88) (BuildId: 981013a59ee23029b2ed90b76951327532000000200000000100000000000b00)
#1 work_pool_thread <null>:164731984 (srcfsd_darwin_dev:arm64+0x100a30630) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
Thread T56 (tid=1939590, running) created by thread T53 at:
#0 pthread_create <null>:174073376 (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x2fd88) (BuildId: 981013a59ee23029b2ed90b76951327532000000200000000100000000000b00)
#1 work_pool_thread <null>:164731984 (srcfsd_darwin_dev:arm64+0x100a30630) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00)
SUMMARY: ThreadSanitizer: data race (srcfsd_darwin_dev:arm64+0x1009ec81c) (BuildId: ad5f84c5659332ad9098efdf1add11ec32000000200000000100000000000b00) in _mdcache_lru_unref+0x60
==================
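To spell out how I read this report: the 4-byte read in _mdcache_lru_unref (T52) has no mutexes listed, while the previous write in _mdcache_lru_ref (T56) was made holding M0 (read) and M1 (write), and the block itself is the entry allocated for the export root at startup. For comparison, here is a minimal sketch of a reference counter that TSAN would not flag, assuming C11 atomics are used for every access. The names `struct entry`, `ref_unref_loop` and `run_two_threads` are hypothetical and for illustration only; this is not nfs-ganesha code and I am not claiming it matches how mdcache handles its refcount.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a cache entry; NOT the real mdcache entry type. */
struct entry {
    _Atomic int32_t refcnt;
};

#define ITERATIONS 100000

/* Each worker repeatedly takes and drops one reference.
 * Every access to refcnt is atomic, so there is no data race. */
static void *ref_unref_loop(void *arg)
{
    struct entry *e = arg;

    for (int i = 0; i < ITERATIONS; i++) {
        atomic_fetch_add(&e->refcnt, 1);

        /* atomic_fetch_sub returns the value before the decrement.
         * The initial reference plus our own one guarantee it is >= 2. */
        int32_t prev = atomic_fetch_sub(&e->refcnt, 1);
        assert(prev >= 2);
    }
    return NULL;
}

/* Runs two workers against one entry and returns the final refcount. */
static int32_t run_two_threads(void)
{
    struct entry e;
    pthread_t a, b;

    /* Start with one long-lived reference, like an entry pinned at startup. */
    atomic_init(&e.refcnt, 1);

    pthread_create(&a, NULL, ref_unref_loop, &e);
    pthread_create(&b, NULL, ref_unref_loop, &e);
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    return atomic_load(&e.refcnt);
}
```

If either side did a plain (non-atomic) load or store of the counter instead, TSAN would report a race much like the one above, even if the other side held locks, because the two accesses would not be ordered with respect to each other.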
With ASAN:
SUMMARY: AddressSanitizer: unknown-crash (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x41204) (BuildId: f0a7ac5c49bc3abc851181b6f92b308a32000000200000000100000000000b00) in __asan_memset+0x104
Shadow bytes around the buggy address:
0x00702233af00: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x00702233af10: fd fd fd fd fd fa fa fa fa fa fa fa fa fa fa fa
0x00702233af20: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x00702233af30: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x00702233af40: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x00702233af50: 00 00 00 00 00 00 00 00 00 00[04]00 00 00 00 00
0x00702233af60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00702233af70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00702233af80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00702233af90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00702233afa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==78997==ABORTING