On Mon, Jun 28, 2021 at 10:45 AM Nick Couchman <nick.e.couchman@gmail.com> wrote:
On Sun, Jun 27, 2021 at 10:46 AM Nick Couchman <nick.e.couchman@gmail.com> wrote:
On Sun, Jun 27, 2021 at 12:55 AM Solomon Boulos <boulos@google.com> wrote:
Good progress! I’ll try to take a look on Monday. Does the proxy FSAL misrepresent its readdirplus support? (Seems possible; it should honor whatever the backend says.)


That would be great, I'd definitely appreciate any insight. I'm not sure about the FSAL module and how it represents readdirplus support, but I can dig through the code and take a look. That one I can work around; the core dump when the Proxy v4 module has Handle Mapping enabled is more problematic. The function referenced at the top of the trace, digest_alloc, is simple enough, and I don't see much in it that could cause a core dump.

Regarding the segfault/core dump, I've got the back trace, here:

(gdb) bt
#0  0x00007f4abbea6704 in digest_alloc ()
    at /usr/src/debug/nfs-ganesha-4-dev.65.2.el8.x86_64/src/FSAL/FSAL_PROXY_V4/handle_mapping/handle_mapping.c:68
#1  0x00007f4abbea70a2 in handle_mapping_hash_add (p_hash=0x0, object_id=0, handle_hash=2762487084, data=0x7f4abe858432, datalen=29)
    at /usr/src/debug/nfs-ganesha-4-dev.65.2.el8.x86_64/src/FSAL/FSAL_PROXY_V4/handle_mapping/handle_mapping.c:193
#2  0x00007f4abbea75b2 in HandleMap_SetFH (p_in_nfs23_digest=0x7f4abe858420, data=0x7f4abe858432, len=29)
    at /usr/src/debug/nfs-ganesha-4-dev.65.2.el8.x86_64/src/FSAL/FSAL_PROXY_V4/handle_mapping/handle_mapping.c:373
#3  0x00007f4abbea51df in proxyv4_alloc_handle (exp=0x7f4abe9fc300, fh=0x7ffd7e7a0a40, obj_attributes=0x7ffd7e7a0ae0,
    attrs_out=0x7ffd7e7a10b0) at /usr/src/debug/nfs-ganesha-4-dev.65.2.el8.x86_64/src/FSAL/FSAL_PROXY_V4/handle.c:2997
#4  0x00007f4abbea0883 in proxyv4_make_object (export=0x7f4abe9fc300, obj_attributes=0x7ffd7e7a0ae0, fh=0x7ffd7e7a0a40,
    handle=0x7ffd7e7a1038, attrs_out=0x7ffd7e7a10b0)
    at /usr/src/debug/nfs-ganesha-4-dev.65.2.el8.x86_64/src/FSAL/FSAL_PROXY_V4/handle.c:1399
#5  0x00007f4abbea0f01 in proxyv4_lookup_impl (parent=0x0, export=0x7f4abe9fc300, cred=0x7ffd7e7a1258,
    path=0x7f4abe813171 "DataBoxDisk1", handle=0x7ffd7e7a1038, attrs_out=0x7ffd7e7a10b0)
    at /usr/src/debug/nfs-ganesha-4-dev.65.2.el8.x86_64/src/FSAL/FSAL_PROXY_V4/handle.c:1545
#6  0x00007f4abbea5465 in proxyv4_lookup_path (exp_hdl=0x7f4abe9fc300, path=0x7f4abe897628 "/DataBoxDisk1", handle=0x7ffd7e7a11b0,
    attrs_out=0x7ffd7e7a10b0) at /usr/src/debug/nfs-ganesha-4-dev.65.2.el8.x86_64/src/FSAL/FSAL_PROXY_V4/handle.c:3068
#7  0x00007f4ac8a0b34d in mdcache_lookup_path (exp_hdl=0x7f4abe82b400, path=0x7f4abe897628 "/DataBoxDisk1", handle=0x7ffd7e7a1338,
    attrs_out=0x0) at /usr/src/debug/nfs-ganesha-4-dev.65.2.el8.x86_64/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1564
#8  0x00007f4ac89861d7 in init_export_root (export=0x7f4abe877cc8)
    at /usr/src/debug/nfs-ganesha-4-dev.65.2.el8.x86_64/src/support/exports.c:2619
#9  0x00007f4ac89858e6 in init_export_cb (exp=0x7f4abe877cc8, state=0x7ffd7e7a1420)
    at /usr/src/debug/nfs-ganesha-4-dev.65.2.el8.x86_64/src/support/exports.c:2458
#10 0x00007f4ac899e4c3 in foreach_gsh_export (cb=0x7f4ac89858c2 <init_export_cb>, wrlock=true, state=0x7ffd7e7a1420)
    at /usr/src/debug/nfs-ganesha-4-dev.65.2.el8.x86_64/src/support/export_mgr.c:801
#11 0x00007f4ac898593c in exports_pkginit () at /usr/src/debug/nfs-ganesha-4-dev.65.2.el8.x86_64/src/support/exports.c:2477
#12 0x00007f4ac890ac62 in nfs_Init (p_start_info=0x6041a8 <my_nfs_start_info>)
    at /usr/src/debug/nfs-ganesha-4-dev.65.2.el8.x86_64/src/MainNFSD/nfs_init.c:642
#13 0x00007f4ac890bb2d in nfs_start (p_start_info=0x6041a8 <my_nfs_start_info>)
    at /usr/src/debug/nfs-ganesha-4-dev.65.2.el8.x86_64/src/MainNFSD/nfs_init.c:923
#14 0x00000000004029b9 in main (argc=7, argv=0x7ffd7e7a1838)
    at /usr/src/debug/nfs-ganesha-4-dev.65.2.el8.x86_64/src/MainNFSD/nfs_main.c:520
 
This is quite puzzling, because the line that seems to be causing the segfault is the pool_alloc line below:

==
pool_t *digest_pool;
static pthread_mutex_t digest_pool_mutex = PTHREAD_MUTEX_INITIALIZER;

pool_t *handle_pool;
static pthread_mutex_t handle_pool_mutex = PTHREAD_MUTEX_INITIALIZER;

/* helpers for pool allocation */

static digest_pool_entry_t *digest_alloc()
{
        digest_pool_entry_t *p_new;

        PTHREAD_MUTEX_lock(&digest_pool_mutex);
        p_new = pool_alloc(digest_pool);
        PTHREAD_MUTEX_unlock(&digest_pool_mutex);

        return p_new;
}
==

I don't see how this call could cause a segfault on its own, since pool_alloc() just hands out an entry from digest_pool, which should have been initialized elsewhere. I also checked other instances of pool_alloc() throughout the code and don't see any substantial differences. One thing I do notice in the backtrace is p_hash=0x0 in frame #1, which makes me wonder whether the handle-mapping structures (the hash table, and possibly digest_pool itself) were ever initialized before this path ran.

I'm sure I'm missing something simple here, but I still don't see why this would segfault, unless the mutex lock/unlock somehow interferes.

Regarding the readdirplus support, I tried manually disabling it in the proxy_v4 FSAL module by adding ".readdir_plus = false," to the info initialization in main.c. That doesn't seem to have made a difference, though, so it looks as if Ganesha's NFSv3 server support overrides or ignores whatever the V4 proxy module advertises and still expects readdirplus to work.

I also don't know why readdirplus wouldn't work; maybe something just needs to be implemented in the V4 proxy module to pass the call through?

-Nick