It just so happens that on Linux these structs overlap enough for init to leave them both properly zeroed (and a quick glance at FreeBSD's version suggests the same may be true there).

Since they’re supposed to be opaque blobs of bytes, it’s okay for this union to mean “I am only ever one of these,” but then the initialization and all of the callers have to be consistent.

Frank: you just did some export cleanups. Should these both be rwlocks (with callers taking a write lock where they currently do mutex_lock)? Or should they actually be separate locks?

On Fri, Oct 2, 2020 at 17:28 <matvore@comcast.net> wrote:
Hi,



I've found a bug that affects only some platforms. It is caused by a union of two lock types in src/include/sal_data.h:



       union {
               /** Lock protecting state */
               pthread_mutex_t st_lock;
               /** Lock protecting export junctions */
               pthread_rwlock_t jct_lock;
       };



When I ran Ganesha on macOS with a MEM FSAL and tried to mount the share, I could see in the debug log that the same exact address was being treated as a jct_lock and then a st_lock:



$ grep 0x7feb0200a080 ~/ganesha.log

02/10/2020 17:02:28 : epoch 5f77bf94 : <snip> : ganesha.nfsd-95907[main] state_hdl_init :RW LOCK :F_DBG :Init rwlock 0x7feb0200a080 (&ostate->jct_lock) at /Users/matvore/ganesha/src/include/sal_functions.h:141

02/10/2020 17:02:28 : epoch 5f77bf94 : <snip> : ganesha.nfsd-95907[main] init_export_root :RW LOCK :F_DBG :Got write lock on 0x7feb0200a080 (&obj->state_hdl->jct_lock) at /Users/matvore/ganesha/src/support/exports.c:2595

02/10/2020 17:02:28 : epoch 5f77bf94 : <snip> : ganesha.nfsd-95907[main] init_export_root :RW LOCK :F_DBG :Unlocked 0x7feb0200a080 (&obj->state_hdl->jct_lock) at /Users/matvore/ganesha/src/support/exports.c:2605

02/10/2020 17:03:59 : epoch 5f77bf94 : <snip> : ganesha.nfsd-95907[svc_9] nfs4_op_getattr :RW LOCK :CRIT :Error 22, acquiring mutex 0x7feb0200a080 (&(obj)->state_hdl->st_lock) at /Users/matvore/ganesha/src/Protocols/NFS/nfs4_op_getattr.c:122



The last log message corresponds to EINVAL and is only printed for acquiring a pthread mutex (not a rwlock). I think what is happening is that the root directory of the filesystem is sometimes treated as a normal directory and sometimes as a junction. I noticed the following code "works" on Linux but prints 22 on the second line on macOS:



#include <pthread.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
        union {
                pthread_mutex_t mut;
                pthread_rwlock_t rwl;
        } foo;

        memset((void *)&foo, 0, sizeof(foo));

        printf("ok if 0: %d\n", pthread_rwlock_init(&foo.rwl, NULL));
        printf("ok if 0: %d\n", pthread_mutex_lock(&foo.mut));
}



I will continue looking at this, but I am posting now in case anyone has hints. I naively tried removing the union and always treating the lock as a rwlock (grabbing a write lock where the code took a mutex lock). This fixed the EINVAL error and abort, but something else may be wrong: I have hit a deadlock when mounting, and I noticed that the lock order changes depending on which side of the union is used.



Thanks,

Matt
