It just so happens that on Linux these structs overlap enough for init to
leave them both properly zeroed (and a quick glance at FreeBSD’s version
suggests the same may be true there).
Since they’re supposed to be opaque blobs of bytes, it’s okay for this
union to mean “I am only ever one of these”, but then the initialization
and all of the callers have to be consistent about which one it is.
Frank: you just did some export cleanups. Should these both be rwlocks
(with callers taking a write lock where they currently do mutex_lock)?
Should they actually be separate locks?
On Fri, Oct 2, 2020 at 17:28 <matvore(a)comcast.net> wrote:
Hi,
I've found a bug that only affects some platforms. It is caused by a
union of two lock types in src/include/sal_data.h:
    union {
        /** Lock protecting state */
        pthread_mutex_t st_lock;
        /** Lock protecting export junctions */
        pthread_rwlock_t jct_lock;
    };
When I ran Ganesha on macOS with a MEM FSAL and tried to mount the share,
I could see in the debug log that the same exact address was being treated
as a jct_lock and then a st_lock:
$ grep 0x7feb0200a080 ~/ganesha.log
02/10/2020 17:02:28 : epoch 5f77bf94 : <snip> : ganesha.nfsd-95907[main]
state_hdl_init :RW LOCK :F_DBG :Init rwlock 0x7feb0200a080
(&ostate->jct_lock) at
/Users/matvore/ganesha/src/include/sal_functions.h:141
02/10/2020 17:02:28 : epoch 5f77bf94 : <snip> : ganesha.nfsd-95907[main]
init_export_root :RW LOCK :F_DBG :Got write lock on 0x7feb0200a080
(&obj->state_hdl->jct_lock) at
/Users/matvore/ganesha/src/support/exports.c:2595
02/10/2020 17:02:28 : epoch 5f77bf94 : <snip> : ganesha.nfsd-95907[main]
init_export_root :RW LOCK :F_DBG :Unlocked 0x7feb0200a080
(&obj->state_hdl->jct_lock) at
/Users/matvore/ganesha/src/support/exports.c:2605
02/10/2020 17:03:59 : epoch 5f77bf94 : <snip> : ganesha.nfsd-95907[svc_9]
nfs4_op_getattr :RW LOCK :CRIT :Error 22, acquiring mutex 0x7feb0200a080
(&(obj)->state_hdl->st_lock) at
/Users/matvore/ganesha/src/Protocols/NFS/nfs4_op_getattr.c:122
The last log message corresponds to EINVAL and is only printed for
acquiring a pthread mutex (not a rwlock). I think what is happening is that
the root directory of the filesystem is sometimes treated as a normal
directory and sometimes as a junction. I noticed the following code "works"
on Linux but prints 22 on the second line on macOS:
    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        union {
            pthread_mutex_t mut;
            pthread_rwlock_t rwl;
        } foo;

        memset((void *)&foo, 0, sizeof(foo));
        /* Initialize the storage as a rwlock... */
        printf("ok if 0: %d\n", pthread_rwlock_init(&foo.rwl, NULL));
        /* ...then lock the same bytes as a mutex that was never initialized. */
        printf("ok if 0: %d\n", pthread_mutex_lock(&foo.mut));
    }
I will continue looking at this, but I am posting now in case anyone has
hints. I naively tried removing the union and always treating the lock as a
rwlock (grabbing a write lock rather than a mutex lock). This fixed the
EINVAL error and abort, but something else may still be wrong and causing a
deadlock: I have experienced one when mounting, and I noticed that the lock
order changes depending on which side of the union is being used.
Thanks,
Matt