Hello,
(Hi Florent!)
Olivier Garaud wrote on Wed, Mar 11, 2020 at 05:50:42PM +0100:
> I'm wondering about the way Ganesha loads the FSAL.
> In load_fsal we have:
> #if defined(LINUX) && !defined(SANITIZE_ADDRESS)
> dl = dlopen(path, RTLD_NOW | RTLD_LOCAL | RTLD_DEEPBIND);
> #elif defined(FREEBSD) || defined(SANITIZE_ADDRESS)
> dl = dlopen(path, RTLD_NOW | RTLD_LOCAL);
> #endif
>
> On my system Ganesha is built with jemalloc but my FSAL is not (it's not
> using the same build chain).
> My FSAL uses malloc_usable_size.
> Ganesha does not use this symbol, so it is resolved during dlopen.
> Because the RTLD_DEEPBIND flag is used, the libc version of
> malloc_usable_size is used instead of the jemalloc one, which ultimately
> leads to a crash.
> I could make sure both are built with the same allocator, but this also
> happens when I try to change the memory allocator dynamically (LD_PRELOAD
> tcmalloc).
> Looking at the history (back in 2012), the RTLD_DEEPBIND flag was
> introduced, removed, and finally kept.
> Does anybody remember why it is needed?
> Has anyone else had this kind of issue with LD_PRELOAD?
Going from memory only, I think we originally needed it for the HPSS
FSAL because HPSS uses the system version of tirpc (while we use
ntirpc), and we needed the flag to make sure the HPSS FSAL would use its
own tirpc calls while the rest of ganesha uses its own ntirpc calls.
I don't remember why it got removed and added back though...
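
For anyone following along, here is a rough sketch of what the flag
changes at load time. plugin_dlopen_flags and open_plugin are
hypothetical names, not Ganesha functions; the only real pieces are
dlopen and the RTLD_* flags. With RTLD_DEEPBIND the loaded object looks
up symbols in itself and its own dependencies before the global scope,
which is what lets a plugin keep its own copy of a library, and also
what lets it bypass a preloaded allocator.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* RTLD_DEEPBIND is a glibc extension */
#endif
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: build the dlopen flags for a plugin. */
int plugin_dlopen_flags(int deepbind)
{
	int flags = RTLD_NOW | RTLD_LOCAL;

#ifdef RTLD_DEEPBIND
	/* Only added when requested and when the platform defines it. */
	if (deepbind)
		flags |= RTLD_DEEPBIND;
#else
	(void)deepbind;
#endif
	return flags;
}

/* Hypothetical helper: returns the handle, or NULL on failure
 * (dlerror() then describes what went wrong). */
void *open_plugin(const char *path, int deepbind)
{
	return dlopen(path, plugin_dlopen_flags(deepbind));
}
```

Calling open_plugin(path, 0) would make the plugin resolve malloc and
friends from the global scope, so a preloaded jemalloc or tcmalloc wins.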
That being said, I don't think your crash has anything to do with
malloc_usable_size (it does have to do with deepbind); I think some part
of ganesha tries to free something that was allocated by the FSAL maybe?
If we make sure that anything allocated in a lib is also freed in that
lib, through proper wrappers, that might help get rid of this problem...
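
To make the wrapper idea concrete, a minimal sketch, assuming a made-up
fsal_buf type (these names are illustrative and do not exist in
Ganesha): the lib exports both the allocation and the release function,
and callers never call free() on the object themselves, so the free
always goes to the allocator the lib was bound to.

```c
#include <stdlib.h>

/* Hypothetical object handed out by the lib. */
struct fsal_buf {
	size_t len;
	char *data;
};

/* Allocate inside the lib, with whatever allocator the lib resolved. */
struct fsal_buf *fsal_buf_alloc(size_t len)
{
	struct fsal_buf *buf = malloc(sizeof(*buf));

	if (buf == NULL)
		return NULL;
	buf->data = calloc(1, len);
	if (buf->data == NULL) {
		free(buf);
		return NULL;
	}
	buf->len = len;
	return buf;
}

/* Matching release, also inside the lib; NULL is accepted like free(). */
void fsal_buf_free(struct fsal_buf *buf)
{
	if (buf == NULL)
		return;
	free(buf->data);
	free(buf);
}
```

The caller side would then only ever pair fsal_buf_alloc() with
fsal_buf_free(), whatever allocator each side ends up with.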
Well, keeping the allocator coherent across ganesha is probably a
better idea in the long run; we no longer use the HPSS FSALs so unless
someone remembers why deepbind was added back I would be fine trying
to remove it again.
Maybe we had two FSALs depending on different versions of some lib? I
could see that happening on some exotic platforms...
--
Dominique