(resending response and adding devel mailing list – maybe someone else has some ideas)

There is this fix:

660f330243c57c0b2fea11c87507b3e1991bb300 FSAL_MDCACHE: avoid assertion due to wrong check

I’m pretty sure that since V2.5 we’ve also fixed several places where op_ctx was not set up in the shutdown path, but I can’t find the patches.

Frank

From: Trishali Nayar [mailto:ntrishal@in.ibm.com]
Sent: Monday, January 14, 2019 6:04 AM
To: Frank Filz <ffilzlnx@mindspring.com>
Cc: 'Malahal R Naineni' <mnaineni@in.ibm.com>
Subject: RE: Crash seen in shutdown path

This was hit on our 2.5 code stream...

I did look at the community stream, and even at 2.7, but this particular code seemed to be the same everywhere.

Thanks and regards,
Trishali.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Trishali Nayar
IBM Systems
ETZ, Pune.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From: "Frank Filz" <ffilzlnx@mindspring.com>
To: "'Trishali Nayar'" <ntrishal@in.ibm.com>
Cc: "'Malahal R Naineni'" <mnaineni@in.ibm.com>
Date: 01/12/2019 04:22 AM
Subject: RE: Crash seen in shutdown path

Which code base is this with? If it is upstream, this may already be fixed.

Frank

From: Trishali Nayar [mailto:ntrishal@in.ibm.com]
Sent: Friday, January 11, 2019 7:09 AM
To: Frank Filz <ffilzlnx@mindspring.com>
Cc: Malahal R Naineni <mnaineni@in.ibm.com>
Subject: Crash seen in shutdown path

Hi Frank,

I was looking at a crash in the mdcache_lru_clean() routine, which was caused by an assert firing because "op_ctx" is not set. This was in the shutdown path, via shutdown_handles().

The entry's first_export_id has a value of -1.

I observed that the other admin_thread routines in the same shutdown path call init_root_op_context() explicitly, e.g. remove_all_exports() and unexport().

So we hit this problem only when we get into the "Extra file handles hanging around" path below.

static void shutdown_handles(struct fsal_module *fsal)
{
	/* Handle iterator */
	struct glist_head *hi = NULL;
	/* Next pointer in handle iteration */
	struct glist_head *hn = NULL;

	if (glist_empty(&fsal->handles))
		return;

	LogDebug(COMPONENT_FSAL, "Extra file handles hanging around."); /* <<<< in the path below */
	glist_for_each_safe(hi, hn, &fsal->handles) {
		struct fsal_obj_handle *h = glist_entry(hi,
							struct fsal_obj_handle,
							handles);
		LogDebug(COMPONENT_FSAL,
			 "Releasing handle");
		h->obj_ops->release(h);
	}
}

1> I was thinking maybe we should fix this by calling init_root_op_context() when we get into the "Extra file handles" path...

2> But we would then still hit the second assert, for "op_ctx->ctx_export".
So could we also make the second assert conditional:

	if (export_id >= 0)
		assert(op_ctx->ctx_export);

The stack trace and the values in the entry are attached here for reference as well.
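
As a rough, self-contained sketch of the two proposals above: every type and function name here (export_stub, op_context_stub, entry_stub, clean_entry, shutdown_handles_sketch) is a simplified, hypothetical stand-in and not the real Ganesha definition. Installing a local context plays the role that init_root_op_context() would play in the real code, and the -1 returns mark the places where the real code asserts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the Ganesha types; NOT the real definitions. */
struct export_stub {
	int32_t export_id;
};

struct op_context_stub {
	struct export_stub *ctx_export;	/* NULL for a root context */
};

struct entry_stub {
	int32_t first_export_id;	/* -1 when the entry belongs to no export */
	int released;			/* set to 1 once the entry is cleaned */
};

/* Stand-in for the per-thread op_ctx pointer. */
static struct op_context_stub *op_ctx;

/*
 * Proposal 2: require an export in the context only when the entry
 * actually names one.  Returns -1 where the real code would assert.
 */
static int clean_entry(struct entry_stub *entry)
{
	if (op_ctx == NULL)
		return -1;	/* first assert: no op context at all */
	if (entry->first_export_id >= 0 && op_ctx->ctx_export == NULL)
		return -1;	/* second assert, now conditional */
	entry->released = 1;
	return 0;
}

/*
 * Proposal 1: install a root context before releasing leftover handles,
 * and restore the previous one afterwards.  Returns the number of
 * entries that could not be cleaned.
 */
static int shutdown_handles_sketch(struct entry_stub *entries, size_t n)
{
	struct op_context_stub root_ctx = { .ctx_export = NULL };
	struct op_context_stub *saved = op_ctx;
	int failures = 0;
	size_t i;

	op_ctx = &root_ctx;	/* assumed equivalent of init_root_op_context() */
	for (i = 0; i < n; i++) {
		if (clean_entry(&entries[i]) != 0)
			failures++;
	}
	op_ctx = saved;		/* never leave a pointer to the local context */
	return failures;
}
```

In this sketch, an entry with first_export_id of -1 is released cleanly once the root context is in place, while an entry that still names an export is still reported as a failure, so the conditional check does not hide a genuinely missing export.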

Your insights on this will be extremely useful.

Thanks and regards,
Trishali.
