(resending response and adding devel mailing list – maybe someone else has some ideas)

 

There is this fix:

 

660f330243c57c0b2fea11c87507b3e1991bb300 FSAL_MDCACHE: avoid assertion due to wrong check

 

I’m pretty sure that since V2.5 we’ve also fixed several places where op_ctx was not set up for things in the shutdown path, but I can’t find the patches.

 

Frank

 

 

From: Trishali Nayar [mailto:ntrishal@in.ibm.com]
Sent: Monday, January 14, 2019 6:04 AM
To: Frank Filz <ffilzlnx@mindspring.com>
Cc: 'Malahal R Naineni' <mnaineni@in.ibm.com>
Subject: RE: Crash seen in shutdown path

 

This was hit on our 2.5 code stream...

I did look into the community stream and even 2.7, but this particular code seemed to be the same everywhere.

Thanks and regards,
Trishali.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Trishali Nayar
IBM Systems
ETZ, Pune.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~





From:        "Frank Filz" <ffilzlnx@mindspring.com>
To:        "'Trishali Nayar'" <ntrishal@in.ibm.com>
Cc:        "'Malahal R Naineni'" <mnaineni@in.ibm.com>
Date:        01/12/2019 04:22 AM
Subject:        RE: Crash seen in shutdown path





What code base is this with? If upstream, this may be fixed.
 
Frank
 
From: Trishali Nayar [mailto:ntrishal@in.ibm.com]
Sent: Friday, January 11, 2019 7:09 AM
To: Frank Filz <ffilzlnx@mindspring.com>
Cc: Malahal R Naineni <mnaineni@in.ibm.com>
Subject: Crash seen in shutdown path

 
Hi Frank,

I was looking at a crash in the mdcache_lru_clean() routine, caused by an assert that fires because "op_ctx" is not set. This was in the shutdown path, via shutdown_handles().


The first_export_id has a value of -1 for the entry.


I observed that the other admin_thread routines in the same shutdown path call init_root_op_context() explicitly, e.g. in remove_all_exports() and unexport().


So we will hit this problem only when we get into the path below, where extra file handles are still hanging around.


static void shutdown_handles(struct fsal_module *fsal)
{
       /* Handle iterator */
       struct glist_head *hi = NULL;
       /* Next pointer in handle iteration */
       struct glist_head *hn = NULL;

       if (glist_empty(&fsal->handles))
               return;

       LogDebug(COMPONENT_FSAL, "Extra file handles hanging around.");   /* <<<< the path in question */
       glist_for_each_safe(hi, hn, &fsal->handles) {
               struct fsal_obj_handle *h = glist_entry(hi,
                                                       struct fsal_obj_handle,
                                                       handles);
               LogDebug(COMPONENT_FSAL,
                        "Releasing handle");
               h->obj_ops->release(h);
       }
}

1> I was thinking maybe we should fix this by calling
init_root_op_context() when we get into the "Extra file handles" path...

2> But we would still hit the second assert, for "op_ctx->ctx_export".
    So could we also put the second assert under a condition:


              if (export_id >= 0)
                  assert(op_ctx->ctx_export);

The stack trace and the values in the entry are attached here for reference as well.
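
To make the two proposals concrete, here is a minimal standalone model of them. It is only a sketch and not Ganesha source: every type and function name ending in _stub is a hypothetical stand-in, op_ctx is modelled as a plain static pointer rather than the real per-thread variable, and setting a local root context stands in for what init_root_op_context() would do.

```c
/* Standalone model of the two proposals; NOT Ganesha source.
 * All *_stub names are hypothetical stand-ins for the real structures. */
#include <assert.h>
#include <stddef.h>

struct export_stub { int export_id; };

struct op_context_stub { struct export_stub *ctx_export; };

/* Stand-in for Ganesha's op_ctx (per-thread in the real code). */
static struct op_context_stub *op_ctx;

struct entry_stub {
	int first_export_id;	/* -1 means "no export", as seen in the crash */
	int released;
};

/* Proposal 2: only require an export in the context when the entry
 * actually belongs to one. */
static void lru_clean_stub(struct entry_stub *entry)
{
	assert(op_ctx != NULL);
	if (entry->first_export_id >= 0)
		assert(op_ctx->ctx_export != NULL);
	entry->released = 1;
}

/* Proposal 1: set up a root context before releasing leftover handles,
 * then restore the previous one. Returns the number of entries released. */
static int shutdown_handles_stub(struct entry_stub *entries, size_t n)
{
	struct op_context_stub root_ctx = { .ctx_export = NULL };
	struct op_context_stub *saved = op_ctx;
	int count = 0;

	if (n == 0)
		return 0;

	op_ctx = &root_ctx;	/* models the init_root_op_context() call */
	for (size_t i = 0; i < n; i++) {
		lru_clean_stub(&entries[i]);
		count++;
	}
	op_ctx = saved;		/* models releasing the root context */
	return count;
}
```

With both changes in place, entries whose first_export_id is -1 are released without tripping either assert, while an entry that does belong to an export would still require op_ctx->ctx_export to be set.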


Your insights on this will be extremely useful.


Thanks and regards,
Trishali.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Trishali Nayar
IBM Systems
ETZ, Pune.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~