On Tue, 24 Mar 2020, 7:54 pm Daniel Gryniewicz, <dang@redhat.com> wrote:
Okay, I think this should fix it:

https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/487898

Yes, this should be enough. Thanks Daniel for the patch! 

Regards, 
Ashish


Daniel

On 3/23/20 9:52 AM, Daniel Gryniewicz wrote:
> On 3/20/20 6:56 AM, Ashish Sangwan wrote:
>> On Tue, Mar 17, 2020 at 6:57 PM Daniel Gryniewicz <dang@redhat.com>
>> wrote:
>>>
>>> Replying inline, so I can handle both emails at once.
>>>
>>> On 3/15/20 1:17 PM, Ashish Sangwan wrote:
>>>> The other issue I saw with objects lying around even after the
>>>> corresponding export is deleted is when the export is created again
>>>> with same attributes.
>>>> We start getting cache hits for the old objects. The "struct
>>>> fsal_filesystem *fs" for these objects is now pointing to the fs which
>>>> we already freed in vfs_unexport_filesystems by calling
>>>> release_posix_file_system()
>>>
>>> I'm not sure this is a problem.  The old cache entry is being found
>>> because VFS generates the same key the second time around as it did on
>>> the first time around.  This means that the cache entry would be the
>>> same, and so would be recreated with the same data.  Re-using the old
>>> cache entry should be fine.  And, if the export is not re-created
>>> immediately, those entries will be recycled by the LRU.
>>
>> I am able to hit "Use after free" crash with ASAN.
>> As mentioned above, it's because the cache entry's fs pointer is
>> pointing to freed memory.
>
> Okay, that's definitely an issue.
>
> Daniel
>>
>>>
>>>>
>>>> On Sun, Mar 15, 2020 at 10:09 PM Ashish Sangwan
>>>> <ashishsangwan2@gmail.com> wrote:
>>>>>
>>>>> Hi Daniel,
>>>>>
>>>>> It looks like mdcache_lru_cleanup_try_push() can be improved for the
>>>>> cases when there are ops executing parallely with an unexport request.
>>>>> In this case mdcache_lru_cleanup_try_push() is not able to drop the
>>>>> sentinel ref and mark the entry for cleanup. This is because one extra
>>>>> ref is taken by the op which is currently executing. The total number
>>>>> of refs are > 2.
>>>>>
>>>>> Once the current op execution finishes (after the execution of
>>>>> mdcache_unexport()) the sentinel ref remains and the entry is also not
>>>>> marked for cleanup even though no export is referring to it
>>>>> (entry->first_export_id == -1).
>>>>> Is it possible to modify mdcache_lru_cleanup_try_push() such that even
>>>>> if the refcnt is > 2 but if the entry->first_export_id == -1 (the
>>>>> current export which is going away was the only one referring to this
>>>>> entry), we should drop the sentinel ref and mark the entry for
>>>>> cleanup? This will ensure we cleanup the entries when the current op
>>>>> does put_ref and these are reused faster than other mdcache entries
>>>>> which are actually pointing to valid exports.
>>>>>
>>>
>>> This would be possible, but I don't think it would help significantly.
>>> It narrows the race window by ~9 lines of code (admittedly including 3
>>> lock ops), but the window between when the ref it taken and
>>> first_export_id is set to -1 still exists.
>>>
>>> These entries are not leaked, the LRU will recycle them eventually,
>>> since only their sentinel ref is left.  Or, if the export is re-created
>>> exactly, they can be reused in the new export.
>>>
>>> Daniel
>>>
>>