Okay, I think this should fix it:
https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/487898
Daniel
On 3/23/20 9:52 AM, Daniel Gryniewicz wrote:
On 3/20/20 6:56 AM, Ashish Sangwan wrote:
> On Tue, Mar 17, 2020 at 6:57 PM Daniel Gryniewicz <dang(a)redhat.com>
> wrote:
>>
>> Replying inline, so I can handle both emails at once.
>>
>> On 3/15/20 1:17 PM, Ashish Sangwan wrote:
>>> The other issue I saw with objects lying around even after the
>>> corresponding export is deleted is when the export is created again
>>> with same attributes.
>>> We start getting cache hits for the old objects. The "struct
>>> fsal_filesystem *fs" for these objects is now pointing to the fs which
>>> we already freed in vfs_unexport_filesystems by calling
>>> release_posix_file_system()
>>
>> I'm not sure this is a problem. The old cache entry is being found
>> because VFS generates the same key the second time around as it did on
>> the first time around. This means that the cache entry would be the
>> same, and so would be recreated with the same data. Re-using the old
>> cache entry should be fine. And, if the export is not re-created
>> immediately, those entries will be recycled by the LRU.
>
> I am able to hit "Use after free" crash with ASAN.
> As mentioned above, it's because the cache entry's fs pointer is
> pointing to freed memory.
Okay, that's definitely an issue.
Daniel
>
>>
>>>
>>> On Sun, Mar 15, 2020 at 10:09 PM Ashish Sangwan
>>> <ashishsangwan2(a)gmail.com> wrote:
>>>>
>>>> Hi Daniel,
>>>>
>>>> It looks like mdcache_lru_cleanup_try_push() can be improved for the
>>>> cases when there are ops executing parallely with an unexport request.
>>>> In this case mdcache_lru_cleanup_try_push() is not able to drop the
>>>> sentinel ref and mark the entry for cleanup. This is because one extra
>>>> ref is taken by the op which is currently executing. The total number
>>>> of refs are > 2.
>>>>
>>>> Once the current op execution finishes (after the execution of
>>>> mdcache_unexport()) the sentinel ref remains and the entry is also not
>>>> marked for cleanup even though no export is referring to it
>>>> (entry->first_export_id == -1).
>>>> Is it possible to modify mdcache_lru_cleanup_try_push() such that even
>>>> if the refcnt is > 2 but if the entry->first_export_id == -1 (the
>>>> current export which is going away was the only one referring to this
>>>> entry), we should drop the sentinel ref and mark the entry for
>>>> cleanup? This will ensure we cleanup the entries when the current op
>>>> does put_ref and these are reused faster than other mdcache entries
>>>> which are actually pointing to valid exports.
>>>>
>>
>> This would be possible, but I don't think it would help significantly.
>> It narrows the race window by ~9 lines of code (admittedly including 3
>> lock ops), but the window between when the ref it taken and
>> first_export_id is set to -1 still exists.
>>
>> These entries are not leaked, the LRU will recycle them eventually,
>> since only their sentinel ref is left. Or, if the export is re-created
>> exactly, they can be reused in the new export.
>>
>> Daniel
>>
>