On 4/17/19 12:49 AM, Pradeep wrote:
> On 4/11/19, Frank Filz <ffilzlnx(a)mindspring.com> wrote:
>> From: Daniel Gryniewicz [mailto:dang@redhat.com]
>> Sent: Thursday, April 11, 2019 5:28 AM
>>
>> Okay, I misunderstood the situation. That fix won't work, because
>> mdcache_lru_unref_chunk() takes the qlane lock, and it also takes the
>> content lock, which must be taken before the qlane lock.
>>
>> The problem here is that the parent has a pointer to the chunk and a ref
>> on it, and the chunk has a pointer to the parent but no ref on it.
>> This means that the parent refcount can go to zero while the chunk has a
>> pointer. I think we need to force remove all the chunks when we recycle
>> the parent, and null out the parent pointer in the chunk as well. We'll
>> have to audit all the uses of the parent pointer and make sure the call
>> path has a ref to the parent, or that it can handle a null parent pointer.
> Hi Daniel,
> By force cleaning, did you have something like this in mind?
> https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/451311/1/src/FSAL/Stack...
I was thinking something more like this:
https://paste.fedoraproject.org/paste/2CJh-0heaj-nQQeb7Ou5KQ
Your solution may work, but it worries me to add an unbounded loop into
the mix, especially since this involves multiple locks.
Note, I haven't fully audited all the call paths to make sure that they
can handle parent being NULL. I believe all the ones that can't handle a
NULL parent already hold a ref on parent, so it should be okay.
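To make the shape of the idea concrete, here is a minimal sketch. It is not the patch from either link above, and none of these names are the real mdcache ones: struct entry, struct chunk, entry_detach_chunks() and chunk_parent_refcnt() are all hypothetical, simplified stand-ins. It only shows the two pieces under discussion: a single bounded pass that detaches every chunk while holding the content lock, and a reader that tolerates a NULL parent.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the real mdcache structures. */
struct entry;

struct chunk {
	struct entry *parent;	/* back pointer; holds NO ref on parent */
	struct chunk *next;	/* next chunk owned by the same parent */
};

struct entry {
	pthread_mutex_t content_lock;	/* taken before any queue lock */
	struct chunk *chunks;		/* chunks this entry holds refs on */
	int refcnt;
};

/*
 * Called when the parent is being recycled.  Walks the chunk list
 * exactly once (bounded by the list length) under the content lock,
 * clearing each chunk's back pointer so no chunk is left pointing at
 * a recycled parent.  Returns the number of chunks detached.
 */
static int entry_detach_chunks(struct entry *e)
{
	int detached = 0;

	assert(e != NULL);
	pthread_mutex_lock(&e->content_lock);
	for (struct chunk *c = e->chunks; c != NULL; ) {
		struct chunk *next = c->next;

		c->parent = NULL;
		c->next = NULL;
		c = next;
		detached++;
	}
	e->chunks = NULL;
	pthread_mutex_unlock(&e->content_lock);
	return detached;
}

/*
 * Example of a call path that holds no ref on parent: it must cope
 * with a NULL back pointer.  Returns -1 when the chunk is detached.
 * (Real code would also need whatever lock protects the pointer.)
 */
static int chunk_parent_refcnt(const struct chunk *c)
{
	const struct entry *p = c->parent;

	return p == NULL ? -1 : p->refcnt;
}
```

In this sketch the detach pass takes only the content lock, so it never has to take the content lock after a queue lock, and it ends after one walk of the list instead of retrying.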
Daniel