The RFC says this about co_ownerid:
The string should be unique so that multiple clients do not
present the same string. The consequences of two clients
presenting the same string range from one client getting an error
to one client having its leased state abruptly and unexpectedly
cancelled.
Here are some more details on the individual issues seen on setups where clients have the same hostname.
- Client 1 -> thread 1 was doing an open. It was inside the get_state_owner function, trying to get an owner for the open state. It had acquired ht_owner->partitions[15].lock, created a new open owner, and was trying to take the mutex on cr1 (nfs4_owner->so_clientrec->cid_mutex) so it could add the owner to the client record cr1.
- Client 2 -> thread 2 was doing a create session. It was inside the nfs_clientid_expire function, walking the open owner list in cr1 and trying to 'delete' each owner (the second while(true) loop). It was holding nfs4_owner->so_clientrec->cid_mutex and trying to get ht_owner->partitions[15].lock.
Each thread holds the lock the other one is waiting for, so neither can make progress.
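To make the inversion concrete, here is a minimal standalone sketch of the same pattern. It is not Ganesha code: the two plain pthread mutexes only stand in for ht_owner->partitions[15].lock and cid_mutex, and the names take_in_order and count_blocked_paths are made up for the illustration. It uses trylock plus barriers so that it reports the conflict instead of actually hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for the two locks in the report. */
static pthread_mutex_t partition_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t cid_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Barriers make the interleaving deterministic. */
static pthread_barrier_t both_hold_first;
static pthread_barrier_t both_tried_second;

struct lock_path {
	pthread_mutex_t *first;   /* lock taken first on this path */
	pthread_mutex_t *second;  /* lock wanted while holding 'first' */
	int blocked;              /* set to 1 if 'second' was busy */
};

static void *take_in_order(void *arg)
{
	struct lock_path *p = arg;
	int rc;

	pthread_mutex_lock(p->first);
	/* Wait until the other path also holds its first lock. */
	pthread_barrier_wait(&both_hold_first);

	/* A blocking lock here would hang forever; trylock just reports it. */
	rc = pthread_mutex_trylock(p->second);
	p->blocked = (rc == EBUSY);
	if (rc == 0)
		pthread_mutex_unlock(p->second);

	/* Do not release 'first' until the other path has made its attempt. */
	pthread_barrier_wait(&both_tried_second);
	pthread_mutex_unlock(p->first);
	return NULL;
}

/* Returns how many of the two paths found their second lock busy,
 * or -1 if the helper thread could not be started. */
int count_blocked_paths(void)
{
	/* Open path: partition lock, then cid_mutex (thread 1 above). */
	struct lock_path open_path = { &partition_lock, &cid_mutex, 0 };
	/* Expire path: cid_mutex, then partition lock (thread 2 above). */
	struct lock_path expire_path = { &cid_mutex, &partition_lock, 0 };
	pthread_t t1;

	pthread_barrier_init(&both_hold_first, NULL, 2);
	pthread_barrier_init(&both_tried_second, NULL, 2);

	if (pthread_create(&t1, NULL, take_in_order, &open_path) != 0) {
		pthread_barrier_destroy(&both_hold_first);
		pthread_barrier_destroy(&both_tried_second);
		return -1;
	}
	/* The calling thread plays the role of thread 2. */
	take_in_order(&expire_path);
	pthread_join(t1, NULL);

	pthread_barrier_destroy(&both_hold_first);
	pthread_barrier_destroy(&both_tried_second);
	return open_path.blocked + expire_path.blocked;
}
```

Calling count_blocked_paths() from a small main() is expected to return 2: both paths find their second lock held by the other, which is the state the two threads above were stuck in.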
Q2:
If we were to fix this deadlock, what is the recommended order for acquiring the locks: should we acquire client_rec->cid_mutex first and then the ht_owner table lock, or vice versa? This
looks like a common deadlock pattern that we may have run into before, since we have two structures that reference each other, each with its own lock.
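For reference, this is the kind of fix I have in mind, as a rough sketch only. It assumes the order "partition lock first, then cid_mutex"; that choice is arbitrary here and the opposite order would work just as well, as long as every path uses the same one. The struct, the counters and the function names are invented for the example, and plain mutexes are used for both locks.

```c
#include <pthread.h>
#include <stddef.h>

#define DEMO_ITERATIONS 100000

/* Hypothetical stand-ins, not the real Ganesha structures. */
struct demo_state {
	pthread_mutex_t partition_lock; /* plays ht_owner->partitions[n].lock */
	pthread_mutex_t cid_mutex;      /* plays so_clientrec->cid_mutex */
	int owners_in_table;            /* protected by partition_lock */
	int owners_on_client;           /* protected by cid_mutex */
};

/* Open path: add an owner to the table and to the client record. */
static void *open_path(void *arg)
{
	struct demo_state *d = arg;

	for (int i = 0; i < DEMO_ITERATIONS; i++) {
		pthread_mutex_lock(&d->partition_lock);   /* always first */
		pthread_mutex_lock(&d->cid_mutex);        /* always second */
		d->owners_in_table++;
		d->owners_on_client++;
		pthread_mutex_unlock(&d->cid_mutex);
		pthread_mutex_unlock(&d->partition_lock);
	}
	return NULL;
}

/* Expire path: remove an owner, taking the locks in the SAME order. */
static void *expire_path(void *arg)
{
	struct demo_state *d = arg;

	for (int i = 0; i < DEMO_ITERATIONS; i++) {
		pthread_mutex_lock(&d->partition_lock);
		pthread_mutex_lock(&d->cid_mutex);
		if (d->owners_on_client > 0) {
			d->owners_on_client--;
			d->owners_in_table--;
		}
		pthread_mutex_unlock(&d->cid_mutex);
		pthread_mutex_unlock(&d->partition_lock);
	}
	return NULL;
}

/* Runs both paths concurrently. Returns 0 if both finished and the two
 * counters agree, -1 otherwise. */
int run_lock_order_demo(void)
{
	struct demo_state d = { .owners_in_table = 0, .owners_on_client = 0 };
	pthread_t t1, t2;
	int ok;

	pthread_mutex_init(&d.partition_lock, NULL);
	pthread_mutex_init(&d.cid_mutex, NULL);

	if (pthread_create(&t1, NULL, open_path, &d) != 0)
		return -1;
	if (pthread_create(&t2, NULL, expire_path, &d) != 0) {
		pthread_join(t1, NULL);
		return -1;
	}
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);

	ok = (d.owners_in_table == d.owners_on_client);
	pthread_mutex_destroy(&d.cid_mutex);
	pthread_mutex_destroy(&d.partition_lock);
	return ok ? 0 : -1;
}
```

With a single agreed order the two paths can contend but cannot wait on each other in a cycle. In the real expire path this would probably mean not holding cid_mutex across the hash table removal, but I have not checked what else depends on that.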
I can send more details on the other failures (asserts, segfaults, etc.). Let me know if that is needed.
Thanks,
Deepak