Raised an issue here:
https://github.com/nfs-ganesha/nfs-ganesha/issues/934
________________________________
From: Frank Filz <ffilzlnx(a)mindspring.com>
Sent: Tuesday, April 25, 2023 3:15 PM
To: Deepak Arumugam Sankara Subramanian <deepakarumugam.s(a)nutanix.com>;
devel(a)lists.nfs-ganesha.org <devel(a)lists.nfs-ganesha.org>
Cc: Pradeep Thomas <pradeep.thomas(a)nutanix.com>; Pragyashree Gogoi
<pragyashree.gogoi(a)nutanix.com>; Sagar Singh <sagar.singh(a)nutanix.com>; Gaurav
Gangalwar <gaurav.gangalwar(a)nutanix.com>; Andrew Lin <andrew.lin(a)nutanix.com>
Subject: RE: [NFS-Ganesha-Devel] Client id expiration - multiple clients using same
hostname and deadlock issues.
Please open a github issue for this. We definitely need to not crash or deadlock.
It sounds like as long as we don’t fail miserably, the client can’t expect anything.
I’m not sure if we can handle the situation more gracefully and still do the right thing
in cases where this situation is caused by a client whose IP address has changed for some
reason or other things that might make an actual single client appear to the server as
multiple clients.
Frank
From: Deepak Arumugam Sankara Subramanian [mailto:deepakarumugam.s@nutanix.com]
Sent: Tuesday, April 25, 2023 2:04 PM
To: devel(a)lists.nfs-ganesha.org
Cc: Pradeep Thomas <pradeep.thomas(a)nutanix.com>; Pragyashree Gogoi
<pragyashree.gogoi(a)nutanix.com>; Sagar Singh <sagar.singh(a)nutanix.com>; Gaurav
Gangalwar <gaurav.gangalwar(a)nutanix.com>; Andrew Lin <andrew.lin(a)nutanix.com>
Subject: [NFS-Ganesha-Devel] Client id expiration - multiple clients using same hostname
and deadlock issues.
Hi,
We recently ran into some issues in client-id expiration code paths.
We've had multiple issues where a CREATE_SESSION tries to expire a clientid while the
clientid is being used by other RPCs such as OPEN or EXCHANGE_ID. You typically don't
expect a client to send an OPEN/EXCHANGE_ID RPC while the server is processing a
CREATE_SESSION RPC unless the client is misbehaving. We found that in our labs/test-beds
multiple clients had the same hostname set and ganesha was mapping them to the same client
record.
Now, although the clients were misbehaving, we feel that the server should handle this or
fail gracefully. In these issues the server usually either hits an unexpected assert or
segfault and crashes, or runs into a deadlock and hangs forever.
Q1: We need some recommendations on what the right thing to do is when multiple clients use
the same owner ID (co_ownerid).
The RFC says this about co_ownerid:
The string should be unique so that multiple clients do not
present the same string. The consequences of two clients
presenting the same string range from one client getting an error
to one client having its leased state abruptly and unexpectedly
cancelled.
but we wanted to know if one of these responses is better than the other.
Here are some more details on the individual issues seen on setups where clients share the
same hostname:
Deadlock:
Two client ids, say c1 and c2, from 2 different clients were associated with the same
co_ownerid and the same record cr1.
1. Client 1 -> thread 1 was doing an open. It was inside the get_state_owner function,
trying to get an owner for the open state. It had acquired ht_owner->partitions[15].lock,
created a new open owner, and was trying to take the mutex on cr1, aka
nfs4_owner->so_clientrec->cid_mutex, so it could add the owner to the client record
cr1.
2. Client 2 -> thread 2 was doing a create session. It was inside the nfs_clientid_expire
function. It was walking the open owner list inside c1 and trying to 'delete' each
owner (the second while(true) loop). It was holding
nfs4_owner->so_clientrec->cid_mutex and trying to get
ht_owner->partitions[15].lock.
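
The interleaving described above is a classic lock-order inversion. As a minimal sketch (not ganesha code), the two acquisitions can be replayed on a single thread with pthread_mutex_trylock so that the demo cannot hang. The two mutexes below are hypothetical stand-ins named after the locks in the report; the real lock types and call paths in ganesha may differ.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for the two locks in the report. */
static pthread_mutex_t partition_lock = PTHREAD_MUTEX_INITIALIZER; /* ht_owner->partitions[15].lock */
static pthread_mutex_t cid_mutex = PTHREAD_MUTEX_INITIALIZER;      /* so_clientrec->cid_mutex */

/*
 * Replays the reported interleaving on one thread so it cannot hang.
 * Returns how many of the two paths would block on their second lock.
 */
static int count_blocked_paths(void)
{
	int blocked = 0;

	/* Thread 1 (OPEN, get_state_owner) holds the partition lock. */
	pthread_mutex_lock(&partition_lock);
	/* Thread 2 (CREATE_SESSION, nfs_clientid_expire) holds cid_mutex. */
	pthread_mutex_lock(&cid_mutex);

	/* Thread 1 now wants cid_mutex, which thread 2 holds. */
	if (pthread_mutex_trylock(&cid_mutex) == EBUSY)
		blocked++;
	/* Thread 2 now wants the partition lock, which thread 1 holds. */
	if (pthread_mutex_trylock(&partition_lock) == EBUSY)
		blocked++;

	pthread_mutex_unlock(&cid_mutex);
	pthread_mutex_unlock(&partition_lock);
	return blocked;
}
```

Calling count_blocked_paths() reports that both paths are stuck on their second lock, which is the condition under which two real threads using blocking pthread_mutex_lock calls would wait on each other forever.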
Q2: If we were to fix this deadlock, what's the recommended order for acquiring the locks?
Should we acquire the client_rec->cid_mutex first and then the ht_owner table lock,
or vice versa? This looks like a common deadlock pattern that we might have
encountered before, since we have 2 structures referencing each other, each with its own
lock.
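
Whichever order is chosen, the usual remedy for this pattern is that every code path takes the two locks in the same order. The sketch below illustrates that idea only; it picks cid_mutex first and the partition lock second purely as an assumption for the example, and lock_both, unlock_both, worker and run_ordered are hypothetical helpers, not ganesha functions.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the two locks discussed in Q2. */
static pthread_mutex_t cid_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t partition_lock = PTHREAD_MUTEX_INITIALIZER;
static long owners_touched; /* only modified while both locks are held */

/* Assumed order for this sketch: cid_mutex first, partition lock second. */
static void lock_both(void)
{
	pthread_mutex_lock(&cid_mutex);
	pthread_mutex_lock(&partition_lock);
}

/* Release in the reverse order of acquisition. */
static void unlock_both(void)
{
	pthread_mutex_unlock(&partition_lock);
	pthread_mutex_unlock(&cid_mutex);
}

/* Both the "open" path and the "expire" path go through lock_both(). */
static void *worker(void *arg)
{
	long iterations = *(long *)arg;

	for (long i = 0; i < iterations; i++) {
		lock_both();
		owners_touched++;
		unlock_both();
	}
	return NULL;
}

/* Runs two paths concurrently and returns the total work done. */
static long run_ordered(long iterations)
{
	pthread_t open_path, expire_path;
	int rc;

	owners_touched = 0;
	rc = pthread_create(&open_path, NULL, worker, &iterations);
	assert(rc == 0);
	rc = pthread_create(&expire_path, NULL, worker, &iterations);
	assert(rc == 0);
	pthread_join(open_path, NULL);
	pthread_join(expire_path, NULL);
	return owners_touched;
}
```

Because neither thread ever holds the second lock while waiting for the first, run_ordered(n) always finishes and returns 2 * n. Applying this to the real code would additionally require that no path takes the hash partition lock before it knows which client record it needs, which is part of what this question is asking.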
I can share more details on the other failures (asserts, segfaults, etc.). Let me know if
that is needed.
Thanks,
Deepak