So after a bit of testing and debugging, and realizing I had skipped some work, I have posted a new set of patches.
The HEAD patch in the series is: https://review.gerrithub.io/#/c/ffilz/nfs-ganesha/+/437191/
I have done some testing and there is work yet to be done…
FSAL_VFS passes pynfs (no real async)
FSAL_MEM without any async enabled passes pynfs reasonably well (I know it can’t pass 100%)
FSAL_MEM with async configured so that the async operation actually completes before the read2/write2 call returns passes pynfs
FSAL_MEM with actual async fails pynfs with:
  File "/home/ffilz/ganesha/bf-pynfs/nfs4.0/lib/rpc/rpc.py", line 452, in check_reply
    raise RPCAcceptError(msg.areply)
rpc.rpc.RPCAcceptError: RPCError: MSG_ACCEPTED: GARBAGE_ARGS
RC = 1
This suggests to me that my assumptions about, and my reading of, the RPC layer code are wrong…
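For reference, GARBAGE_ARGS is one of the accept_stat values defined by ONC RPC (RFC 5531): the server accepted the call but could not decode the procedure arguments. A small Python sketch of those values follows; the class name AcceptStat is my own choice for illustration and is not taken from pynfs or Ganesha.

```python
from enum import IntEnum


class AcceptStat(IntEnum):
    """accept_stat values from RFC 5531 (class name is illustrative only)."""
    SUCCESS = 0        # RPC executed successfully
    PROG_UNAVAIL = 1   # remote has not exported the program
    PROG_MISMATCH = 2  # remote cannot support the version number
    PROC_UNAVAIL = 3   # program cannot support the procedure
    GARBAGE_ARGS = 4   # procedure cannot decode the parameters
    SYSTEM_ERR = 5     # e.g. memory allocation failure


# The reply seen in the failing run maps to value 4.
print(AcceptStat.GARBAGE_ARGS.value)
```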
I’m open to any suggestions.
FSAL_MEM is configurable for async with the following:
MEM
{
    Async_Threads = 10;
}
And here is my EXPORT that fails:
EXPORT
{
    Export_Id = 701;
    Path = /mem1;
    Pseudo = /export/mem1;
    FSAL
    {
        Name = MEM;
        Async_Type = FIXED;
        Async_Delay = 1000;
        Async_Stall_Delay = 0;
    }
    Access_Type=RW;
    Squash = Root;
    SecType = sys;
    Tag = mem1;
    MaxRead = 9000000;
    MaxWrite = 9000000;
    Protocols=3,4,9p;
    Anonymous_uid = -3;
    Anonymous_gid = -3;
    CLIENT
    {
        Access_Type=RW;
        Squash=None;
        Clients=simple1*,127.0.0.1,local*,192.168.0.119,192.168.0.111;
        SecType = sys;
        Anonymous_gid = -5;
    }
}
If Async_Delay is set to 10 and Async_Stall_Delay is set to 1000, it passes (this forces read2/write2 to delay 1000 usec before returning, while the async thread delays only 10 usec before processing).
Hmm, I just realized that for testing I need a 2nd stall… I need to stall in the protocol read/write functions: after they have tested whether the callback has completed and found that it has not (thus causing a real async schedule), they should then delay so that they don’t return out all the way until the async process has had a chance to run to completion. I will be adding a debug config option to control that…
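To make the timing cases concrete, here is a minimal Python sketch of the ordering. It is not the Ganesha code. The names Outcome, classify, and post_check_stall_us are hypothetical (the last one stands in for the debug option that does not exist yet), and the model ignores scheduling jitter and execution time, treating each step as happening exactly at its configured delay.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    # Callback already done when the protocol layer tests for completion.
    completed_before_check: bool
    # Callback done by the time the protocol function returns all the way out.
    completed_before_return: bool


def classify(async_delay_us, stall_delay_us, post_check_stall_us=0):
    """Idealized timeline; time 0 is when read2/write2 hands off the I/O."""
    callback_done_at = async_delay_us           # async thread finishes here
    check_at = stall_delay_us                   # read2/write2 returns, caller checks
    return_at = check_at + post_check_stall_us  # hypothetical 2nd stall ends here
    return Outcome(callback_done_at <= check_at,
                   callback_done_at <= return_at)


# Failing export: Async_Delay = 1000, Async_Stall_Delay = 0 (real async).
print(classify(1000, 0))
# Passing case: Async_Delay = 10, Async_Stall_Delay = 1000 (completes inline).
print(classify(10, 1000))
# Proposed test: real async, but the protocol function waits before returning.
print(classify(1000, 0, post_check_stall_us=2000))
```

The third call is the case the 2nd stall is meant to exercise: the completion check sees the callback as not done, yet the callback finishes before the protocol function unwinds.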
In the meantime, if anyone with RPC understanding can look over the code and help me understand what might have gone wrong, I would appreciate it.
Thanks
Frank