So after a bit of testing and debugging, and realizing I had sort of skipped
some work, I have posted a new set of patches.
 
The HEAD patch in the series is:
https://review.gerrithub.io/#/c/ffilz/nfs-ganesha/+/437191/
 
I have done some testing and there is work yet to be done.
 
FSAL_VFS passes pynfs (no real async)
FSAL_MEM without any async enabled reasonably passes pynfs (I know it can't
pass 100%)
FSAL_MEM with async configured so that the async operation actually
completes before the read2/write2 call returns passes pynfs
FSAL_MEM with actual async fails pynfs with:
  File "/home/ffilz/ganesha/bf-pynfs/nfs4.0/lib/rpc/rpc.py", line 452, in check_reply
    raise RPCAcceptError(msg.areply)
rpc.rpc.RPCAcceptError: RPCError: MSG_ACCEPTED: GARBAGE_ARGS
RC = 1
 
This suggests to me that my assumptions about, and my reading of, the RPC
layer code are wrong.
 
I'm open to any suggestions.
 
FSAL_MEM is configurable for async with the following:
 
MEM
{
               Async_Threads = 10;
}
 
And here is my EXPORT that fails:
 
EXPORT
{
               Export_Id = 701;
               Path = /mem1;
               Pseudo = /export/mem1;
               FSAL
               {
                              Name = MEM;
                              Async_Type = FIXED;
                              Async_Delay = 1000;
                              Async_Stall_Delay = 0;
               }
  Access_Type=RW;
  Squash = Root;
  SecType = sys;
  Tag = mem1;
  MaxRead = 9000000;
  MaxWrite = 9000000;
  Protocols=3,4,9p;
  Anonymous_uid = -3;
  Anonymous_gid = -3;
  CLIENT
  {
               Access_Type=RW;
               Squash=None;
               Clients=simple1*,127.0.0.1,local*,192.168.0.119,192.168.0.111;
               SecType = sys;
               Anonymous_gid = -5;
  }
}
 
If Async_Delay is set to 10 and Async_Stall_Delay is set to 1000, it passes
(this forces read2/write2 to delay 1000 usec before returning, while the
async thread is only going to delay 10 usec before processing).
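
For reference, a sketch of the FSAL sub-block for that passing case. Only
the two delay values differ from the failing EXPORT above, and the comments
are just my description of the behavior, not output from any tool:

               FSAL
               {
                              Name = MEM;
                              Async_Type = FIXED;
                              # async thread delays only 10 usec before
                              # processing the request
                              Async_Delay = 10;
                              # read2/write2 delays 1000 usec before
                              # returning, so the async work finishes first
                              Async_Stall_Delay = 1000;
               }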
 
Hmm, for testing I just realized I need to add a 2nd stall, this one in the
protocol read/write functions. After they have tested whether the callback
has completed before they return (thus causing a real async schedule), they
would then delay, so that they don't actually return all the way out until
the async process has had a chance to run all the way to completion. I will
be adding a debug config option to control that.
 
In the meantime, if anyone with RPC understanding can look over the code and
help me understand what might have gone wrong, I would appreciate it.
 
Thanks
 
Frank