After a bit of testing and debugging, and realizing I had skipped some work, I have posted a new set of patches.

 

The HEAD patch in the series is: https://review.gerrithub.io/#/c/ffilz/nfs-ganesha/+/437191/

 

I have done some testing and there is work yet to be done…

 

FSAL_VFS passes pynfs (no real async)

FSAL_MEM without any async enabled reasonably passes pynfs (I know it can’t pass 100%)

FSAL_MEM with async configured so that the async operation actually completes before the read2/write2 call returns passes pynfs

FSAL_MEM with actual async fails pynfs with:

  File "/home/ffilz/ganesha/bf-pynfs/nfs4.0/lib/rpc/rpc.py", line 452, in check_reply
    raise RPCAcceptError(msg.areply)
rpc.rpc.RPCAcceptError: RPCError: MSG_ACCEPTED: GARBAGE_ARGS
RC = 1

 

This suggests to me that my assumptions about, and my reading of, the RPC layer code are wrong…

 

I’m open to any suggestions.

 

FSAL_MEM is configurable for async with the following:

 

MEM
{
    Async_Threads = 10;
}

 

And here is my EXPORT that fails:

 

EXPORT
{
    Export_Id = 701;
    Path = /mem1;
    Pseudo = /export/mem1;
    FSAL
    {
        Name = MEM;
        Async_Type = FIXED;
        Async_Delay = 1000;
        Async_Stall_Delay = 0;
    }
    Access_Type = RW;
    Squash = Root;
    SecType = sys;
    Tag = mem1;
    MaxRead = 9000000;
    MaxWrite = 9000000;
    Protocols = 3,4,9p;
    Anonymous_uid = -3;
    Anonymous_gid = -3;
    CLIENT
    {
        Access_Type = RW;
        Squash = None;
        Clients = simple1*,127.0.0.1,local*,192.168.0.119,192.168.0.111;
        SecType = sys;
        Anonymous_gid = -5;
    }
}

 

If Async_Delay is set to 10 and Async_Stall_Delay is set to 1000, it passes. This forces read2/write2 to delay 1000 usec before returning, while the async thread delays only 10 usec before processing, so the operation has already completed by the time read2/write2 returns.
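
To make the timing relationship concrete, here is a minimal Python sketch of the two cases. It is not the Ganesha code: submit_io, the worker thread, and the "inline"/"deferred" labels are hypothetical names for illustration only, and the delays are in seconds rather than usec so the outcome does not depend on scheduler jitter.

```python
import threading
import time


def submit_io(async_delay_s, stall_delay_s):
    """Simulate one read2/write2 call (hypothetical model, not Ganesha code).

    Returns "inline" if the completion callback fired before the caller
    returned, or "deferred" if the caller returned first (real async).
    """
    done = threading.Event()

    def worker():
        time.sleep(async_delay_s)  # stand-in for Async_Delay
        done.set()                 # stand-in for the completion callback

    t = threading.Thread(target=worker)
    t.start()

    time.sleep(stall_delay_s)      # stand-in for Async_Stall_Delay
    result = "inline" if done.is_set() else "deferred"

    t.join()                       # clean up the worker before returning
    return result


if __name__ == "__main__":
    # Short async delay, long stall: completes before the call returns.
    print(submit_io(async_delay_s=0.001, stall_delay_s=0.3))
    # Long async delay, no stall: the call returns first (real async).
    print(submit_io(async_delay_s=0.3, stall_delay_s=0.0))
```

The first call corresponds to the passing configuration (short Async_Delay, long Async_Stall_Delay) and the second to the failing one (Async_Delay = 1000, Async_Stall_Delay = 0).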

 

Hmm, for testing I just realized I need a second stall, this one in the protocol read/write functions. After they have checked whether the callback has completed (and found that it has not, thus causing a real async schedule), they would then delay so that they don’t return all the way out until the async process has had a chance to run to completion. I will be adding a debug config option to control that…
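
As a rough illustration of that second stall, the following Python sketch records the order of events when the protocol function stalls after its completion check. It is a model under assumptions, not the Ganesha code: protocol_op, post_check_stall_s, and the event strings are hypothetical, and the delays are in seconds to keep the ordering deterministic.

```python
import threading
import time


def protocol_op(async_delay_s, post_check_stall_s):
    """Simulate a protocol read/write with a stall after the completion check.

    Returns the ordered list of events observed (hypothetical model).
    """
    events = []
    lock = threading.Lock()
    done = threading.Event()

    def log(msg):
        with lock:
            events.append(msg)

    def worker():
        time.sleep(async_delay_s)  # async thread's delay before processing
        log("callback completed")
        done.set()

    t = threading.Thread(target=worker)
    t.start()

    # The completion check happens first, so the async path is really taken.
    log("check: sync" if done.is_set() else "check: async")

    # Second stall: hold the protocol function before it returns.
    time.sleep(post_check_stall_s)
    log("protocol function returned")

    t.join()
    return events


if __name__ == "__main__":
    for event in protocol_op(async_delay_s=0.05, post_check_stall_s=0.3):
        print(event)
```

With a stall longer than the async delay, the async path is still chosen at the check, but the callback completes before the protocol function returns; with no stall, the protocol function returns first.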

 

In the meantime, if anyone with RPC understanding can look over the code and help me understand what might have gone wrong, I would appreciate it.

 

Thanks

 

Frank