On Sun, Jan 27, 2019 at 4:59 PM Dominique Martinet <dominique.martinet(a)cea.fr> wrote:
> Hi Bjorn,
>
> Bjorn Leffler wrote on Sun, Jan 27, 2019 at 04:40:06PM -0800:
> > I'm a software engineer at Google, working on cloud and NFS related
> > projects. I've been working with the Ganesha codebase to implement data
> > caching as stackable FSALs. These can be stacked onto any other FSAL, but
> > my immediate goal is to use this on top of FSAL_PROXY. That turns Ganesha
> > into an efficient NFS caching proxy across WAN links, for example between
> > different on-premise locations and/or cloud providers.
> Welcome! Great to see interest in FSAL_PROXY, there haven't been many
> users recently and it's been a struggle to keep it up and running.
> A developer in that area is definitely more than welcome.
The FSAL_PROXY seems to work well, but sure needs some love. The wiki is 4+
years old and the sample configuration isn't working anymore. I'll submit a
working proxy sample configuration as my first patch, just to learn the
tools and the processes.
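For context, a minimal proxy export is roughly of this shape (simplified
sketch only, not the actual patch; the exact FSAL_PROXY option names,
Srv_Addr in particular, should be double-checked against the current code):

    # Simplified sketch, not a verified working sample.
    EXPORT {
        Export_Id = 1;
        Path = /remote/export;      # path exported by the backend server
        Pseudo = /cached;           # where clients mount this proxy export
        Access_Type = RW;
        Protocols = 4;
        FSAL {
            Name = PROXY;
            Srv_Addr = 192.0.2.10;  # backend NFS server (example address)
        }
    }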
> > I've got a few patches that I'd like to contribute to the main code base:
> > 1. A small change to make FSAL_NULL reusable.
> > 2. Add a generic FSAL_DATACACHE for the caching logic.
> > 3. Add specific cache implementations. I've implemented two: Memcache and
> >    local disk/ssd.
> >
> > It would probably make sense to implement a ram-only cache as well.
> Would be curious to see how the local disk/ssd cache works before
> answering that: is it a local fs? Using the disk as a block device
> directly?
> Either way there are also ways of working with ram there, and memcached
> is basically a ram backend as well. I'm not sure it makes sense to have
> many options that aren't used often - most other FSALs have some other
> form of data cache (e.g. VFS has the VFS cache, gluster/ceph have some
> cache in the libs afaik? etc.), so I'd rather keep a minimal set of such
> FSALs and focus on keeping them working.
For the disk caching option, the user specifies a path to a local file
system. You could also point that at tmpfs for ram-only caching, which
avoids implementing a ram cache as you suggest. I think of ssd as cheap and
slow ram. It's a very compelling option in the cloud, where you pay a
linear cost for ram, but ssd is charged at a (cheaper) fixed rate for a
larger device. The memcache option is interesting for the scale-out use
cases: if you have 10 proxies, then the total cache size is 10x the local
cache size. And you can back memcache with ssd using the Fatcache
<https://github.com/twitter/fatcache> implementation of memcache.
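To sketch how the stacking and the cache backend fit together in the export
configuration, it would look roughly like the block below. The nested FSAL
block follows the convention of the existing stackable FSALs (e.g.
FSAL_NULL); the DATACACHE option names are placeholders for illustration,
not final parameter names.

    # Rough sketch only - DATACACHE option names are placeholders.
    FSAL {
        Name = DATACACHE;                  # proposed stackable caching FSAL
        Cache_Backend = Disk;              # or Memcache
        Cache_Path = /var/cache/ganesha;   # local file system; point this at
                                           # a tmpfs mount for ram-only caching
        FSAL {
            Name = PROXY;                  # the FSAL being cached
            Srv_Addr = 192.0.2.10;         # backend NFS server (example)
        }
    }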
The caching only works on the read path to keep it simple. Most of my users
will use this for caching of (read-only) input datasets. Some workloads
(movie rendering in particular) are highly cacheable. I initially wrote two
implementations for disk and memcache, but there was a lot of duplicate
code for cache invalidation, etc. The specific cache implementations are
now limited to a simple get/set/delete interface.
> > Memcache may sound like a strange choice, but it is interesting because
> > of its scale-out nature, and because it's already battle tested in large
> > scale production. Initial performance with memcache/ssd caches looks
> > really good: I'm getting up to 1.5 GB/s for streaming reads, which is
> > slightly better than I get from the kernel NFS server serving from ram.
> Nice results!
> How do you handle stuff like cache invalidation, send a stat to the real
> server and look at ctime?
The cache invalidation is currently based on file mtime. Any change to the
file invalidates the current cache entries. A user-configurable TTL in
seconds determines how often stat requests are sent to the backend server to
refresh the mtime. At the moment, I actually have two caches: one for the
data and one for metadata. I tried to just use Ganesha's MDCACHE, but stat
requests got exponentially slower at high request rates. I'm guessing that's
a bug that can be fixed, but I haven't gotten around to root-causing the
issue yet.
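In configuration terms the invalidation policy comes down to something like
the following (placeholder option names again, just to illustrate the knobs):

    FSAL {
        Name = DATACACHE;
        # Seconds between stat calls to the backend to re-check mtime;
        # any observed mtime change invalidates that file's cached data.
        Attr_Refresh_TTL = 60;
    }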
> Well, either way feel free to send the patches to gerrit, we won't bite.
> Our wiki is a bit of a mess but you should have the minimum required
> information here:
> https://github.com/nfs-ganesha/nfs-ganesha/wiki/DevPolicy#Pushing_to_gerrit
> > I'm based in Sydney, Australia, so it's tricky for me to join the weekly
> > call, which is ~2.30am my time. I'm currently in the US and plan to join
> > the weekly call on Tuesday.
> I'm boycotting the call as well when it's just half past midnight here
> (JST), we can find other ways to work :)
> In particular there are often people in #ganesha on freenode if you have
> any questions.
Excellent. I'll send some patches soon and join the discussion on the IRC
channel.
Thanks,
Bjorn