On 05/17/2018 10:04 AM, Supriti Singh wrote:
Hello all,
I have been investigating how to implement pNFS for FSAL CEPH [1]. As
suggested by Jeff, I spent time reading about the flex file layout and
wrote a POC just for my own understanding of the pNFS protocol [2]. But as
nfs-ganesha lacks support for the flex file layout, just to check whether
the libcephfs API works, I wrote the existing code using the file layout.
I have a few questions and would also like to understand what's the best
way to break down the work here.
1. Advantage of using flex over file layout:
As I understood, the advantage is that in the flex file layout, storage
nodes can be NFS version 3 or 4. Hence, we can make the data path
simpler. Is there any other advantage with respect to how Ceph stores
data (CRUSH maps) that makes flex better than the file layout?
My understanding is that the files layout only does periodic layouts (that
is, repeating patterns, like striping). CRUSH is not a repeating
pattern; it's a semi-random pattern, so it cannot be represented in
the files layout.
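
To illustrate what "periodic" means here, a minimal sketch of a
round-robin striping rule of the kind a files-style layout describes. It
assumes a pattern offset of zero, and the function and parameter names are
hypothetical (they are not taken from nfs-ganesha or libcephfs):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: which entry of the stripe index list serves a
 * given byte offset under plain round-robin striping.
 * Assumption: the striping pattern starts at file offset 0.
 */
static uint32_t stripe_index(uint64_t offset, uint64_t stripe_unit,
                             uint32_t stripe_count, uint32_t first_index)
{
    assert(stripe_unit > 0 && stripe_count > 0);

    /* Number of the stripe unit that contains this offset. */
    uint64_t unit_no = offset / stripe_unit;

    /* The pattern repeats every stripe_count units. */
    return (uint32_t)((unit_no + first_index) % stripe_count);
}
```

The result depends only on the offset modulo stripe_unit * stripe_count,
so the pattern repeats. A CRUSH placement has no such period, which is why
it does not fit this model.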
2. Implementing Flex file layout support
In the existing code we have a file "*flex_files_prot.x*", but it does
not match the latest RFC [1]. There are some other data structures
defined in nfsv41.h, but we would need to implement the functions. I
will submit a patch.
We haven't used the .x files in a very long time (and we should probably
remove them). You'll have to update the .h and .c files.
3. While working on the POC, there were some implementation issues as well:
- In LAYOUTGET, we populate the opaque device_id object (16 bytes), and in
GETDEVICEINFO, we should be able to send the network address for the
storage device. To get started, I assumed we will co-locate the OSDs
with the storage devices. And in GETDEVICEINFO, we use the API
ceph_ll_osdaddr to get the OSD address. We will need some way to encode
the osd_ids in the device_id and pass it to GETDEVICEINFO. It's an
implementation detail, but I just want to be sure I am not overlooking
something obvious.
This sounds correct.
- In LAYOUTGET, I am using ceph_ll_get_stripe_osd to get the OSD
corresponding to a stripe. But I see an assertion failure at [3], and I
have to investigate whether it's because of my Ceph setup or a bug in the
code. And as there is no documentation for this function, I am not sure
exactly what it is doing.
I don't believe the layout/pNFS code in ceph works. It has been unused
for years, and I'm not sure it ever fully worked, but was more of a
proof-of-concept.
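
For the device_id question above, a minimal sketch of one way an OSD id
could be packed into the 16 opaque bytes in LAYOUTGET and recovered in
GETDEVICEINFO. All names here are hypothetical, and the byte layout is an
assumption for illustration only; a real implementation would probably
also need to identify the FSAL or export in the same 16 bytes:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEVICE_ID_SIZE 16 /* an NFSv4.1 device ID is 16 opaque bytes */

/* Hypothetical: store the OSD id big-endian in bytes 0..3, zero the rest. */
static void osd_to_deviceid(uint32_t osd_id, uint8_t out[DEVICE_ID_SIZE])
{
    memset(out, 0, DEVICE_ID_SIZE);
    out[0] = (uint8_t)(osd_id >> 24);
    out[1] = (uint8_t)(osd_id >> 16);
    out[2] = (uint8_t)(osd_id >> 8);
    out[3] = (uint8_t)osd_id;
}

/* Hypothetical: recover the OSD id from a device ID built as above. */
static uint32_t deviceid_to_osd(const uint8_t in[DEVICE_ID_SIZE])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* Encode in "LAYOUTGET", decode in "GETDEVICEINFO"; returns the decoded id. */
static uint32_t roundtrip_osd(uint32_t osd_id)
{
    uint8_t devid[DEVICE_ID_SIZE];

    osd_to_deviceid(osd_id, devid);
    return deviceid_to_osd(devid);
}
```

Writing the bytes explicitly, rather than copying an integer with memcpy,
keeps the encoding independent of host byte order.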
Daniel