Thanks Daniel and Jeff for the detailed explanation. There are a few more parameters whose implementation I am not sure about. For the time being, I have written a function to encode the layout and device id in FSAL/common_pnfs.c (right now it just passes random values for the missing parameters). [1]

1.  ffds_stateid [0].
     It is different from the layout stateid. For a loosely coupled setup it is the anonymous stateid, and for a tightly coupled setup it is the global stateid. In the tightly coupled case, is the global stateid created by nfs-ganesha, or by the MDS? If the setup is tightly coupled, we also need some way to communicate the stateid to the data servers, and in that case does it make more sense to implement this on the cephfs side?
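     For the loosely coupled case at least, I think this field can be filled without talking to the DS at all. A minimal sketch, assuming the stateid4 type from ganesha's NFSv4 headers (the anonymous stateid is all zeros):

         /* Loosely coupled: ffds_stateid is the anonymous stateid,
          * i.e. seqid 0 and an all-zero "other" field */
         stateid4 ffds_stateid;
         memset(&ffds_stateid, 0, sizeof(ffds_stateid));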

2. ffdv_version, ffdv_minorversion, ffdv_rsize, ffdv_wsize:
   These parameters are set in the data server's ganesha.conf. Currently there is no way for the MDS to discover the DS configuration, unless we assume that we support only specific values. I am not sure if there is a better way to do it.
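   If we do assume fixed values, the encoding itself becomes straightforward. A rough sketch of encoding one ff_device_versions4 entry (the helper name and the hard-coded defaults below are just placeholders, not anything ganesha already has; the XDR header path may differ with ntirpc):

       #include <stdint.h>
       #include <rpc/xdr.h>

       /* Hypothetical helper: encode one ff_device_versions4 entry with
        * assumed fixed values, since the MDS can't query the DS config.
        * Field order follows the flex files draft. */
       static bool_t encode_ff_device_versions(XDR *xdrs)
       {
           uint32_t ffdv_version = 4;       /* assumed: NFSv4 DS */
           uint32_t ffdv_minorversion = 1;  /* assumed: minor version 1 */
           uint32_t ffdv_rsize = 1048576;   /* assumed: 1 MiB */
           uint32_t ffdv_wsize = 1048576;   /* assumed: 1 MiB */
           bool_t ffdv_tightly_coupled = FALSE;

           return xdr_u_int32_t(xdrs, &ffdv_version)
               && xdr_u_int32_t(xdrs, &ffdv_minorversion)
               && xdr_u_int32_t(xdrs, &ffdv_rsize)
               && xdr_u_int32_t(xdrs, &ffdv_wsize)
               && xdr_bool(xdrs, &ffdv_tightly_coupled);
       }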

[0] https://tools.ietf.org/html/draft-ietf-nfsv4-flex-files-19#page-19
[1] https://github.com/supriti/nfs-ganesha/commit/8555e070cb1a9b6726a61a6fe7d1cbf2a0aced11


------
Supriti Singh 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)

>>> Jeff Layton <jlayton@redhat.com> 07/03/18 2:18 PM >>>
On Fri, 2018-06-29 at 11:29 +0200, Supriti Singh wrote:
> Hello,
>
> I am looking at how to implement synthetic uid/gids for flex layout.
>
> From rfc section "Implementation Notes for Synthetic uids/gids": [0]
> " When the metadata server had a request to access a file, a SETATTR would be sent to the storage device to set the owner and group of the data file. The user and group might be selected in a round robin fashion from the range of available ids. Those ids would be sent back as ffds_user and ffds_group to the client. And it would present them as the RPC credentials to the storage device. When the client was done accessing the file and the metadata server knew that no other client was accessing the file, it
> could reset the owner and group to restrict access to the data file."
>
> A few questions regarding the implementation:
> 1. To implement it in nfs-ganesha, we would have to add a way to generate uids/gids and store a mapping between each uid/gid and the corresponding data file. Is there already such a structure in nfs-ganesha that can be reused?
>

One idea: stick the synthetic uid/gid into xattrs attached to the
backing objects for each inode.

The DS READ/WRITE calls would go to an NFS server running on each OSD.
That then gets converted into a RADOS op. The one for a WRITE would do
something like:

    /* Allocate a compound write op */
    rados_write_op_t op = rados_create_write_op();

    /* Compare the incoming RPC uid against the uid xattr ("nfs.uid"
     * here is illustrative); the whole op fails if they don't match */
    rados_write_op_cmpxattr(op, "nfs.uid", LIBRADOS_CMPXATTR_OP_EQ,
                            rpc_uid, rpc_uid_len);

    /* Do the write in the same atomic op */
    rados_write_op_write(op, buf, len, offset);

    rados_write_op_operate(op, ioctx, oid, NULL, 0);
    rados_release_write_op(op);

The READ -> read op conversion would look similar.
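Roughly, using the librados read-op API (same illustrative "nfs.uid"
xattr; the offset/len/buffer variables are placeholders):

    /* Same uid guard, but attached to a read op */
    rados_read_op_t op = rados_create_read_op();
    rados_read_op_cmpxattr(op, "nfs.uid", LIBRADOS_CMPXATTR_OP_EQ,
                           rpc_uid, rpc_uid_len);
    rados_read_op_read(op, offset, len, buf, &bytes_read, &prval);
    rados_read_op_operate(op, ioctx, oid, 0);
    rados_release_read_op(op);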

Before handing out a layout stateid, we'd have to go through and check
and/or set the xattrs on all of the objects in the layout. That could
conceivably be done by ganesha without bothering the Ceph MDS.

That said, we may need to consider having the Ceph MDS be the final
arbiter of these values, which may help prevent races in clustered MDS
configurations. I haven't given this enough thought yet.

> 2. As I understand it, once the metadata server generates a uid/gid, it needs to send a SETATTR to the storage device to set the owner and group of the data file. But I am not sure how this synthetic uid/gid maps to the real uid/gid that a data file might have. The same data file in cephfs can be opened through clients other than NFS as well.
>

Ganesha shouldn't need to send an NFS SETATTR to the ceph cluster. This can
be done with the OSD protocol on the backend. Basically:

1) Client does a LAYOUTGET

2) Determine the location of the different layout segments

3) Issue RADOS write ops to all of the objects to change their xattrs
(basically do a cmpxattr and then a setxattr in the same operation to
set it)

4) Build the LAYOUTGET reply and send it back to the client

To fence, we'd just need to go through and reset all of the xattrs first
(incrementing them is the usual thing to do) before allowing a
conflicting operation to proceed.
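
A minimal sketch of the fencing bump, reusing the illustrative "nfs.uid"
xattr (the fence_object() helper name is hypothetical):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <rados/librados.h>

    /* Hypothetical fencing helper: bump the uid xattr on one layout
     * object so clients still presenting the old synthetic uid fail
     * the cmpxattr check on their next DS I/O. */
    static int fence_object(rados_ioctx_t io, const char *oid)
    {
        char cur[32], next[32];
        int len = rados_getxattr(io, oid, "nfs.uid", cur, sizeof(cur) - 1);

        if (len < 0)
            return len;
        cur[len] = '\0';
        snprintf(next, sizeof(next), "%ld", strtol(cur, NULL, 10) + 1);

        /* cmpxattr + setxattr in one atomic op, as in step 3 above;
         * if another fencer raced us, the cmpxattr fails and the
         * caller can simply retry */
        rados_write_op_t op = rados_create_write_op();
        rados_write_op_cmpxattr(op, "nfs.uid", LIBRADOS_CMPXATTR_OP_EQ,
                                cur, len);
        rados_write_op_setxattr(op, "nfs.uid", next, strlen(next));

        int ret = rados_write_op_operate(op, io, oid, NULL, 0);
        rados_release_write_op(op);
        return ret;
    }

The MDS would loop this over every object covered by the layout before
letting a conflicting operation proceed.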

> [0] https://tools.ietf.org/html/draft-ietf-nfsv4-flex-files-19#page-3
>
>
>
> ------
> Supriti Singh
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
> HRB 21284 (AG Nürnberg)
>

--
Jeff Layton <jlayton@redhat.com>
_______________________________________________
Devel mailing list -- devel@lists.nfs-ganesha.org
To unsubscribe send an email to devel-leave@lists.nfs-ganesha.org