> On Tue, 2018-09-11 at 22:47 +0800, Kinglong Mee wrote:
>> On 2018/9/11 20:57, Trond Myklebust wrote:
>>> On Tue, 2018-09-11 at 20:29 +0800, Kinglong Mee wrote:
>>>> The latest ganesha/gluster supports seek according to,
>>>>
>>>>
>>>
>>>
https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-41#sectio
>>> n-15.11
>>>>
>>>> From the given sa_offset, find the next data_content4 of type
>>>> sa_what
>>>> in the file. If the server can not find a corresponding sa_what,
>>>> then the status will still be NFS4_OK, but sr_eof would be
>>>> TRUE. If
>>>> the server can find the sa_what, then the sr_offset is the
>>>> start of
>>>> that content. If the sa_offset is beyond the end of the file,
>>>> then
>>>> SEEK MUST return NFS4ERR_NXIO.
>>>>
>>>> For a file's filemap as,
>>>>
>>>> Part 1: HOLE 0x0000000000000000 ---> 0x0000000000600000
>>>> Part 2: DATA 0x0000000000600000 ---> 0x0000000000700000
>>>> Part 3: HOLE 0x0000000000700000 ---> 0x0000000001000000>>
>>>> SEEK(0x700000, SEEK_DATA) gets result (sr_eof:1,
>>>> sr_offset:0x70000) from ganesha/gluster; SEEK(0x700000, SEEK_HOLE)
>>>> gets result (sr_eof:0, sr_offset:0x70000) from ganesha/gluster.
>>>>
>>>> If an application depends the lseek result for data searching, it
>>>> may enter infinite loop.
>>>>
>>>> while (1) {
>>>> next_pos = lseek(fd, cur_pos, seek_type);
>>>> if (seek_type == SEEK_DATA) {
>>>> seek_type = SEEK_HOLE;
>>>> } else {
>>>> seek_type = SEEK_DATA;
>>>> }
>>>>
>>>> if (next_pos == -1) {
>>>> return ;
>>>>
>>>> cur_pos = next_pos;
>>>> }
>>>>
>>>> The lseek syscall always gets 0x70000 from nfs client for those
>>>> two cases, but, if underlying filesystem is ext4/f2fs, or the nfs
>>>> server is knfsd, the lseek(0x700000, SEEK_DATA) gets ENXIO.
>>>>
>>>> I wanna to know,
>>>> should I fix the ganesha/gluster as knfsd return ENXIO for the
>>>> first case?
>>>> or should I fix the nfs client to return ENXIO for the first case?
>>>>
>>>
>>> It that test correct? The fallback implementation of SEEK_DATA
>>> assumes that the entire file is data, so lseek(SEEK_DATA) on any
>>> offset that is <= eof will be a no-op. The fallback implementation
>>> of SEEK_HOLE assumes that the first hole is at eof.
>>
>> I think that's non-nfsv4.2's logical.
>>
>>>
>>> IOW: unless the initial value for cur_pos is > eof, it looks to me
>>> as if the above test will loop infinitely given any filesystem that
>>> doesn't implement native support for SEEK_DATA/SEEK_HOLE.
>>>
>>
>> No, if underlying filesystem is ext4/f2fs, or the nfs server is knfsd,
>> the last lseek syscall always return ENXIO no matter the cur_pos is >
>> eof or not.
>>
>> A file ends with a hole as,
>> Part 22: DATA 0x0000000006a00000 ---> 0x0000000006afffff
>> Part 23: HOLE 0x0000000006b00000 ---> 0x000000000c7fffff
>>
>> # stat testfile
>> File: testfile
>> Size: 209715200 Blocks: 22640 IO Block: 4096 regular file
>> Device: 807h/2055d Inode: 843122 Links: 2
>> Access: (0600/-rw-------) Uid: ( 0/ root) Gid: ( 0/ root)
>> Access: 2018-09-11 20:01:41.881227061 +0800
>> Modify: 2018-09-11 20:01:41.976478311 +0800
>> Change: 2018-09-11 20:01:41.976478311 +0800
>> Birth: -
>>
>> # strace filemap testfile
>> ... ...
>> lseek(3, 111149056, SEEK_HOLE) = 112197632
>> lseek(3, 112197632, SEEK_DATA) = -1 ENXIO (No such device or address)
>>
>> Right now, when knfsd gets the ENXIO error, it returns the error to
>> client directly, and return to syscall.
>> But, ganesha set the sr_eof to true and return NFS4_OK to client as
>> RFC description, nfs client skips the sr_eof and return a bad offset
>> to syscall.
>
> Would it make more sense to change Knfsd instead of the client? I think I was
> trying to keep things simple when I wrote the code, so I just passed the result of
> the lseek system call back to the client.
Looking at the lseek(2) man page, it's not clear to me what should be returned if as
in this example, there is a HOLE at the end of the file (i.e. filesize is larger than the
range of the last DATA in the file). It sounds like ext4 returns ENXIO if a SEEK_DATA is
done past the last data in the file.
SEEK_DATA
Adjust the file offset to the next location in the file greater than or
equal to offset containing data. If offset points to data, then the file offset is set
to offset.
SEEK_HOLE
Adjust the file offset to the next hole in the file greater than or equal
to offset. If offset points into the middle of a hole, then the file offset is set to
offset. If there is no hole past offset, then the file offset is adjusted
to the end of the file (i.e., there is an implicit hole at the end of any file).
In both of the above cases, lseek() fails if offset points past the end of the
file.
These operations allow applications to map holes in a sparsely allocated file.
This can be useful for applications such as file backup tools, which can save space when
creating backups and preserve holes, if they have a mechanism for discovering
holes.
For the purposes of these operations, a hole is a sequence of zeros that
(normally) has not been allocated in the underlying file storage. However, a filesystem
is not
obliged to report holes, so these operations are not a guaranteed mechanism for
mapping the storage space actually allocated to a file. (Furthermore, a sequence of
zeros that actually has been written to the underlying storage may not be
reported as a hole.) In the simplest implementation, a filesystem can support the
operations
by making SEEK_HOLE always return the offset of the end of the file, and making
SEEK_DATA always return offset (i.e., even if the location referred to by offset is a
hole, it can be considered to consist of data that is a sequence of zeros).
The RFC text is pretty clear:
SEEK is an operation that allows a client to determine the location
of the next data_content4 in a file. It allows an implementation of
the emerging extension to lseek(2) to allow clients to determine the
next hole whilst in data or the next data whilst in a hole.
From the given sa_offset, find the next data_content4 of type sa_what
in the file. If the server can not find a corresponding sa_what,
then the status will still be NFS4_OK, but sr_eof would be TRUE. If
the server can find the sa_what, then the sr_offset is the start of
that content. If the sa_offset is beyond the end of the file, then
SEEK MUST return NFS4ERR_NXIO.
All files MUST have a virtual hole at the end of the file. I.e., if
a filesystem does not support sparse files, then a compound with
{SEEK 0 NFS4_CONTENT_HOLE;} would return a result of {SEEK 1 X;}
where 'X' was the size of the file.
Sa_offset is not past the end of the file, but there is no more DATA, so a seek DATA from
0x70000 (original file) should return sr_eof TRUE.
In either RFC or lseek(2), a seek HOLE for 0x70000 will return 0x70000.
It certainly makes sense that you should be able to have a hole at the end of a file
(pre-allocated disk blocks but no data written yet), and is in fact what fallocate(2) will
do.
An NFS server could check the filesize and if sa_offset is < filesize and a SEEK_DATA
returns ENXIO, it could translate that to NFS4_OK and set sr_eof to TRUE.
The Ganesha code in FSAL_GLUSTER I believe is wrong. It changes any ENXIO result to
NFS4_OK with sr_eof TRUE. It would be better for it to do the simple thing knfsd does of
always passing along the ENXIO (this may be best if it is not possible to safely verify
sa_offset really is < filesize).
Do you mean modifying ganesha/gluster as knfsd does?
seek->seek_pos = vfs_llseek(file, seek->seek_offset, whence);
if (seek->seek_pos < 0)
status = nfserrno(seek->seek_pos);
else if (seek->seek_pos >= i_size_read(file_inode(file)))
seek->seek_eof = true;
It is a working implementation, but not according to RFC description,
If the server can not find a corresponding sa_what,
then the status will still be NFS4_OK, but sr_eof would be TRUE.
As in this example, there is a HOLE at the end of the file,
SEEK(in hole, SEEK_DATA) should return NFS4_OK and sr_eof is TRUE,
but knfsd return NFS4ERR_NXIO.
thanks,
Kinglong Mee