Hi, I have started using Ganesha a bit more intensively than usual
under Ubuntu LTS 14 (kernel 4.4.0-nnn) and 16 (kernel 4.15-nn),
and I was astonished to see that 'rm -r' of a directory
hierarchy reported that some files were not deleted, and 1-2
repeats were needed (mounted with NFSv4, TCP, 'sec=sys').
Investigating it, I saw that 'ls' of a directory often (usually)
did not return all of its entries. Indeed, 'strace' of both 'rm'
and 'ls' showed entries missing from 'getdents'(2).
Looking at packet captures shows that this happens with READDIR
responses that are split into several parts, and the entry omitted
is always the last one in a part. However, the last entry of a
part is not always omitted by the client. This omission never
happens with 'nfs-kernel-server'. The omission is highly
repeatable for a given directory and client, but which entries
are omitted depends on the client and on the number of entries
in the directory. On one client, for example, it was consistently
every 40th entry (because each part contained 40 entries).
No entries are omitted when the client is Fedora 30 with kernel
5.2.nn or with GRML 2018.12 or Parrot 4.7 (kernel 4.19.nn in
both cases).
My investigations, with the kind assistance of a couple of
people from the #Ganesha IRC channel, seem to show that this is
a bug in the 'nfs' client filesystem of the Ubuntu kernels, but
since 'nfs-kernel-server' does not trigger it, it gives the
impression that Ganesha is unreliable, so a workaround within
Ganesha might be useful.
I have made a TCP trace of an 'ls' of a directory with 90
entries:
http://www.sabi.co.uk/xtmp/nfs-ganesha-ext4-miss_32_51_70.pcap
On one client the missing entries are "0032" (packet 82), "0051"
(packet 87) and "0070" (packet 92), all of which are the last
entry in their part of the response; curiously, entries "0013"
and "0088" are also the last entries in their parts of the
response, but they are listed by the 'nfs' client.
I am not familiar enough with the NFSv4 protocol to reliably
spot any differences that might trigger that bug, but I had a
look and could not see any.
BTW, I looked through the kernel 'git' history for potentially
relevant commits and found nothing obvious, but these commits
may be relevant:
02ef04e432babf8fc703104212314e54112ecd2d
98de9ce6f6660d02aa72d7b9b17696fa68a2ed9b