Hi Karli,
I think Alex is right with regard to the NFS version and state.
I am only using NFSv3 and failover is working as expected.
OK, so I've rerun the test, and it goes like this:
1) Start copy loop[*]
2) Power off hv02
3) Copy loop stalls indefinitely
I have attached a snippet of the ctdb log that looks interesting but
doesn't say much to me, except that something's wrong :)
[*]: while true; do
         mount -o vers=3 hv03v.localdomain:/data /mnt/
         dd if=/var/tmp/test.bin of=/mnt/test.bin bs=1M status=progress
         rm -fv /mnt/test.bin
         umount /mnt
     done
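
While the copy loop is stalled, it might also help to capture the
cluster state; a minimal sketch, run from a surviving node and from the
client (the hostname is taken from the test above, adjust as needed):

    # on a surviving gluster node: node health and which node holds the VIPs
    ctdb status
    ctdb ip
    # is ganesha still answering behind the VIP?
    showmount -e hv03v.localdomain
    rpcinfo -p hv03v.localdomain
    # on the client: NFS call/retransmission counters
    nfsstat -c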
Thanks in advance!
/K
In my use case, I have 3 nodes running ESXi 6.7 and have set up one
Gluster VM on each ESXi host, backed by its local datastore.
Once the replica 3 volume is formed, I use the CTDB VIP to present
NFSv3 back to vCenter, which uses it as shared storage.
Everything works great except that performance is not very good ... I
am still looking for ways to improve it.
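
A few volume options that are commonly tuned for this kind of workload
might be worth experimenting with; this is only a sketch, it assumes
the volume is called "data", and sensible values depend heavily on the
Gluster version and hardware:

    gluster volume set data client.event-threads 4
    gluster volume set data server.event-threads 4
    gluster volume set data performance.io-thread-count 32
    gluster volume set data performance.cache-size 256MB
    gluster volume set data performance.write-behind-window-size 4MB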
Cheers,
Edy
On 8/15/2018 12:25 AM, Alex Chekholko wrote:
> Hi Karli,
>
> I'm not 100% sure this is related, but when I set up my ZFS NFS HA
> per https://github.com/ewwhite/zfs-ha/wiki I was not able to get
> the failover to work with NFS v4 but only with NFS v3.
>
> From the client's point of view, it really looked like with NFS v4
> there is an open file handle that just goes stale and hangs, or
> something like that, whereas with NFSv3 the client retries,
> recovers and continues. I did not investigate further; I just use
> v3. I think it has something to do with NFSv4 being "stateful" and
> NFSv3 being "stateless".
>
> Can you re-run your test but using NFSv3 on the client mount? Or
> do you need to use v4.x?
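>
> For comparison, a minimal sketch of the two client mounts side by
> side; apart from vers these are just common NFS mount options spelled
> out, nothing Gluster-specific:
>
>     # NFSv3 mount (the case where failover worked for me)
>     mount -t nfs -o vers=3,hard,proto=tcp,timeo=600 hv03v.localdomain:/data /mnt
>
>     # NFSv4.1 mount (the case that appears to hang on failover)
>     mount -t nfs -o vers=4.1,hard,proto=tcp,timeo=600 hv03v.localdomain:/data /mnt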
>
> Regards,
> Alex
>
> On Tue, Aug 14, 2018 at 6:11 AM Karli Sjöberg <karli@inparadise.se>
> wrote:
> > On Fri, 2018-08-10 at 09:39 -0400, Kaleb S. KEITHLEY wrote:
> > > On 08/10/2018 09:23 AM, Karli Sjöberg wrote:
> > > > On Fri, 2018-08-10 at 21:23 +0800, Pui Edylie wrote:
> > > > > Hi Karli,
> > > > >
> > > > > Storhaug works with glusterfs 4.1.2 and latest nfs-ganesha.
> > > > >
> > > > > I just installed them last weekend ... they are working
> > > > > very well :)
> > > >
> > > > Okay, awesome!
> > > >
> > > > Is there any documentation on how to do that?
> > > >
> > >
> > > https://github.com/gluster/storhaug/wiki
> > >
> >
> > Thanks Kaleb and Edy!
> >
> > I have now redone the cluster using the latest and greatest,
> > following the above guide, and repeated the same test I was doing
> > before (the rsync while loop) with success. I let (forgot) it run
> > for about a day and it was still chugging along nicely when I
> > aborted it, so success there!
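> >
> > (The earlier test, for reference, was a loop of roughly this shape;
> > the paths are only placeholders:
> >
> >     while true; do
> >         rsync -a --progress /var/tmp/test.bin /mnt/
> >         rm -f /mnt/test.bin
> >     done
> > )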
> >
> > On to the next test, the catastrophic failure test where one of the
> > servers dies; that one I'm having a more difficult time with.
> >
> > 1) I start by mounting the share over NFS 4.1 and then write an
> > 8 GiB random data file with 'dd'. While the write is running I
> > "hard-cut" the power to the server I'm writing to, and the transfer
> > just stops indefinitely until the server comes back again. Is that
> > supposed to happen? Like this:
> >
> > # dd if=/dev/urandom of=/var/tmp/test.bin bs=1M count=8192
> > # mount -o vers=4.1 hv03v.localdomain:/data /mnt/
> > # dd if=/var/tmp/test.bin of=/mnt/test.bin bs=1M status=progress
> > 2434793472 bytes (2,4 GB, 2,3 GiB) copied, 42 s, 57,9 MB/s
> >
> > (here I cut the power and let it be for almost two hours before
> > turning
> > it on again)
> >
> > dd: error writing '/mnt/test.bin': Remote I/O error
> > 2325+0 records in
> > 2324+0 records out
> > 2436890624 bytes (2,4 GB, 2,3 GiB) copied, 6944,84 s, 351 kB/s
> > # umount /mnt
> >
> > Here the unmount command hung and I had to hard reset the client.
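> >
> > (Next time it might be worth trying a forced or lazy unmount before
> > resorting to a reset; these are standard umount flags, not a fix
> > for the underlying hang:
> >
> >     umount -f /mnt   # force; may still block on a dead NFS server
> >     umount -l /mnt   # lazy; detach now, clean up when unbusy
> > )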
> >
> > 2) Another question I have is why some files "change" as they are
> > copied out to the Gluster storage. Is that the way it should be?
> > This time, I deleted everything in the destination directory to
> > start over:
> >
> > # mount -o vers=4.1 hv03v.localdomain:/data /mnt/
> > # rm -f /mnt/test.bin
> > # dd if=/var/tmp/test.bin of=/mnt/test.bin bs=1M status=progress
> > 8557428736 bytes (8,6 GB, 8,0 GiB) copied, 122 s, 70,1 MB/s
> > 8192+0 records in
> > 8192+0 records out
> > 8589934592 bytes (8,6 GB, 8,0 GiB) copied, 123,039 s, 69,8 MB/s
> > # md5sum /var/tmp/test.bin
> > 073867b68fa8eaa382ffe05adb90b583 /var/tmp/test.bin
> > # md5sum /mnt/test.bin
> > 634187d367f856f3f5fb31846f796397 /mnt/test.bin
> > # umount /mnt
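> >
> > To narrow down where the file changes, one rough next step could be
> > to check the heal state and compare checksums directly on the
> > bricks; this assumes the volume is indeed called "data", and the
> > brick path below is a placeholder for whatever 'gluster volume info
> > data' reports:
> >
> >     gluster volume heal data info
> >     # on each server, against its real brick path:
> >     md5sum /path/to/brick/test.bin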
> >
> > Thanks in advance!
> >
> > /K
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users