The main thing is that Luminous does not have some of the new recovery
interfaces that are in Nautilus:
A ganesha server acts as a Ceph client, and any state it holds (opens,
locks, etc.) is tied to its Ceph client session. When a ganesha server
restarts, the new instance needs to reacquire some subset of the caps
it held before, but to the Ceph MDS it just looks like another client.
So we end up having to wait until the old session times out before we
can acquire some of those caps. That timeout is around 60s, and it can
eat heavily into the NFS grace period (which is usually about
90-120s); with a 90s grace period, that leaves clients as little as
30s to reclaim their state. While the server is waiting, stateful
operations performed by the clients (opens, locks, etc.) will stall.
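
For reference, here's roughly where those two timeouts live. This is a
minimal sketch with illustrative values, not something from the blog
post; check your versions' docs for the exact option names and defaults:

    # ceph.conf (MDS side): how long an unresponsive client session
    # lingers before the MDS times it out
    [mds]
        mds_session_timeout = 60

    # ganesha.conf: how long a restarted server gives NFS clients
    # to reclaim their state
    NFSV4 {
        Grace_Period = 90;
    }

With numbers like those, a restarted ganesha can burn two-thirds of its
grace period just waiting for its own old session to die.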
In Nautilus we've added a way to tag a session with a particular
unique ID so that when the server is resurrected, it can ask the MDS
to cancel the old session immediately. That lets us get back to
business much more quickly.
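
For the curious, the new libcephfs interfaces look roughly like the
sketch below. This is my own minimal illustration, not ganesha's actual
FSAL_CEPH code; the "ganesha-node-a" ID, the error handling, and the
exact call ordering are assumptions:

    /*
     * Minimal sketch of the Nautilus session-reclaim sequence, assuming
     * the libcephfs reclaim API (ceph_start_reclaim, ceph_finish_reclaim,
     * ceph_set_uuid). Not ganesha's actual code.
     */
    #include <errno.h>
    #include <stdio.h>
    #include <cephfs/libcephfs.h>

    int main(void)
    {
        struct ceph_mount_info *cmount;
        const char *uuid = "ganesha-node-a"; /* hypothetical per-server ID */
        int ret;

        if (ceph_create(&cmount, NULL) < 0)
            return 1;
        ceph_conf_read_file(cmount, NULL);   /* default ceph.conf search */
        if (ceph_init(cmount) < 0)
            return 1;

        /* Ask the MDS to kill any stale session tagged with our ID... */
        ret = ceph_start_reclaim(cmount, uuid, CEPH_RECLAIM_RESET);
        if (ret < 0 && ret != -ENOENT)       /* -ENOENT: nothing stale */
            fprintf(stderr, "reclaim failed: %d\n", ret);
        ceph_finish_reclaim(cmount);

        /* ...and tag the new session so the next restart can do the same. */
        ceph_set_uuid(cmount, uuid);

        if (ceph_mount(cmount, "/") < 0)
            return 1;

        /* ...serve NFS over this mount... */

        ceph_unmount(cmount);
        ceph_release(cmount);
        return 0;
    }

The point is the ordering: the old session is killed before ceph_mount
creates the new one, so there's no 60s wait.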
Luminous may work, but at best you'll end up with longer recovery
times. At worst, you could end up with NFS state recovery failures if
the interlocking timeouts don't work out cleanly.
-- Jeff
On Wed, 2018-12-12 at 13:35 +0000, David C wrote:
Hi Jeff
Many thanks for this! Looking forward to testing it out.
Could you elaborate a bit on why Nautilus is recommended for this
set-up, please? Would attempting this with a Luminous cluster be a
non-starter?
On Wed, 12 Dec 2018, 12:16 Jeff Layton <jlayton@redhat.com> wrote:
> (Sorry for the duplicate email to ganesha lists, but I wanted to widen
> it to include the ceph lists)
>
> In response to some cries for help over IRC, I wrote up this blog post
> the other day, which discusses how to set up parallel serving over
> CephFS:
>
> https://jtlayton.wordpress.com/2018/12/10/deploying-an-active-active-nfs-...
>
> Feel free to comment if you have questions. We may eventually want to
> turn this into a document in the ganesha or ceph trees as well.
>
> Cheers!
--
Jeff Layton <jlayton@redhat.com>