Ceph RGW and VMware vSphere 6.7 mdcache_find_keyed_reason
by Martin G
Dear all,
I ran into trouble connecting VMware vSphere 6.7 to a Ceph cluster exported via RGW and Ganesha. Once the datastore is created, the ganesha thread gets stuck at 100% CPU in mdcache_find_keyed_reason, preceded by pthread mutex lock and unlock calls.
I am on Nautilus, deployed with Ansible, running on the latest CentOS 7. The issue appears with Ganesha 2.7.3 and also with 2.8.
Stacktrace from perf top here: https://imgur.com/a/rQ3AKfy
I would greatly appreciate any clues or hints on what causes this.
Thanks,
Martin
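In case it helps, a sketch of how the spinning thread could be isolated alongside perf top, assuming gdb and the matching debuginfo packages are installed (paths and the batch invocation are only illustrative):
# find the hot thread (TID) inside ganesha.nfsd
top -H -p "$(pidof ganesha.nfsd)"
# dump all thread stacks without keeping the daemon stopped for long
gdb -p "$(pidof ganesha.nfsd)" -batch -ex 'thread apply all bt' > /tmp/ganesha-threads.txt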
nfs-ganesha freeze after a few hours
by andre.roberge@maskicom.net
Hi,
I'm running nfs-ganesha on a virtual server with SR-IOV (Intel XP520) whose job is basically recording 167 channels of TV feed for a time-shift function. The setup runs for hours without any issues, then the VM just freezes for no apparent reason.
My Ceph cluster is running fine and is barely using the available storage; the load on the server is minimal.
There are no errors on the Ceph side; the nfs-ganesha virtual machine just freezes.
The nfs-ganesha server is set up with BGP to the host using FRRouting 7 in a leaf/spine topology, which is working fine.
Here is my mount for the NFS
10.70.0.67:/cephfs/timeshift1 /mnt/cephfs nfs4 noatime,soft,nfsvers=4.1,async,proto=tcp 0 0
Here is my config for Ganesha
NFS_Core_Param
{
}
EXPORT_DEFAULTS {
    Attr_Expiration_Time = 0;
}
CACHEINODE {
    Dir_Chunk = 0;
    NParts = 1;
    Cache_Size = 1;
}
EXPORT
{
    Export_id = 20133;
    Path = "/";
    Pseudo = /cephfs;
    Access_Type = RW;
    Protocols = 3,4;
    Transports = TCP;
    SecType = sys,krb5,krb5i,krb5p;
    Squash = No_Root_Squash;
    Attr_Expiration_Time = 0;
    FSAL {
        Name = CEPH;
        User_Id = "admin";
    }
}
EXPORT
{
    Export_id = 20134;
    Path = "/";
    Pseudo = /cephobject;
    Access_Type = RW;
    Protocols = 3,4;
    Transports = TCP;
    SecType = sys,krb5,krb5i,krb5p;
    Squash = Root_Squash;
    FSAL {
        Name = RGW;
        User_Id = "cephnfs";
        Access_Key_Id = "5XC5JJPHT1TVF7COSH23";
        Secret_Access_Key = "6347daPBi79srlE3Kw6l4zDA8SMMkJQZjA1ug7LK";
    }
}
RGW {
    ceph_conf = "/etc/ceph/ceph.conf";
    cluster = "ceph";
    name = "client.rgw.cephctl1";
}
LOG {
    Facility {
        name = FILE;
        destination = "/var/log/ganesha/ganesha.log";
        enable = active;
    }
}
Here is my Ceph status at the freeze time.
  data:
    pools:   7 pools, 172 pgs
    objects: 1.43M objects, 3.7 TiB
    usage:   11 TiB used, 192 TiB / 204 TiB avail
    pgs:     172 active+clean
Ceph version
ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
Host Version Info
CentOS 7.2 Kernel 5.3.2-1.el7
I also tried these kernels, with the same issue:
CentOS Linux (5.3.2-1.el7.elrepo.x86_64) 7 (Core)
CentOS Linux (3.10.0-1062.4.1.el7.x86_64) 7 (Core)
CentOS Linux (3.10.0-1062.1.2.el7.x86_64) 7 (Core)
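Since the whole VM locks up rather than just the ganesha process, here is a sketch of guest kernel settings that can turn a silent hang into a panic plus vmcore, assuming kdump (kexec-tools) is installed and a crashkernel reservation is configured:
# /etc/sysctl.d/99-hang-debug.conf
kernel.hung_task_timeout_secs = 120
kernel.hung_task_panic = 1       # panic when a task stays stuck in D state too long
kernel.softlockup_panic = 1      # panic on soft lockups instead of only logging them
kernel.panic = 10                # reboot 10s after the panic so kdump can write the vmcore
# apply and arm the crash kernel
sysctl --system
systemctl enable --now kdump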
Ganesha Version info :
nfs-ganesha-ceph 2.8.2
nfs-ganesha-rgw 2.8.2
libcephfs2 14.2.4
Any guidance on how to resolve this issue would be appreciated
Andre Roberge
andre.roberge(a)maskicom.net
Nfs-ganesha ntirpc crash
by David C
Hi All
I've hit a segfault I haven't seen before; it seems related to ntirpc. Please
see the backtrace:
(gdb) bt
#0 xdr_putenum (enumv=<error reading variable: Cannot access memory at
address 0x0>, xdrs=0x7fd831761490) at
/usr/src/debug/nfs-ganesha-2.7.3/libntirpc/ntirpc/rpc/xdr.h:584
#1 xdr_enum (xdrs=0x7fd831761490, ep=0x0) at
/usr/src/debug/nfs-ganesha-2.7.3/libntirpc/ntirpc/rpc/xdr_inline.h:405
#2 0x0000000000456749 in xdr_nfs_opnum4 (objp=0x0, xdrs=0x7fd831761490) at
/usr/src/debug/nfs-ganesha-2.7.3/include/nfsv41.h:8065
#3 xdr_nfs_resop4 (xdrs=0x7fd831761490, objp=0x0) at
/usr/src/debug/nfs-ganesha-2.7.3/include/nfsv41.h:8433
#4 0x0000000000458afe in xdr_array_encode (cpp=<optimized out>,
sizep=<optimized out>, xdr_elem=0x456730 <xdr_nfs_resop4>, selem=160,
maxsize=1024, xdrs=0x7fd831761490) at
/usr/src/debug/nfs-ganesha-2.7.3/libntirpc/ntirpc/rpc/xdr_inline.h:851
#5 xdr_array (xdr_elem=0x456730 <xdr_nfs_resop4>, selem=160, maxsize=1024,
sizep=<optimized out>, cpp=<optimized out>, xdrs=0x7fd831761490) at
/usr/src/debug/nfs-ganesha-2.7.3/libntirpc/ntirpc/rpc/xdr_inline.h:894
#6 xdr_COMPOUND4res (xdrs=0x7fd831761490, objp=<optimized out>) at
/usr/src/debug/nfs-ganesha-2.7.3/include/nfsv41.h:8779
#7 0x00007fdc0cd0f89b in svc_vc_reply (req=0x7fd831777d30) at
/usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/svc_vc.c:887
#8 0x0000000000451337 in nfs_rpc_process_request (reqdata=0x7fd831777d30)
at /usr/src/debug/nfs-ganesha-2.7.3/MainNFSD/nfs_worker_thread.c:1384
#9 0x0000000000450766 in nfs_rpc_decode_request (xprt=0x7fdb1c00a0d0,
xdrs=0x7fd831f6e190) at
/usr/src/debug/nfs-ganesha-2.7.3/MainNFSD/nfs_rpc_dispatcher_thread.c:1345
#10 0x00007fdc0cd0d07d in svc_rqst_xprt_task (wpe=0x7fdb1c00a2e8) at
/usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/svc_rqst.c:769
#11 0x00007fdc0cd0d59a in svc_rqst_epoll_events (n_events=<optimized out>,
sr_rec=0x53136a0) at
/usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/svc_rqst.c:941
#12 svc_rqst_epoll_loop (sr_rec=<optimized out>) at
/usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/svc_rqst.c:1014
#13 svc_rqst_run_task (wpe=0x53136a0) at
/usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/svc_rqst.c:1050
#14 0x00007fdc0cd15123 in work_pool_thread (arg=0x7fd86000a960) at
/usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/work_pool.c:181
#15 0x00007fdc0b2cddd5 in start_thread (arg=0x7fdafffff700) at
pthread_create.c:307
#16 0x00007fdc0a444ead in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Memory usage on the server was quite high at the time, so I wonder if that's related?
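If a core file was captured, a sketch of how to pull more detail out of it for a report, assuming the matching nfs-ganesha-debuginfo package is installed and the binary is at the stock /usr/bin/ganesha.nfsd path:
gdb /usr/bin/ganesha.nfsd /path/to/core
(gdb) bt full                 # locals for every frame, not just the call chain
(gdb) thread apply all bt     # what the other worker threads were doing
(gdb) frame 3                 # xdr_nfs_resop4 was handed objp=0x0
(gdb) info args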
nfs-ganesha-2.7.3-0.1.el7.x86_64
nfs-ganesha-ceph-2.7.3-0.1.el7.x86_64
libcephfs2-14.2.1-0.el7.x86_64
librados2-14.2.1-0.el7.x86_64
Thanks,
Re: NFS export of ZFS filesystems
by Kaleb Keithley
Sending it only to me makes it hard for other people to see your response.
;-) cc: support(a)lists.nfs-ganesha.org
There's no FSAL_ZFS in the ganesha sources. There was a FSAL_ZFS in
ganesha-2.5, but it was removed in 2.6; the FSAL API had diverged and
nobody was maintaining it. (And Ubuntu 10.04? That's seriously old. The
oldest Ubuntu that we package for in the gluster or ganesha PPAs is 16.04.)
Off hand I'd say try FSAL_VFS and set the Path to your zfs fs that's backed
by the zpool.
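A minimal sketch of that suggestion, assuming /data1/fiotest is the mounted ZFS dataset (the ZFS { } block in the config quoted below isn't consumed by anything in 2.6+, and sharenfs should presumably stay off so knfsd and ganesha don't both try to export the same path):
EXPORT
{
    Export_Id = 77;
    Path = /data1/fiotest;        # the mounted ZFS dataset
    Pseudo = /data1/fiotest;      # NFSv4 pseudo path
    Access_Type = RO;
    FSAL {
        Name = VFS;               # plain VFS on top of the ZFS mount point
    }
}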
On Mon, Oct 14, 2019 at 8:41 AM John Hearns <john(a)kheironmed.com> wrote:
> A ZFS filesystem which I wish to export is /data1/fiotest
> ganesha.conf reads
> I note already that I have the pool name wrong...
>
>
> EXPORT
> {
> # Export Id (mandatory, each EXPORT must have a unique Export_Id)
> Export_Id = 77;
>
> # Exported path (mandatory)
> Path = /data1/fiotest;
>
> # Pseudo Path (required for NFS v4)
> Pseudo = /data1/fiotest;
>
> # Required for access (default is None)
> # Could use CLIENT blocks instead
> Access_Type = RO;
>
> # Exporting FSAL
> FSAL {
> Name = VFS;
> }
> # Exporting ZFS
> ZFS {
> # Zpool to use
> zpool = "pool1";
> }
> }
>
> On Mon, 14 Oct 2019 at 13:35, Kaleb Keithley <kkeithle(a)redhat.com> wrote:
>
>>
>>
>> On Mon, Oct 14, 2019 at 6:24 AM <john(a)kheironmed.com> wrote:
>>
>>> OS - Ubuntu 10.04.3 LTS
>>> ZFS 0.7.5-1ubuntu16.6
>>> Ganesha version 2.6.0.2
>>>
>>> I am not using Gluster. The documentation says that if I set sharenfs=on
>>> for a given zfs filesystem it should be exported by NFS
>>>
>>
>> By knfs! That's what I would presume.
>>
>>
>>> However I do not see this happening - can someone help me to set up
>>> /etc/ganesha/ganesha.conf in this mix of OS and ZFS?
>>>
>>
>> I don't believe anyone here has much experience with ZFS.
>>
>> You should be using FSAL_VFS, with Path and Pseudo set accordingly for
>> your ZFS fs. Paste your ganesha.conf somewhere where we can review it.
>>
>> --
>>
>> Kaleb
>>
>>
>
> *Kheiron Medical Technologies*
>
> kheironmed.com | supporting radiologists with deep learning
>
NFS export of ZFS filesystems
by john@kheironmed.com
OS - Ubuntu 10.04.3 LTS
ZFS 0.7.5-1ubuntu16.6
Ganesha version 2.6.0.2
I am not using Gluster. The documentation says that if I set sharenfs=on for a given zfs filesystem it should be exported by NFS.
However I do not see this happening - can someone help me to set up /etc/ganesha/ganesha.conf in this mix of OS and ZFS?
Also, I note ubuntu16.6 in the ZFS version ....
Re: [ceph-users] [Nfs-ganesha-devel] 2.7.3 with CEPH_FSAL Crashing
by Daniel Gryniewicz
Client::fill_statx() is a fairly large function, so it's hard to know
what's causing the crash. Can you get line numbers from your backtrace?
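A sketch of one way to get them on CentOS 7, assuming the debuginfo repositories are enabled, the core file is still around, and the binary is at the stock /usr/bin/ganesha.nfsd path:
debuginfo-install nfs-ganesha libcephfs2 librados2
gdb /usr/bin/ganesha.nfsd /path/to/core
(gdb) bt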
Daniel
On 10/7/19 9:59 AM, David C wrote:
> Hi All
>
> Further to my previous messages, I upgraded
> to libcephfs2-14.2.2-0.el7.x86_64 as suggested and things certainly seem
> a lot more stable. I have had some crashes though; could someone assist
> in debugging this latest crash, please?
>
> (gdb) bt
> #0 0x00007fce4e9fc1bb in Client::fill_statx(Inode*, unsigned int,
> ceph_statx*) () from /lib64/libcephfs.so.2
> #1 0x00007fce4ea1d4ca in Client::_readdir_cache_cb(dir_result_t*, int
> (*)(void*, dirent*, ceph_statx*, long, Inode*), void*, int, bool) ()
> from /lib64/libcephfs.so.2
> #2 0x00007fce4ea1e865 in Client::readdir_r_cb(dir_result_t*, int
> (*)(void*, dirent*, ceph_statx*, long, Inode*), void*, unsigned int,
> unsigned int, bool) () from /lib64/libcephfs.so.2
> #3 0x00007fce4ea1f3dd in Client::readdirplus_r(dir_result_t*, dirent*,
> ceph_statx*, unsigned int, unsigned int, Inode**) () from
> /lib64/libcephfs.so.2
> #4 0x00007fce4ece7b0e in fsal_ceph_readdirplus (dir=<optimized out>,
> cred=<optimized out>, out=0x7fccdbefa720, flags=0, want=1775,
> stx=0x7fccdbefa730, de=0x7fccdbefa8c0, dirp=<optimized out>,
> cmount=<optimized out>)
> at /usr/src/debug/nfs-ganesha-2.7.3/FSAL/FSAL_CEPH/statx_compat.h:314
> #5 ceph_fsal_readdir (dir_pub=<optimized out>, whence=<optimized out>,
> dir_state=0x7fccdbefaa30, cb=0x522640 <mdc_readdir_uncached_cb>,
> attrmask=122830, eof=0x7fccdbefac0b) at
> /usr/src/debug/nfs-ganesha-2.7.3/FSAL/FSAL_CEPH/handle.c:211
> #6 0x00000000005256e1 in mdcache_readdir_uncached
> (directory=directory@entry=0x7fcaa8bb84a0, whence=<optimized out>,
> dir_state=<optimized out>, cb=<optimized out>, attrmask=<optimized out>,
> eod_met=<optimized out>)
> at
> /usr/src/debug/nfs-ganesha-2.7.3/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1654
> #7 0x0000000000517a88 in mdcache_readdir (dir_hdl=0x7fcaa8bb84d8,
> whence=0x7fccdbefab18, dir_state=0x7fccdbefab30, cb=0x432db0
> <populate_dirent>, attrmask=122830, eod_met=0x7fccdbefac0b) at
> /usr/src/debug/nfs-ganesha-2.7.3/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:551
> #8 0x000000000043434a in fsal_readdir
> (directory=directory@entry=0x7fcaa8bb84d8, cookie=cookie@entry=0,
> nbfound=nbfound@entry=0x7fccdbefac0c,
> eod_met=eod_met@entry=0x7fccdbefac0b, attrmask=122830,
> cb=cb@entry=0x46f600 <nfs4_readdir_callback>,
> opaque=opaque@entry=0x7fccdbefac20)
> at /usr/src/debug/nfs-ganesha-2.7.3/FSAL/fsal_helper.c:1164
> #9 0x00000000004705b9 in nfs4_op_readdir (op=0x7fcb7fed1f80,
> data=0x7fccdbefaea0, resp=0x7fcb7d106c40) at
> /usr/src/debug/nfs-ganesha-2.7.3/Protocols/NFS/nfs4_op_readdir.c:664
> #10 0x000000000045d120 in nfs4_Compound (arg=<optimized out>,
> req=<optimized out>, res=0x7fcb7e001000) at
> /usr/src/debug/nfs-ganesha-2.7.3/Protocols/NFS/nfs4_Compound.c:942
> #11 0x00000000004512cd in nfs_rpc_process_request
> (reqdata=0x7fcb7e1d1950) at
> /usr/src/debug/nfs-ganesha-2.7.3/MainNFSD/nfs_worker_thread.c:1328
> #12 0x0000000000450766 in nfs_rpc_decode_request (xprt=0x7fcaf17fb0e0,
> xdrs=0x7fcb7e1ddb90) at
> /usr/src/debug/nfs-ganesha-2.7.3/MainNFSD/nfs_rpc_dispatcher_thread.c:1345
> #13 0x00007fce6165707d in svc_rqst_xprt_task (wpe=0x7fcaf17fb2f8) at
> /usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/svc_rqst.c:769
> #14 0x00007fce6165759a in svc_rqst_epoll_events (n_events=<optimized
> out>, sr_rec=0x56a24c0) at
> /usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/svc_rqst.c:941
> #15 svc_rqst_epoll_loop (sr_rec=<optimized out>) at
> /usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/svc_rqst.c:1014
> #16 svc_rqst_run_task (wpe=0x56a24c0) at
> /usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/svc_rqst.c:1050
> #17 0x00007fce6165f123 in work_pool_thread (arg=0x7fcd381c77b0) at
> /usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/work_pool.c:181
> #18 0x00007fce5fc17dd5 in start_thread () from /lib64/libpthread.so.0
> #19 0x00007fce5ed8eead in clone () from /lib64/libc.so.6
>
> Package versions:
>
> nfs-ganesha-vfs-2.7.3-0.1.el7.x86_64
> nfs-ganesha-debuginfo-2.7.3-0.1.el7.x86_64
> nfs-ganesha-ceph-2.7.3-0.1.el7.x86_64
> nfs-ganesha-2.7.3-0.1.el7.x86_64
> libcephfs2-14.2.2-0.el7.x86_64
> librados2-14.2.2-0.el7.x86_64
>
> Ganesha export:
>
> EXPORT
> {
> Export_ID=100;
> Protocols = 4;
> Transports = TCP;
> Path = /;
> Pseudo = /ceph/;
> Access_Type = RW;
> Attr_Expiration_Time = 0;
> Disable_ACL = FALSE;
> Manage_Gids = TRUE;
> Filesystem_Id = 100.1;
> FSAL {
> Name = CEPH;
> }
> }
>
> Ceph.conf:
>
> [client]
> mon host = --removed--
> client_oc_size = 6291456000 #6GB
> client_acl_type=posix_acl
> client_quota = true
> client_quota_df = true
>
> Client mount options:
>
> rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=removed,local_lock=none,addr=removed)
>
> On Fri, Jul 19, 2019 at 5:47 PM David C <dcsysengineer(a)gmail.com
> <mailto:dcsysengineer@gmail.com>> wrote:
>
> Thanks, Jeff. I'll give 14.2.2 a go when it's released.
>
> On Wed, 17 Jul 2019, 22:29 Jeff Layton, <jlayton(a)poochiereds.net
> <mailto:jlayton@poochiereds.net>> wrote:
>
> Ahh, I just noticed you were running nautilus on the client
> side. This
> patch went into v14.2.2, so once you update to that you should
> be good
> to go.
>
> -- Jeff
>
> On Wed, 2019-07-17 at 17:10 -0400, Jeff Layton wrote:
> > This is almost certainly the same bug that is fixed here:
> >
> > https://github.com/ceph/ceph/pull/28324
> >
> > It should get backported soon-ish but I'm not sure which luminous
> > release it'll show up in.
> >
> > Cheers,
> > Jeff
> >
> > On Wed, 2019-07-17 at 10:36 +0100, David C wrote:
> > > Thanks for taking a look at this, Daniel. Below is the only
> interesting bit from the Ceph MDS log at the time of the crash
> but I suspect the slow requests are a result of the Ganesha
> crash rather than the cause of it. Copying the Ceph list in case
> anyone has any ideas.
> > >
> > > 2019-07-15 15:06:54.624007 7f5fda5bb700 0
> log_channel(cluster) log [WRN] : 6 slow requests, 5 included
> below; oldest blocked for > 34.588509 secs
> > > 2019-07-15 15:06:54.624017 7f5fda5bb700 0
> log_channel(cluster) log [WRN] : slow request 33.113514 seconds
> old, received at 2019-07-15 15:06:21.510423:
> client_request(client.16140784:5571174 setattr mtime=2019-07-15
> 14:59:45.642408 #0x10009079cfb 2019-07
> > > -15 14:59:45.642408 caller_uid=1161, caller_gid=1131{})
> currently failed to xlock, waiting
> > > 2019-07-15 15:06:54.624020 7f5fda5bb700 0
> log_channel(cluster) log [WRN] : slow request 34.588509 seconds
> old, received at 2019-07-15 15:06:20.035428:
> client_request(client.16129440:1067288 create
> #0x1000907442e/filePathEditorRegistryPrefs.melDXAtss 201
> > > 9-07-15 14:59:53.694087 caller_uid=1161,
> caller_gid=1131{1131,4121,2330,2683,4115,2322,2779,2979,1503,3511,2783,2707,2942,2980,2258,2829,1238,1237,2793,1235,1249,2097,1154,2982,2983,3860,4101,1208,3638,3641,3644,3640,3643,3639,3642,3822,3945,4045,3521,35
> > > 22,3520,3523,}) currently failed to wrlock, waiting
> > > 2019-07-15 15:06:54.624025 7f5fda5bb700 0
> log_channel(cluster) log [WRN] : slow request 34.583918 seconds
> old, received at 2019-07-15 15:06:20.040019:
> client_request(client.16140784:5570551 getattr pAsLsXsFs
> #0x1000907443b 2019-07-15 14:59:44.171408 cal
> > > ler_uid=1161, caller_gid=1131{}) currently failed to
> rdlock, waiting
> > > 2019-07-15 15:06:54.624028 7f5fda5bb700 0
> log_channel(cluster) log [WRN] : slow request 34.580632 seconds
> old, received at 2019-07-15 15:06:20.043305:
> client_request(client.16129440:1067293 unlink
> #0x1000907442e/filePathEditorRegistryPrefs.melcdzxxc 201
> > > 9-07-15 14:59:53.701964 caller_uid=1161,
> caller_gid=1131{1131,4121,2330,2683,4115,2322,2779,2979,1503,3511,2783,2707,2942,2980,2258,2829,1238,1237,2793,1235,1249,2097,1154,2982,2983,3860,4101,1208,3638,3641,3644,3640,3643,3639,3642,3822,3945,4045,3521,35
> > > 22,3520,3523,}) currently failed to wrlock, waiting
> > > 2019-07-15 15:06:54.624032 7f5fda5bb700 0
> log_channel(cluster) log [WRN] : slow request 34.538332 seconds
> old, received at 2019-07-15 15:06:20.085605:
> client_request(client.16129440:1067308 create
> #0x1000907442e/filePathEditorRegistryPrefs.melHHljMk 201
> > > 9-07-15 14:59:53.744266 caller_uid=1161,
> caller_gid=1131{1131,4121,2330,2683,4115,2322,2779,2979,1503,3511,2783,2707,2942,2980,2258,2829,1238,1237,2793,1235,1249,2097,1154,2982,2983,3860,4101,1208,3638,3641,3644,3640,3643,3639,3642,3822,3945,4045,3521,3522,3520,3523,})
> currently failed to wrlock, waiting
> > > 2019-07-15 15:06:55.014073 7f5fdcdc0700 1 mds.mds01
> Updating MDS map to version 68166 from mon.2
> > > 2019-07-15 15:06:59.624041 7f5fda5bb700 0
> log_channel(cluster) log [WRN] : 7 slow requests, 2 included
> below; oldest blocked for > 39.588571 secs
> > > 2019-07-15 15:06:59.624048 7f5fda5bb700 0
> log_channel(cluster) log [WRN] : slow request 30.495843 seconds
> old, received at 2019-07-15 15:06:29.128156:
> client_request(client.16129440:1072227 create
> #0x1000907442e/filePathEditorRegistryPrefs.mel58AQSv 2019-07-15
> 15:00:02.786754 caller_uid=1161,
> caller_gid=1131{1131,4121,2330,2683,4115,2322,2779,2979,1503,3511,2783,2707,2942,2980,2258,2829,1238,1237,2793,1235,1249,2097,1154,2982,2983,3860,4101,1208,3638,3641,3644,3640,3643,3639,3642,3822,3945,4045,3521,3522,3520,3523,})
> currently failed to wrlock, waiting
> > > 2019-07-15 15:06:59.624053 7f5fda5bb700 0
> log_channel(cluster) log [WRN] : slow request 39.432848 seconds
> old, received at 2019-07-15 15:06:20.191151:
> client_request(client.16140784:5570649 mknod
> #0x1000907442e/filePathEditorRegistryPrefs.mel3HZLNE 2019-07-15
> 14:59:44.322408 caller_uid=1161, caller_gid=1131{}) currently
> failed to wrlock, waiting
> > > 2019-07-15 15:07:03.014108 7f5fdcdc0700 1 mds.mds01
> Updating MDS map to version 68167 from mon.2
> > > 2019-07-15 15:07:04.624096 7f5fda5bb700 0
> log_channel(cluster) log [WRN] : 8 slow requests, 1 included
> below; oldest blocked for > 44.588632 secs
> > > 2019-07-15 15:07:04.624103 7f5fda5bb700 0
> log_channel(cluster) log [WRN] : slow request 34.904077 seconds
> old, received at 2019-07-15 15:06:29.719983:
> client_request(client.16129440:1072228 getattr pAsLsXsFs
> #0x1000907443b 2019-07-15 15:00:03.378512 caller_uid=1161,
> caller_gid=1131{1131,4121,2330,2683,4115,2322,2779,2979,1503,3511,2783,2707,2942,2980,2258,2829,1238,1237,2793,1235,1249,2097,1154,2982,2983,3860,4101,1208,3638,3641,3644,3640,3643,3639,3642,3822,3945,4045,3521,3522,3520,3523,})
> currently failed to rdlock, waiting
> > > 2019-07-15 15:07:07.013972 7f5fdcdc0700 1 mds.mds01
> Updating MDS map to version 68168 from mon.2
> > > 2019-07-15 15:07:09.624166 7f5fda5bb700 0
> log_channel(cluster) log [WRN] : 10 slow requests, 2 included
> below; oldest blocked for > 49.588693 secs
> > > 2019-07-15 15:07:09.624173 7f5fda5bb700 0
> log_channel(cluster) log [WRN] : slow request 32.689838 seconds
> old, received at 2019-07-15 15:06:36.934283:
> client_request(client.16129440:1072271 getattr pAsLsXsFs
> #0x1000907443b 2019-07-15 15:00:10.592734 caller_uid=1161,
> caller_gid=1131{1131,4121,2330,2683,4115,2322,2779,2979,1503,3511,2783,2707,2942,2980,2258,2829,1238,1237,2793,1235,1249,2097,1154,2982,2983,3860,4101,1208,3638,3641,3644,3640,3643,3639,3642,3822,3945,4045,3521,3522,3520,3523,})
> currently failed to rdlock, waiting
> > > 2019-07-15 15:07:09.624177 7f5fda5bb700 0
> log_channel(cluster) log [WRN] : slow request 34.962719 seconds
> old, received at 2019-07-15 15:06:34.661402:
> client_request(client.16129440:1072256 getattr pAsLsXsFs
> #0x1000907443b 2019-07-15 15:00:08.319912 caller_uid=1161,
> caller_gid=1131{1131,4121,2330,2683,4115,2322,2779,2979,1503,3511,2783,2707,2942,2980,2258,2829,1238,1237,2793,1235,1249,2097,1154,2982,2983,3860,4101,1208,3638,3641,3644,3640,3643,3639,3642,3822,3945,4045,3521,3522,3520,3523,})
> currently failed to rdlock, waiting
> > > 2019-07-15 15:07:11.519928 7f5fdcdc0700 1 mds.mds01
> Updating MDS map to version 68169 from mon.2
> > > 2019-07-15 15:07:19.624272 7f5fda5bb700 0
> log_channel(cluster) log [WRN] : 11 slow requests, 1 included
> below; oldest blocked for > 59.588812 secs
> > > 2019-07-15 15:07:19.624278 7f5fda5bb700 0
> log_channel(cluster) log [WRN] : slow request 32.164260 seconds
> old, received at 2019-07-15 15:06:47.459980:
> client_request(client.16129440:1072326 getattr pAsLsXsFs
> #0x1000907443b 2019-07-15 15:00:21.118372 caller_uid=1161,
> caller_gid=1131{1131,4121,2330,2683,4115,2322,2779,2979,1503,3511,2783,2707,2942,2980,2258,2829,1238,1237,2793,1235,1249,2097,1154,2982,2983,3860,4101,1208,3638,3641,3644,3640,3643,3639,3642,3822,3945,4045,3521,3522,3520,3523,})
> currently failed to rdlock, waiting
> > >
> > >
> > > On Tue, Jul 16, 2019 at 1:18 PM Daniel Gryniewicz
> <dang(a)redhat.com <mailto:dang@redhat.com>> wrote:
> > > > This is not one I've seen before, and a quick look at the
> code looks
> > > > strange. The only assert in that bit is asserting the
> parent is a
> > > > directory, but the parent directory is not something that
> was passed in
> > > > by Ganesha, but rather something that was looked up
> internally in
> > > > libcephfs. This is beyond my expertise, at this point.
> Maybe some ceph
> > > > logs would help?
> > > >
> > > > Daniel
> > > >
> > > > On 7/15/19 10:54 AM, David C wrote:
> > > > > This list has been deprecated. Please subscribe to the
> new devel list at lists.nfs-ganesha.org
> <http://lists.nfs-ganesha.org>.
> > > > >
> > > > >
> > > > > Hi All
> > > > >
> > > > > I'm running 2.7.3 using the CEPH FSAL to export CephFS
> (Luminous), it
> > > > > ran well for a few days and crashed. I have a coredump,
> could someone
> > > > > assist me in debugging this please?
> > > > >
> > > > > (gdb) bt
> > > > > #0 0x00007f04dcab6207 in raise () from /lib64/libc.so.6
> > > > > #1 0x00007f04dcab78f8 in abort () from /lib64/libc.so.6
> > > > > #2 0x00007f04d2a9d6c5 in ceph::__ceph_assert_fail(char
> const*, char
> > > > > const*, int, char const*) () from
> /usr/lib64/ceph/libceph-common.so.0
> > > > > #3 0x00007f04d2a9d844 in
> ceph::__ceph_assert_fail(ceph::assert_data
> > > > > const&) () from /usr/lib64/ceph/libceph-common.so.0
> > > > > #4 0x00007f04cc807f04 in Client::_lookup_name(Inode*,
> Inode*, UserPerm
> > > > > const&) () from /lib64/libcephfs.so.2
> > > > > #5 0x00007f04cc81c41f in
> Client::ll_lookup_inode(inodeno_t, UserPerm
> > > > > const&, Inode**) () from /lib64/libcephfs.so.2
> > > > > #6 0x00007f04ccadbf0e in create_handle
> (export_pub=0x1baff10,
> > > > > desc=<optimized out>, pub_handle=0x7f0470fd4718,
> > > > > attrs_out=0x7f0470fd4740) at
> > > > >
> /usr/src/debug/nfs-ganesha-2.7.3/FSAL/FSAL_CEPH/export.c:256
> > > > > #7 0x0000000000523895 in mdcache_locate_host
> (fh_desc=0x7f0470fd4920,
> > > > > export=export@entry=0x1bafbf0,
> entry=entry@entry=0x7f0470fd48b8,
> > > > > attrs_out=attrs_out@entry=0x0)
> > > > > at
> > > > >
> /usr/src/debug/nfs-ganesha-2.7.3/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1011
> > > > > #8 0x000000000051d278 in mdcache_create_handle
> (exp_hdl=0x1bafbf0,
> > > > > fh_desc=<optimized out>, handle=0x7f0470fd4900,
> attrs_out=0x0) at
> > > > >
> /usr/src/debug/nfs-ganesha-2.7.3/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1578
> > > > > #9 0x000000000046d404 in nfs4_mds_putfh
> > > > > (data=data@entry=0x7f0470fd4ea0) at
> > > > >
> /usr/src/debug/nfs-ganesha-2.7.3/Protocols/NFS/nfs4_op_putfh.c:211
> > > > > #10 0x000000000046d8e8 in nfs4_op_putfh
> (op=0x7f03effaf1d0,
> > > > > data=0x7f0470fd4ea0, resp=0x7f03ec1de1f0) at
> > > > >
> /usr/src/debug/nfs-ganesha-2.7.3/Protocols/NFS/nfs4_op_putfh.c:281
> > > > > #11 0x000000000045d120 in nfs4_Compound (arg=<optimized
> out>,
> > > > > req=<optimized out>, res=0x7f03ec1de9d0) at
> > > > >
> /usr/src/debug/nfs-ganesha-2.7.3/Protocols/NFS/nfs4_Compound.c:942
> > > > > #12 0x00000000004512cd in nfs_rpc_process_request
> > > > > (reqdata=0x7f03ee5ed4b0) at
> > > > >
> /usr/src/debug/nfs-ganesha-2.7.3/MainNFSD/nfs_worker_thread.c:1328
> > > > > #13 0x0000000000450766 in nfs_rpc_decode_request
> (xprt=0x7f02180c2320,
> > > > > xdrs=0x7f03ec568ab0) at
> > > > >
> /usr/src/debug/nfs-ganesha-2.7.3/MainNFSD/nfs_rpc_dispatcher_thread.c:1345
> > > > > #14 0x00007f04df45d07d in svc_rqst_xprt_task
> (wpe=0x7f02180c2538) at
> > > > >
> /usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/svc_rqst.c:769
> > > > > #15 0x00007f04df45d59a in svc_rqst_epoll_events
> (n_events=<optimized
> > > > > out>, sr_rec=0x4bb53e0) at
> > > > >
> /usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/svc_rqst.c:941
> > > > > #16 svc_rqst_epoll_loop (sr_rec=<optimized out>) at
> > > > >
> /usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/svc_rqst.c:1014
> > > > > #17 svc_rqst_run_task (wpe=0x4bb53e0) at
> > > > >
> /usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/svc_rqst.c:1050
> > > > > #18 0x00007f04df465123 in work_pool_thread
> (arg=0x7f044c0008c0) at
> > > > >
> /usr/src/debug/nfs-ganesha-2.7.3/libntirpc/src/work_pool.c:181
> > > > > #19 0x00007f04dda05dd5 in start_thread () from
> /lib64/libpthread.so.0
> > > > > #20 0x00007f04dcb7dead in clone () from /lib64/libc.so.6
> > > > >
> > > > > Package versions:
> > > > >
> > > > > nfs-ganesha-2.7.3-0.1.el7.x86_64
> > > > > nfs-ganesha-ceph-2.7.3-0.1.el7.x86_64
> > > > > libcephfs2-14.2.1-0.el7.x86_64
> > > > > librados2-14.2.1-0.el7.x86_64
> > > > >
> > > > > I notice in my Ceph log I have a bunch of slow requests
> around the time
> > > > > it went down, I'm not sure if it's a symptom of Ganesha
> segfaulting or
> > > > > if it was a contributing factor.
> > > > >
> > > > > Thanks,
> > > > > David
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > Nfs-ganesha-devel mailing list
> > > > > Nfs-ganesha-devel(a)lists.sourceforge.net
> <mailto:Nfs-ganesha-devel@lists.sourceforge.net>
> > > > >
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> > > > >
> > >
> > > _______________________________________________
> > > ceph-users mailing list
> > > ceph-users(a)lists.ceph.com <mailto:ceph-users@lists.ceph.com>
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> --
> Jeff Layton <jlayton(a)poochiereds.net
> <mailto:jlayton@poochiereds.net>>
>