FSAL_RGW nfs3 write failed
by j j
Hi everyone,
I was using FSAL_RGW to export nfsv3, and trying to copy files from
ceph-fuse mountpoint to the nfs mountpoint, file size is 1.5mb, but some
files write failed with "input/output error" returning.
After simply debug, I found that the error is because the write is
unordered, the file size is 1.5mb, so it need two write rpc, sometimes the
second rpc will processed first, then the librgw will return EIO fail. But
here's the strange thing, when I copy files from local fs dir, the problem
never appearance.
And the receiving order is always right, but processing order is not.
But why this never appearance when copy from local fs ? And Is there any
suggestion to fix? Really appreciate your help.
Thanks
ganesha version: i tried two version:2.6.2 and 2.8.1
ceph version: 12.2.5 luminous
And here is log of a failed writing:
17/12/2019 14:23:42 : epoch 5df873e3 : 10-10-112-54 :
ganesha.nfsd-41666[svc_8] nfs_rpc_decode_request :DISP :DEBUG
:0x7f66cc001090 fd 43 context 0x7f66cc00e390
17/12/2019 14:23:42 : epoch 5df873e3 : 10-10-112-54 :
ganesha.nfsd-41666[svc_13] nfs_rpc_decode_request :DISP :DEBUG
:0x7f66cc001090 fd 43 context 0x7f66e81864f0
17/12/2019 14:23:42 : epoch 5df873e3 : 10-10-112-54 :
ganesha.nfsd-41666[svc_13] get_gsh_client :HT CACHE :DEBUG :client_mgr
cache hit slot 0
17/12/2019 14:23:42 : epoch 5df873e3 : 10-10-112-54 :
ganesha.nfsd-41666[svc_13] nfs_rpc_process_request :DISP :DEBUG :Request
from ::ffff:10.10.112.54 for Program 100003, Version 3, Function 7 has
xid=218961045
17/12/2019 14:23:42 : epoch 5df873e3 : 10-10-112-54 :
ganesha.nfsd-41666[svc_13] get_gsh_export :HT CACHE :DEBUG :export_mgr
cache hit slot 1
17/12/2019 14:23:42 : epoch 5df873e3 : 10-10-112-54 :
ganesha.nfsd-41666[svc_13] nfs3_write :NFS3 :DEBUG :REQUEST PROCESSING:
Calling nfs_Write handle: File Handle V3: Len=24
4300000110bef9bc69eeb74054f5b17dac1ddb65af000000 start: 100000 len: 80000
UNSTABLE
17/12/2019 14:23:42 : epoch 5df873e3 : 10-10-112-54 :
ganesha.nfsd-41666[svc_13] cih_get_by_key_latch :HT CACHE :DEBUG :cih cache
hit slot 9517
17/12/2019 14:23:42 : epoch 5df873e3 : 10-10-112-54 :
ganesha.nfsd-41666[svc_13] rgw_fsal_write2 :FSAL :DEBUG : rgw_fsal_write2
post obj_hdl 0x7f66d80317d0 state (nil) returned -5
17/12/2019 14:23:42 : epoch 5df873e3 : 10-10-112-54 :
ganesha.nfsd-41666[svc_13] nfs3_Errno_verbose :NFS3 :CRIT :Error I/O error
in nfs3_write converted to NFS3ERR_IO but was set non-retryable
17/12/2019 14:23:42 : epoch 5df873e3 : 10-10-112-54 :
ganesha.nfsd-41666[svc_8] get_gsh_client :HT CACHE :DEBUG :client_mgr cache
hit slot 0
17/12/2019 14:23:42 : epoch 5df873e3 : 10-10-112-54 :
ganesha.nfsd-41666[svc_8] nfs_rpc_process_request :DISP :DEBUG :Request
from ::ffff:10.10.112.54 for Program 100003, Version 3, Function 7 has
xid=202183829
17/12/2019 14:23:42 : epoch 5df873e3 : 10-10-112-54 :
ganesha.nfsd-41666[svc_8] get_gsh_export :HT CACHE :DEBUG :export_mgr cache
hit slot 1
17/12/2019 14:23:42 : epoch 5df873e3 : 10-10-112-54 :
ganesha.nfsd-41666[svc_8] nfs3_write :NFS3 :DEBUG :REQUEST PROCESSING:
Calling nfs_Write handle: File Handle V3: Len=24
4300000110bef9bc69eeb74054f5b17dac1ddb65af000000 start: 0 len: 100000
UNSTABLE
17/12/2019 14:23:42 : epoch 5df873e3 : 10-10-112-54 :
ganesha.nfsd-41666[svc_8] cih_get_by_key_latch :HT CACHE :DEBUG :cih cache
hit slot 9517
17/12/2019 14:23:42 : epoch 5df873e3 : 10-10-112-54 :
ganesha.nfsd-41666[svc_13] dupreq_tcp_cmpf :DUPREQ :DEBUG :dupreq_tcp_cmpf
17/12/2019 14:23:42 : epoch 5df873e3 : 10-10-112-54 :
ganesha.nfsd-41666[svc_13] dupreq_tcp_cmpf :DUPREQ :DEBUG :xids eq
2920027285, ck1 12828036216958414204 ck2 12828036216958414204
17/12/2019 14:23:42 : epoch 5df873e3 : 10-10-112-54 :
ganesha.nfsd-41666[svc_13] nfs_dupreq_finish :DUPREQ :DEBUG :retiring
ov=0x7f66fc028360 xid=2920027285 on DRC=0x7f66dc003bb0
state=DUPREQ_COMPLETE, status=DUPREQ_SUCCESS, refcnt=1
17/12/2019 14:23:42 : epoch 5df873e3 : 10-10-112-54 :
ganesha.nfsd-41666[svc_13] nfs_dupreq_free_dupreq :DUPREQ :DEBUG :freeing
dupreq entry dv=0x7f66fc028360, dv xid=2920027285 cksum
12828036216958414204 state=DUPREQ_COMPLETE
17/12/2019 14:23:42 : epoch 5df873e3 : 10-10-112-54 :
ganesha.nfsd-41666[svc_13] dupreq_tcp_cmpf :DUPREQ :DEBUG :dupreq_tcp_cmpf
17/12/2019 14:23:42 : epoch 5df873e3 : 10-10-112-54 :
ganesha.nfsd-41666[svc_13] dupreq_tcp_cmpf :DUPREQ :DEBUG :xids eq
2936804501, ck1 8812608633411635075 ck2 8812608633411635075
17/12/2019 14:23:42 : epoch 5df873e3 : 10-10-112-54 :
ganesha.nfsd-41666[svc_13] nfs_dupreq_finish :DUPREQ :DEBUG :retiring
ov=0x7f66e400fd90 xid=2936804501 on DRC=0x7f66dc003bb0
state=DUPREQ_COMPLETE, status=DUPREQ_SUCCESS, refcnt=1
17/12/2019 14:23:42 : epoch 5df873e3 : 10-10-112-54 :
ganesha.nfsd-41666[svc_13] nfs_dupreq_free_dupreq :DUPREQ :DEBUG :freeing
dupreq entry dv=0x7f66e400fd90, dv xid=2936804501 cksum 8812608633411635075
state=DUPREQ_COMPLETE
17/12/2019 14:23:42 : epoch 5df873e3 : 10-10-112-54 :
ganesha.nfsd-41666[svc_13] nfs_rpc_decode_request :DISP :DEBUG :SVC_DECODE
on 0x7f66cc001090 fd 43 (::ffff:10.10.112.54:706) xid=218961045 returned
XPRT_IDLE
17/12/2019 14:23:42 : epoch 5df873e3 : 10-10-112-54 :
ganesha.nfsd-41666[svc_13] free_nfs_request :DISP :DEBUG :free_nfs_request:
0x7f66cc001090 fd 43 xp_refs 5 rq_refs 0
17/12/2019 14:23:42 : epoch 5df873e3 : 10-10-112-54 :
ganesha.nfsd-41666[svc_8] rgw_fsal_write2 :FSAL :DEBUG : rgw_fsal_write2
post obj_hdl 0x7f66d80317d0 state (nil) returned 0
17/12/2019 14:23:42 : epoch 5df873e3 : 10-10-112-54 :
ganesha.nfsd-41666[svc_8] nfs_rpc_decode_request :DISP :DEBUG :SVC_DECODE
on 0x7f66cc001090 fd 43 (::ffff:10.10.112.54:706) xid=202183829 returned
XPRT_IDLE
17/12/2019 14:23:42 : epoch 5df873e3 : 10-10-112-54 :
ganesha.nfsd-41666[svc_8] free_nfs_request :DISP :DEBUG :free_nfs_request:
0x7f66cc001090 fd 43 xp_refs 3 rq_refs 0
4 years, 10 months