Daniel, thank you for the quick response. I will fix this and verify in our environment.
Will post a review request soon.
Do we use pull requests for ntirpc fixes?
Thanks,
Sriram
On 1/30/20, 11:08 AM, "Daniel Gryniewicz" <dang(a)redhat.com> wrote:
Definitely not a known issue, and it certainly sounds plausible.
Daniel
On 1/30/20 1:45 PM, Sriram Patil wrote:
Hi,
While setting up some Kerberos shares with ganesha I came across an issue where the
client receives RPCSEC_GSS_CREDPROBLEM. The source of this error is is form following code
in function _svcauth_gss which is used when authenticating RPCSEC_GSS request,
442 /* Context lookup. */
443 if ((gc->gc_proc == RPCSEC_GSS_DATA)
444 || (gc->gc_proc == RPCSEC_GSS_DESTROY)) {
445
446 /* Per RFC 2203 5.3.3.3, if a valid security context
447 * cannot be found to authorize a request, the
448 * implementation returns RPCSEC_GSS_CREDPROBLEM.
449 * N.B., we are explicitly allowed to discard contexts
450 * for any reason (e.g., to save space). */
451 gd = authgss_ctx_hash_get(gc);
452 if (!gd) {
453 rc = RPCSEC_GSS_CREDPROBLEM;
454 goto cred_free;
455 }
456 if (gc->gc_svc != gd->sec.svc)
457 gd->sec.svc = gc->gc_svc;
458 }
When the GSS proc is GSS_DATA or GSS_DESTROY, ntirpc expects the context to be in the
cache. If not it returns RPCSEC_GSS_CREDPROBLEM.
For the above condition to be true, this context should have been inserted in the cache
as part of GSS_INIT request. In the same function, the svc_rpc_gss_data object is inserted
in the cache but there is a difference in the dispatch code. When the request is
RPCSEC_GSS_INIT, ntirpc sends the reply back to the client before caching the context.
533 *no_dispatch = true;
534
535 req->rq_msg.RPCM_ack.ar_results.where = &gr;
536 req->rq_msg.RPCM_ack.ar_results.proc =
537 (xdrproc_t) xdr_rpc_gss_init_res;
538 call_stat = svc_sendreply(req);
<-- Sending reply to client here
539
540 /* XXX */
541 gss_release_buffer(&min_stat, &gr.gr_token);
542 gss_release_buffer(&min_stat, &gd->checksum);
543 mem_free(gr.gr_ctx.value, 0);
544
545 if (call_stat >= XPRT_DIED) {
546 rc = AUTH_FAILED;
547 goto gd_free;
548 }
549
550 if (gr.gr_major == GSS_S_COMPLETE) {
……
……
583 (void)authgss_ctx_hash_set(gd);
<-- Inserting the svc_rpc_gss_data object into the cache
584 }
In my setup, I observed that, the client can send another request, let's say
EXCHANGE_ID with RPCSEC_GSS_DATA and end up returning the CREDPROBLEM error if that work
thread gets scheduled first and tries to authenticate the request before the context is
inserted in the cache. I believe we should be sending the reply after inserting the
context in the cache.
Please find some logs below which confirm the race. Svc_10 is the RPCSEC_GSS_INIT request
and svc_4 is RPCSEC_GSS_DATA request. We can observe that rbtree_x_cached_lookup happens
before rbtree_x_cached_insert for hash key f8003f00,
2020-01-30T16:52:42Z : ganesha.nfsd-2248[none] [svc_10] 0 :rpc :TIRPC
:xdr_reply_encode:109 SUCCESS
2020-01-30T16:52:42Z : ganesha.nfsd-2248[none] [svc_10] 0 :rpc :TIRPC
:xdr_rpc_gss_encode() success (0x7f89fc002360:16)
2020-01-30T16:52:42Z : ganesha.nfsd-2248[none] [svc_10] 0 :rpc :TIRPC
:xdr_rpc_gss_encode() success (0x7f89fc008bb0:155)
2020-01-30T16:52:42Z : ganesha.nfsd-2248[none] [svc_10] 0 :rpc :TIRPC
:xdr_rpc_gss_init_res() encode success (ctx 0x7f89fc002360:16, maj 0, min 0, win 32, token
0x7f89fc008bb0:155)
2020-01-30T16:52:42Z : ganesha.nfsd-2248[none] [svc_10] 0 :rpc :TIRPC :svc_ref_it()
0x7f8a04000a90 fd 32 xp_refcnt 5 af 10 port 35536 @svc_ioq_write_now:265
2020-01-30T16:52:42Z : ganesha.nfsd-2248[none] [svc_10] 0 :rpc :TIRPC :svc_release_it()
0x7f8a04000a90 fd 32 xp_refcnt 4 af 10 port 35536 @svc_ioq_write:233
2020-01-30T16:52:42Z : ganesha.nfsd-2248[none] [svc_10] 0 :rpc :TIRPC :xdr_ioq_destroy()
xioq 0x7f89fc002800
....
....
2020-01-30T16:52:42Z : ganesha.nfsd-2248[none] [svc_4] 0 :rpc :TIRPC :xdr_rpc_gss_cred()
decode success (v 1, proc 0, seq 1, svc 2, ctx 0x7f8a0c001780:16)
2020-01-30T16:52:42Z : ganesha.nfsd-2248[none] [svc_4] 0 :rpc :TIRPC
:rbtree_x_cached_lookup: t 0x7f8a0c0021c0 nk 0x7f8a20153e80 nv (nil)( hk f8003f00
slot/offset 56)
2020-01-30T16:52:42Z : ganesha.nfsd-2248[none] [svc_4] 0 :rpc :TIRPC :xdr_rpc_gss_cred()
decode success (v 1, proc 0, seq 1, svc 2, ctx 0x7f8a0c001780:16)
2020-01-30T16:52:42Z : ganesha.nfsd-2248[none] [svc_4] 728 :nfs_rpc_process_request :DISP
:Could not authenticate request... rejecting with AUTH_STAT=RPCSEC_GSS_CREDPROBLEM
....
....
2020-01-30T16:52:42Z : ganesha.nfsd-2248[none] [svc_10] 0 :rpc :TIRPC
:rbtree_x_cached_insert: cix 9 ct 0x7f8a0c0021c0 t 0x7f8a0c0021c0 inserting 0x7f89fc001890
(cache hk f8003f00 slot/offset 56) flags 3
2020-01-30T16:52:42Z : ganesha.nfsd-2248[none] [svc_10] 0 :rpc :TIRPC :xdr_rpc_gss_cred()
decode success (v 1, proc 1, seq 0, svc 1, ctx (nil):0)
2020-01-30T16:52:42Z : ganesha.nfsd-2248[none] [svc_10] 740 :nfs_rpc_process_request
:DISP :RPCSEC_GSS no_dispatch=1 gc->gc_proc=(1) RPCSEC_GSS_INIT
Is this a known issue or being worked on already?
Thanks,
Sriram