Hi,
I used the latest ganesha code and ran the following test script from an NFSv3 client:
ct=0
while [ $ct -lt 4096 ]; do
flock -x mylock echo 1 >> myfile
let ct=$ct+1
done
After more than 1000 iterations the client got the error "No locks available".
ganesha.log had the following trace:
ganesha.nfsd-9328[svc_103] nsm_connect :NLM :CRIT :connect to statd failed: RPC: Unknown protocol
/var/log/messages showed a "Too many open files" message. It looks like a connection to rpc.statd is created for each NLM LOCK request but is not closed on the corresponding NLM UNLOCK request.
After analyzing the code, it seems this happens because, for an NLM LOCK request, 'xprt->xp_refcnt' is ref'ed twice, but while handling the NLM UNLOCK request it is un-ref'ed only once. As a result svc_vc_destroy_it() is never called and the connection to rpc.statd is not closed.
More details of the code analysis are below. Could you please look into this issue? In particular, I am not sure why 'xprt->xp_refcnt' is incremented twice in svc_xprt_lookup(). Thank you.
For NLM LOCK request the code path is:
--------------------------------------------------------------
nlm4_Lock() -> ...... -> nsm_connect() -> ....... -> makefd_xprt() -> svc_xprt_lookup()
137 SVCXPRT *
138 svc_xprt_lookup(int fd, svc_xprt_setup_t setup)
139 {
......
......
173 (*setup)(&xprt); /* zalloc, xp_refcnt = 1 */ --> leads to call to svc_vc_xprt_setup()
174 xprt->xp_fd = fd;
175 xprt->xp_flags = SVC_XPRT_FLAG_INITIAL;
176
177 /* Get ref for caller */
178 SVC_REF(xprt, SVC_REF_FLAG_NONE);
Here, at line 173, svc_vc_xprt_setup() is called, which sets 'xprt->xp_refcnt = 1'.
Then, at line 178, SVC_REF increments 'xprt->xp_refcnt' by 1. Thus, after handling the NLM LOCK request, 'xprt->xp_refcnt' is 2.
For NLM UNLOCK request the code path is:
-------------------------------------------------------------------
nlm4_Unlock() -> ...... -> nsm_disconnect() -> ..... -> clnt_vc_destroy() -> svc_release_it()
410 static inline void svc_release_it(SVCXPRT *xprt, u_int flags,
411 const char *tag, const int line)
412 {
413 int32_t refs = atomic_dec_int32_t(&xprt->xp_refcnt);
......
......
425 if (likely(refs > 0)) {
426 /* normal case */
427 return;
428 }
429
430 /* enforce once-only semantic, trace others */
431 xp_flags = atomic_postset_uint16_t_bits(&xprt->xp_flags,
432 SVC_XPRT_FLAG_RELEASING);
......
439 /* Releasing last reference */
440 (*(xprt)->xp_ops->xp_destroy)(xprt, flags, tag, line);
Here, at line 413, 'xprt->xp_refcnt' is decremented from 2 to 1.
Since the result is still greater than 0, the function returns at line 427 and never reaches the xp_destroy call at line 440, so the connection is not closed.
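To make the imbalance concrete, here is a minimal stand-alone C sketch of the counting described above. It is not the ntirpc code: 'toy_xprt', 'toy_lookup', 'toy_release' and 'toy_leaked_after' are hypothetical names, the counter is a plain (non-atomic) integer, and the "destroy" step is just a flag. It only models the arithmetic of two refs taken per LOCK and one ref dropped per UNLOCK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for SVCXPRT; only the fields needed for the model. */
struct toy_xprt {
    int32_t refcnt;   /* models xprt->xp_refcnt */
    bool destroyed;   /* set when the modelled destroy path runs */
};

/* Models svc_xprt_lookup(): setup leaves refcnt at 1, then one more
 * reference is taken for the caller, so refcnt ends up at 2. */
static void toy_lookup(struct toy_xprt *x)
{
    x->refcnt = 1;        /* models (*setup)(&xprt) at line 173 */
    x->destroyed = false;
    x->refcnt++;          /* models SVC_REF at line 178 */
}

/* Models svc_release_it(): drop one reference and run the destroy path
 * only when the count is no longer positive. Returns the remaining count. */
static int32_t toy_release(struct toy_xprt *x)
{
    int32_t refs = --x->refcnt;   /* models the decrement at line 413 */

    if (refs > 0)
        return refs;              /* early return, as at line 427 */

    x->destroyed = true;          /* models xp_destroy at line 440 */
    return refs;
}

/* Runs 'cycles' LOCK/UNLOCK pairs with 'releases_per_cycle' releases per
 * UNLOCK and returns how many transports were never destroyed. */
static int toy_leaked_after(int cycles, int releases_per_cycle)
{
    int leaked = 0;

    for (int i = 0; i < cycles; i++) {
        struct toy_xprt x;

        toy_lookup(&x);
        for (int r = 0; r < releases_per_cycle; r++)
            toy_release(&x);
        if (!x.destroyed)
            leaked++;
    }
    return leaked;
}
```

In this model, toy_leaked_after(4096, 1) leaves every transport undestroyed, which matches one leaked descriptor per loop iteration in the test script, while toy_leaked_after(4096, 2) leaves none. Whether the real fix belongs on the ref side or the release side is exactly the open question above; the sketch does not answer that.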
Thanks,
Madhu Thorat
IBM-India Software Labs, Pune.