Hi,
I used the latest Ganesha code and ran the following test in a script from an
NFSv3 client:
ct=0
while [ $ct -lt 4096 ]; do
flock -x mylock echo 1 >> myfile
let ct=$ct+1
done
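For reference, the same loop can be sketched in C. This is a minimal sketch that assumes a Linux NFSv3 client on which flock(2) is emulated with NLM byte-range locks (the default unless the mount uses the local_lock option), so each iteration should produce one NLM LOCK and one NLM UNLOCK on the server. lock_and_append() and its parameters are hypothetical names, not part of Ganesha.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical C equivalent of the shell loop: for each iteration, take an
 * exclusive flock(2) on lockpath, append "1\n" to datapath, then release
 * the lock. On a Linux NFSv3 mount, each lock/unlock pair is assumed to
 * become an NLM LOCK/UNLOCK pair on the server.
 * Returns the number of completed iterations, or -1 on the first error.
 */
static int lock_and_append(const char *lockpath, const char *datapath,
                           int iterations)
{
    assert(iterations >= 0);
    for (int i = 0; i < iterations; i++) {
        /* O_RDWR: exclusive locks on NFS may require write access. */
        int lfd = open(lockpath, O_RDWR | O_CREAT, 0644);
        if (lfd < 0)
            return -1;
        if (flock(lfd, LOCK_EX) != 0) {
            close(lfd);
            return -1;
        }

        int dfd = open(datapath, O_WRONLY | O_APPEND | O_CREAT, 0644);
        if (dfd < 0 || write(dfd, "1\n", 2) != 2) {
            if (dfd >= 0)
                close(dfd);
            close(lfd);
            return -1;
        }
        close(dfd);

        /* Explicit unlock, then close (closing alone would also release). */
        flock(lfd, LOCK_UN);
        close(lfd);
    }
    return iterations;
}
```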
After 1000+ iterations, the client got the error "No locks available".
ganesha.log had the following trace:
ganesha.nfsd-9328[svc_103] nsm_connect :NLM :CRIT :connect to statd failed: RPC: Unknown protocol
/var/log/messages showed "Too many open files" messages. It looks like a
connection to rpc.statd is created for each NLM LOCK request but is not
closed when the corresponding NLM UNLOCK request is handled.
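The "Too many open files" symptom is what a per-request descriptor leak looks like once the process reaches its descriptor limit: open() or socket() then fails with EMFILE, which glibc reports as "Too many open files". The sketch below reproduces only that symptom inside a single process. leak_until_exhausted() is a hypothetical helper, and opening /dev/null is an assumed stand-in for the statd connection socket; the soft RLIMIT_NOFILE is lowered so the limit is hit quickly.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Illustration of the symptom only: every iteration opens a descriptor and
 * never closes it, as the statd connection is suspected to leak per NLM
 * LOCK. /dev/null stands in for the real socket (an assumption).
 * Returns the number of descriptors opened before failure (or -1 if the
 * limit could not be changed) and stores the failing errno in *err_out.
 */
static int leak_until_exhausted(rlim_t soft_limit, int *err_out)
{
    struct rlimit old, lowered;
    if (getrlimit(RLIMIT_NOFILE, &old) != 0)
        return -1;
    lowered = old;
    lowered.rlim_cur = soft_limit;

    int *fds = malloc(soft_limit * sizeof *fds);
    if (fds == NULL)
        return -1;
    if (setrlimit(RLIMIT_NOFILE, &lowered) != 0) {
        free(fds);
        return -1;
    }

    int n = 0;
    *err_out = 0;
    for (;;) {
        int fd = open("/dev/null", O_RDONLY);
        if (fd < 0) {
            *err_out = errno; /* EMFILE once the soft limit is reached */
            break;
        }
        /* New fds are always < soft_limit, so n cannot overflow fds[]. */
        assert(n < (int)soft_limit);
        fds[n++] = fd; /* deliberately not closed: the "leak" */
    }

    /* Clean up so the calling process is left as it was. */
    for (int i = 0; i < n; i++)
        close(fds[i]);
    free(fds);
    setrlimit(RLIMIT_NOFILE, &old);
    return n;
}
```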
After analyzing the code, it seems this happens because 'xprt->xp_refcnt' is
ref'ed twice for an NLM LOCK request but un-ref'ed only once while handling
the NLM UNLOCK request. As a result, svc_vc_destroy_it() doesn't get called
and the connection to rpc.statd is not closed.
More details of the code analysis are below. Can you please check this issue?
Thank you. I am not sure why svc_xprt_lookup() leaves 'xprt->xp_refcnt' at 2
(one reference from setup plus one from SVC_REF).
For an NLM LOCK request, the code path is:
--------------------------------------------------------------
nlm4_Lock() -> ...... -> nsm_connect() -> ....... -> makefd_xprt() -> svc_xprt_lookup()
137 SVCXPRT *
138 svc_xprt_lookup(int fd, svc_xprt_setup_t setup)
139 {
......
......
173 (*setup)(&xprt); /* zalloc, xp_refcnt = 1 */ --> leads to call to svc_vc_xprt_setup()
174 xprt->xp_fd = fd;
175 xprt->xp_flags = SVC_XPRT_FLAG_INITIAL;
176
177 /* Get ref for caller */
178 SVC_REF(xprt, SVC_REF_FLAG_NONE);
Here, at line 173, svc_vc_xprt_setup() is called, which sets
'xprt->xp_refcnt = 1'. Then at line 178, SVC_REF increments 'xprt->xp_refcnt'
by 1. Thus, when handling an NLM LOCK request, 'xprt->xp_refcnt' ends up as 2.
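A minimal model of the reference counting just described may make the count concrete. The names toy_xprt, toy_setup() and toy_lookup() are hypothetical and only mirror the two steps quoted above (setup starting the count at 1, then SVC_REF taking a reference for the caller); this is not the ntirpc implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for SVCXPRT: only the fields this model needs. */
struct toy_xprt {
    atomic_int refcnt;
    int fd;
};

/* Models svc_vc_xprt_setup(): zero-allocate and start with one reference. */
static void toy_setup(struct toy_xprt **xprtp)
{
    *xprtp = calloc(1, sizeof **xprtp);
    if (*xprtp != NULL)
        atomic_init(&(*xprtp)->refcnt, 1);
}

/* Models svc_xprt_lookup() creating a new entry: setup, then SVC_REF. */
static struct toy_xprt *toy_lookup(int fd)
{
    struct toy_xprt *xprt;

    toy_setup(&xprt);                   /* refcnt = 1 (setup's reference) */
    if (xprt == NULL)
        return NULL;
    xprt->fd = fd;
    atomic_fetch_add(&xprt->refcnt, 1); /* "Get ref for caller": refcnt = 2 */

    /* Same state as in the analysis: two references after lookup. */
    assert(atomic_load(&xprt->refcnt) == 2);
    return xprt;
}
```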
For an NLM UNLOCK request, the code path is:
-------------------------------------------------------------------
nlm4_Unlock() -> ...... -> nsm_disconnect() -> ..... -> clnt_vc_destroy() -> svc_release_it()
410 static inline void svc_release_it(SVCXPRT *xprt, u_int flags,
411 const char *tag, const int line)
412 {
413 int32_t refs = atomic_dec_int32_t(&xprt->xp_refcnt);
......
......
425 if (likely(refs > 0)) {
426 /* normal case */
427 return;
428 }
429
430 /* enforce once-only semantic, trace others */
431 xp_flags = atomic_postset_uint16_t_bits(&xprt->xp_flags,
432                                         SVC_XPRT_FLAG_RELEASING);
......
439 /* Releasing last reference */
440 (*(xprt)->xp_ops->xp_destroy)(xprt, flags, tag, line);
Here, at line 413, 'xprt->xp_refcnt' is decremented and becomes 1. Because
refs is still greater than 0, the function returns at line 427 and never
reaches the xp_destroy call at line 440, so the connection is not closed.
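The release side can be modeled the same way. In this sketch (hypothetical names again, not ntirpc code), toy_release() mirrors the svc_release_it() excerpt: it decrements, returns while references remain, and only runs the destroy step when the count reaches zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for SVCXPRT: only the fields this model needs. */
struct toy_xprt {
    atomic_int refcnt;
    bool closed; /* set by the "destroy" step, i.e. connection closed */
};

/* Stands in for xp_destroy closing the connection. */
static void toy_destroy(struct toy_xprt *xprt)
{
    xprt->closed = true;
}

static void toy_release(struct toy_xprt *xprt)
{
    /* atomic_fetch_sub returns the old value, so subtract 1 for "refs". */
    int refs = atomic_fetch_sub(&xprt->refcnt, 1) - 1;

    assert(refs >= 0); /* model-only check: more releases than refs */
    if (refs > 0)
        return;        /* normal case: someone still holds a reference */
    toy_destroy(xprt); /* releasing last reference */
}
```

Starting from refcnt = 2 (the state after the LOCK path), a single toy_release() leaves the count at 1 with closed still false, which matches the leak described above. Only a second release reaches the destroy step; which code path is meant to drop that extra reference is the open question in this report.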
Thanks,
Madhu Thorat
IBM-India Software Labs, Pune.