I also run Ganesha active/active with CTDB. My versions are quite outdated now and I haven't done much tuning, but I'll share some thoughts.
I'm assuming you're using the Ceph FSAL with Ganesha rather than exporting a CephFS kernel mount.
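(For anyone following along, a minimal FSAL_CEPH export in ganesha.conf looks roughly like this — the export ID, path and pseudo path are just placeholders:)

    EXPORT {
        Export_ID = 100;
        Path = /;
        Pseudo = /cephfs;
        Access_Type = RW;
        Protocols = 4;
        Transports = TCP;
        Squash = None;
        # let libcephfs handle attribute caching rather than Ganesha
        Attr_Expiration_Time = 0;
        FSAL {
            Name = CEPH;
        }
    }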
Increasing the Ceph client cache size made the biggest difference for me. That's the client_oc_size parameter, which you set in the [client] section of ceph.conf on the Ganesha servers. I also have all of the Ganesha-side caching turned off in my ganesha.conf (I don't have access right now to post my exact parameters).
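From memory it's roughly the following — the 512 MiB value is just an example, size it to the RAM you can spare (the default is 200 MiB):

    # ceph.conf on the Ganesha servers
    [client]
    # libcephfs object cache size, in bytes
    client_oc_size = 536870912

And on the Ganesha side, the upstream CephFS sample config keeps Ganesha's own caching minimal like this, on top of the Attr_Expiration_Time = 0 in the export above (I can't check my exact values, so treat it as a starting point):

    # ganesha.conf
    MDCACHE {
        # keep Ganesha's dirent cache tiny and let the
        # Ceph client cache do the work instead
        Dir_Chunk = 0;
    }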
If you have any reasonably fast local storage on one of the Ganesha servers, I would export a directory from it using the VFS FSAL, mount that on your user's terminal, and test their workload there. If there's no improvement, that would suggest the bottleneck is the NFS implementation rather than Ceph.
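A throwaway VFS export for that test could look something like this (the path and export ID are obviously placeholders):

    EXPORT {
        Export_ID = 200;
        Path = /srv/fast-local-disk;
        Pseudo = /testvfs;
        Access_Type = RW;
        Protocols = 4;
        Squash = None;
        FSAL {
            Name = VFS;
        }
    }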
Or try to reproduce the workload on a CephFS kernel mount; if performance is still poor there, that would suggest the issue is with your cluster rather than with Ganesha.
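Something along these lines, with your own mon address and credentials swapped in:

    mount -t ceph mon1.example.com:6789:/ /mnt/cephfs \
        -o name=admin,secretfile=/etc/ceph/admin.secret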
In terms of benchmarking, I typically use good old fio.
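e.g. something like this against the user's mount — adjust rw, bs and iodepth to resemble their actual workload (/mnt/nfs is a placeholder):

    fio --name=nfstest --directory=/mnt/nfs \
        --rw=randread --bs=4k --size=1G \
        --numjobs=4 --iodepth=16 --ioengine=libaio \
        --direct=1 --runtime=60 --time_based --group_reporting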
Let me know how you get on