It's been a while, but I'm finally ready to contribute my monitoring changes to the main branch. Below are some screenshots to give you an idea of what the main Ganesha dashboard looks like. It's straightforward to add metrics for other FSALs. Initially, I tried to write all the code in C, but that didn't work well: the Digital Ocean Prometheus C client had a serious performance issue, and the higher the Ganesha load, the more it degraded overall performance. So I switched to the recommended C++ client instead, which has worked much better in my high-performance tests. I've also written a wrapper library around that C++ client, so that a single function call from Ganesha automatically generates:

- Request rates.
- Network throughput rates.
- Latency percentiles.
- Request size percentiles.
- Response size percentiles.
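All five of those series can come from just two kinds of metric. Monotonically increasing counters give the request and throughput rates, which Prometheus derives at query time (for example with `rate()`). Histograms give the latency and size percentiles. The following is a minimal, self-contained sketch of that idea under stated assumptions. It is not the wrapper's actual code: `RequestMetrics`, `record()`, `Histogram` and the bucket bounds are hypothetical names. The percentile estimate uses linear interpolation inside the bucket that holds the target rank, which is the same approach PromQL's `histogram_quantile()` applies to exported buckets.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical fixed-bucket histogram in the style of Prometheus histograms:
// bucket i counts observations <= bounds[i]; one extra bucket acts as +Inf.
class Histogram {
 public:
  explicit Histogram(std::vector<double> bounds)
      : bounds_(std::move(bounds)), counts_(bounds_.size() + 1, 0) {
    assert(!bounds_.empty() && std::is_sorted(bounds_.begin(), bounds_.end()));
  }

  void observe(double value) {
    // First bucket whose upper bound is >= value; past the end means +Inf.
    auto it = std::lower_bound(bounds_.begin(), bounds_.end(), value);
    std::lock_guard<std::mutex> lock(mu_);
    ++counts_[static_cast<std::size_t>(it - bounds_.begin())];
  }

  // Estimate the q-quantile (0 <= q <= 1); returns 0 when nothing was observed.
  double quantile(double q) const {
    std::lock_guard<std::mutex> lock(mu_);
    std::uint64_t total = 0;
    for (std::uint64_t c : counts_) total += c;
    if (total == 0) return 0.0;
    const double rank = q * static_cast<double>(total);
    std::uint64_t cumulative = 0;
    for (std::size_t i = 0; i < counts_.size(); ++i) {
      const std::uint64_t before = cumulative;
      cumulative += counts_[i];
      if (counts_[i] == 0 || static_cast<double>(cumulative) < rank) continue;
      if (i == bounds_.size()) return bounds_.back();  // rank fell in +Inf bucket
      const double lower = (i == 0) ? 0.0 : bounds_[i - 1];
      const double upper = bounds_[i];
      // Assume observations are spread evenly between the bucket's bounds.
      return lower + (upper - lower) * (rank - static_cast<double>(before)) /
                         static_cast<double>(counts_[i]);
    }
    return bounds_.back();
  }

 private:
  std::vector<double> bounds_;
  std::vector<std::uint64_t> counts_;
  mutable std::mutex mu_;
};

// Hypothetical per-operation metrics: one record() call per completed request
// updates every series. Rates are not stored here; Prometheus computes them
// from the monotonically increasing counters.
struct RequestMetrics {
  std::atomic<std::uint64_t> requests{0};
  std::atomic<std::uint64_t> bytes_received{0};
  std::atomic<std::uint64_t> bytes_sent{0};
  Histogram latency_seconds;
  Histogram request_bytes;
  Histogram response_bytes;

  RequestMetrics()
      : latency_seconds(std::vector<double>{0.0001, 0.001, 0.01, 0.1, 1.0}),
        request_bytes(std::vector<double>{512, 4096, 65536, 1048576}),
        response_bytes(std::vector<double>{512, 4096, 65536, 1048576}) {}

  void record(double latency_s, std::uint64_t req_bytes, std::uint64_t resp_bytes) {
    requests.fetch_add(1, std::memory_order_relaxed);
    bytes_received.fetch_add(req_bytes, std::memory_order_relaxed);
    bytes_sent.fetch_add(resp_bytes, std::memory_order_relaxed);
    latency_seconds.observe(latency_s);
    request_bytes.observe(static_cast<double>(req_bytes));
    response_bytes.observe(static_cast<double>(resp_bytes));
  }
};
```

With this sketch, a call such as `metrics.record(0.0021, 4096, 128);` per completed operation keeps all five series current. A real exporter would also serve the counters and cumulative bucket counts on an HTTP endpoint for Prometheus to scrape. That part is left out here.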

What I'd like to do next is to:

- Release the C++ wrapper as a standalone piece of software under the Apache 2 license, so that it can be integrated into other applications.
- Add these modifications to the main branch:
  - A header file in src/include.
  - C and C++ files in the new directory src/monitoring.
  - A few function calls in C files in the src/MainNFSD directory.
  - Monitoring configuration files in src/config_samples.
  - Changes to the CMakeLists.txt files, leaving the new monitoring disabled by default.
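Ganesha's core is C and the wrapper is C++, so a shared header like the one planned for src/include would need to compile under both languages. A common way to do that is to wrap the declarations in `extern "C"` behind an `#ifdef __cplusplus` guard, which gives the C++ definitions C linkage. The sketch below puts both halves in one file for brevity. `monitoring_record_request()`, `monitoring_request_count()` and the map-based storage are hypothetical stand-ins and not the actual wrapper's API.

```cpp
#include <stdint.h>

// ---- Hypothetical C-compatible header (e.g. in src/include) ----
#ifdef __cplusplus
extern "C" {
#endif

/* One call per completed request; all names here are illustrative. */
void monitoring_record_request(const char *operation, double latency_seconds,
                               uint64_t request_bytes, uint64_t response_bytes);
uint64_t monitoring_request_count(const char *operation);

#ifdef __cplusplus
}
#endif

// ---- Hypothetical C++ implementation (e.g. in src/monitoring) ----
#include <map>
#include <mutex>
#include <string>

namespace {
std::mutex g_mutex;
// Stand-in for the real metric objects: a real wrapper would update
// Prometheus counters and histograms here instead of a plain map.
std::map<std::string, uint64_t> g_request_counts;
}  // namespace

extern "C" void monitoring_record_request(const char *operation,
                                          double latency_seconds,
                                          uint64_t request_bytes,
                                          uint64_t response_bytes) {
  std::lock_guard<std::mutex> lock(g_mutex);
  ++g_request_counts[operation];
  // Latency and sizes would feed histograms; they are unused in this sketch.
  (void)latency_seconds;
  (void)request_bytes;
  (void)response_bytes;
}

extern "C" uint64_t monitoring_request_count(const char *operation) {
  std::lock_guard<std::mutex> lock(g_mutex);
  auto it = g_request_counts.find(operation);
  return it == g_request_counts.end() ? 0 : it->second;
}
```

Under these assumptions, a C file in src/MainNFSD would only include the header and call something like `monitoring_record_request("READ", latency, req_len, resp_len);`. All C++ types stay hidden behind the C interface.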

Does that sound like a good plan? Any comments or suggestions?

Thanks,

Bjorn

On Thu, Jul 16, 2020 at 2:57 AM Malahal Naineni <malahal@gmail.com> wrote:

Including rpcinfo checks for various services would be good to have as well.
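Such a check could be sketched as follows, assuming the text printed by `rpcinfo -p <host>` has already been captured. That output has the columns program, version, protocol, port and service name, and running the command itself is left out. The function names are hypothetical. The check matches on RPC program numbers (100003 for NFS, 100005 for MOUNT, 100021 for NLM) rather than on the service-name column, because that column depends on the local RPC name database.

```cpp
#include <cstdint>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collect the RPC program numbers that appear in the
// text printed by `rpcinfo -p <host>`.
std::set<std::uint32_t> registered_programs(const std::string& rpcinfo_output) {
  std::set<std::uint32_t> programs;
  std::istringstream lines(rpcinfo_output);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    unsigned long program = 0;
    // The header line ("program vers ...") and blank lines fail numeric
    // extraction and are skipped.
    if (fields >> program) programs.insert(static_cast<std::uint32_t>(program));
  }
  return programs;
}

// Hypothetical check: return the required program numbers that are not
// registered; an empty result means every required service is present.
std::vector<std::uint32_t> missing_programs(
    const std::string& rpcinfo_output,
    const std::vector<std::uint32_t>& required) {
  const std::set<std::uint32_t> present = registered_programs(rpcinfo_output);
  std::vector<std::uint32_t> missing;
  for (std::uint32_t program : required) {
    if (present.count(program) == 0) missing.push_back(program);
  }
  return missing;
}
```

A monitoring job could run this periodically and export the number of missing programs as a gauge, so an alert fires when, for example, the NLM service disappears.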

On Tue, Jul 14, 2020 at 5:27 PM Daniel Gryniewicz <dang@redhat.com> wrote:
This seems like a fine idea to me. All the counters I'm aware of are
available via DBUS.

Daniel

On 7/14/20 2:12 AM, Bjorn Leffler via Devel wrote:
> Apart from the counters that you can access through dbus, is there any
> other monitoring built into Ganesha?
>
> I'm thinking of adding it with this higher level plan:
> - Exporting metrics from Ganesha to Prometheus.
> - Aggregate data in Prometheus.
> - Display monitoring consoles and graphs with Grafana.
> - Package up Prometheus, Grafana and the preconfigured rules/dashboards
> as a Docker image.
> - This makes it straightforward to deploy monitoring alongside a Ganesha
> binary.
>
> My rough coding plan is to:
> - Add a USE_MONITORING directive to the CMakeLists.txt files.
> - Add a build dependency to the Prometheus C client
> <https://github.com/digitalocean/prometheus-client-c>.
> - Create a src/monitoring directory for the new source files and templates.
> - Increment counters and timers throughout the code.
> - Use histograms to compute latency percentiles, heatmaps, etc.
>
> Is this a good idea? Any objections or suggestions?
>
> Thanks,
> Bjorn
>
>
> _______________________________________________
> Devel mailing list -- devel@lists.nfs-ganesha.org
> To unsubscribe send an email to devel-leave@lists.nfs-ganesha.org
>