Bjorn Leffler has uploaded this change for review.


Add Prometheus and Grafana monitoring stack to NFS Ganesha.

Measure, export, aggregate and display monitoring metrics for:
1. NFS requests, aggregated per {operation} and per {operation, export}:
   - Request rates.
   - Sent / received throughput rates in bytes/second.
   - Latency percentiles in ms, computed from a histogram.
   - Request and response sizes in bytes, as histograms.
2. Number of NFS worker threads in use.
3. Number of active NFS clients.
4. Metadata cache hit and miss rates, per {operation} and per {operation, export}.
5. Rates of received and completed RPCs.
6. Number of RPCs in flight.
7. Request and throughput rates per NFS client.
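
Latency and size histograms are usually stored as per-bucket counts, and
percentiles are estimated from those counts at query time. The C++ sketch below
illustrates that idea under stated assumptions: the bucket bounds and the class
name LatencyHistogram are hypothetical, and the interpolation only mirrors the
general approach of Prometheus's histogram_quantile(). It is not the code in
monitoring.cc.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bucket upper bounds in milliseconds (illustration only).
constexpr std::array<double, 6> kBucketBoundsMs = {1.0, 5.0, 10.0,
                                                   50.0, 100.0, 500.0};

class LatencyHistogram {
 public:
  // Records one observation in the first bucket whose upper bound is
  // >= latency_ms, or in the overflow (+Inf) bucket.
  void Observe(double latency_ms) {
    std::size_t i = 0;
    while (i < kBucketBoundsMs.size() && latency_ms > kBucketBoundsMs[i]) ++i;
    counts_[i].fetch_add(1, std::memory_order_relaxed);
  }

  // Number of observations <= kBucketBoundsMs[i] (Prometheus "le" semantics).
  // Passing kBucketBoundsMs.size() includes the overflow bucket.
  std::uint64_t CumulativeCount(std::size_t i) const {
    assert(i <= kBucketBoundsMs.size());
    std::uint64_t total = 0;
    for (std::size_t j = 0; j <= i; ++j)
      total += counts_[j].load(std::memory_order_relaxed);
    return total;
  }

  std::uint64_t Total() const { return CumulativeCount(kBucketBoundsMs.size()); }

  // Estimates quantile q in [0, 1] by linear interpolation inside the bucket
  // that holds the target rank. If the rank falls in the overflow bucket,
  // returns the highest finite bound. Reads are not an atomic snapshot while
  // other threads keep observing, which is acceptable for an estimate.
  double Quantile(double q) const {
    const std::uint64_t total = Total();
    if (total == 0) return 0.0;
    const double rank = q * static_cast<double>(total);
    double lower = 0.0;
    std::uint64_t below = 0;
    for (std::size_t i = 0; i < kBucketBoundsMs.size(); ++i) {
      const std::uint64_t in_bucket = counts_[i].load(std::memory_order_relaxed);
      if (static_cast<double>(below + in_bucket) >= rank) {
        if (in_bucket == 0) return lower;  // avoid dividing by zero
        const double upper = kBucketBoundsMs[i];
        return lower + (upper - lower) * (rank - below) / in_bucket;
      }
      below += in_bucket;
      lower = kBucketBoundsMs[i];
    }
    return kBucketBoundsMs.back();
  }

 private:
  // One slot per finite bucket plus one overflow slot, zero-initialized.
  std::array<std::atomic<std::uint64_t>, kBucketBoundsMs.size() + 1> counts_{};
};
```

For example, after observing 0.5, 3, 3, 7, 20 and 600 ms, Quantile(0.5)
returns 5.0: the target rank 3 falls in the (1, 5] bucket, which holds the
2nd and 3rd observations.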

This change adds the first C++ file to NFS Ganesha. I first tried to implement
the monitoring in C only, using the DigitalOcean C Prometheus client. This
turned out to be a bottleneck that wasted 25% of all CPU and limited overall
request throughput (https://github.com/digitalocean/prometheus-client-c/issues/59),
flatlining Ganesha's requests/second performance. As there are no other C
Prometheus client libraries, the monitoring code instead uses a C++ library,
which doesn't have this performance problem. The C++ library is thread-safe, so
no additional locking is needed in NFS Ganesha.
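
Not needing extra locking relies on each metric update being atomic. A minimal
sketch of what that means for a counter, assuming a plain std::atomic rather
than the C++ library's own types (Counter and SimulateWorkers are hypothetical
names used only for illustration):

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical counter: increments from many worker threads need no mutex
// because fetch_add is a single atomic read-modify-write.
class Counter {
 public:
  void Increment(std::uint64_t n = 1) {
    // Relaxed ordering is enough: every increment must be counted, but it
    // does not need to be ordered against other memory operations.
    value_.fetch_add(n, std::memory_order_relaxed);
  }
  std::uint64_t Value() const { return value_.load(std::memory_order_relaxed); }

 private:
  std::atomic<std::uint64_t> value_{0};
};

// Simulates `threads` worker threads that each record `per_thread` requests
// of `bytes` bytes, with no lock around the updates. Returns the byte total.
std::uint64_t SimulateWorkers(int threads, int per_thread, std::uint64_t bytes) {
  Counter requests;
  Counter sent_bytes;
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&] {
      for (int i = 0; i < per_thread; ++i) {
        requests.Increment();
        sent_bytes.Increment(bytes);
      }
    });
  }
  for (auto& w : workers) w.join();
  // No increments are lost regardless of how the threads interleave.
  assert(requests.Value() == static_cast<std::uint64_t>(threads) * per_thread);
  return sent_bytes.Value();
}
```

SimulateWorkers(8, 10000, 512) returns 40960000 on every run, whatever the
thread interleaving.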

By default, export metrics on TCP port 9587, reusing the port already
allocated to the nfs-ganesha exporter at
https://github.com/prometheus/prometheus/wiki/Default-port-allocations
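
Prometheus scrapes an exporter's port over HTTP and expects the Prometheus
text exposition format. A hedged sketch of rendering one labelled counter
family in that format, assuming the hypothetical metric name
nfs_requests_total and label names operation and export; the metric and label
names actually used by the change are not listed here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical label set: {operation, export id}.
using OpExportKey = std::pair<std::string, std::string>;

// Escapes a label value as the text format requires: backslash, double
// quote and newline become \\, \" and \n.
std::string EscapeLabelValue(const std::string& v) {
  std::string out;
  for (char c : v) {
    if (c == '\\') out += "\\\\";
    else if (c == '"') out += "\\\"";
    else if (c == '\n') out += "\\n";
    else out += c;
  }
  return out;
}

// Renders a counter family: HELP and TYPE lines, then one sample per label
// set. std::map keeps the output order deterministic.
std::string RenderCounter(const std::string& name, const std::string& help,
                          const std::map<OpExportKey, std::uint64_t>& values) {
  std::ostringstream out;
  out << "# HELP " << name << ' ' << help << '\n';
  out << "# TYPE " << name << " counter\n";
  for (const auto& entry : values) {
    out << name << "{operation=\"" << EscapeLabelValue(entry.first.first)
        << "\",export=\"" << EscapeLabelValue(entry.first.second) << "\"} "
        << entry.second << '\n';
  }
  return out.str();
}
```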

Other changes:
- Add a parameter to the NFS core parameters to allow the user to change the
  monitoring port.
- As this change adds new library dependencies, guard it with the
  USE_MONITORING definition.
- Add a new log component, COMPONENT_MONITORING.
- Add a sample configuration file for Prometheus metrics aggregation.
- Add a sample configuration file for the Grafana dashboard.
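
One common way to structure a compile-time guard like USE_MONITORING is to
provide inline no-op stubs when the feature is disabled, so call sites need no
conditional compilation of their own. The sketch below assumes a hypothetical
function monitoring_record_request() and a hypothetical handle_request() call
site; the real declarations in src/include/monitoring.h may be structured
differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical API for illustration only. USE_MONITORING would normally be
// set by the build configuration; it is left undefined here.
#ifdef USE_MONITORING
void monitoring_record_request(const char* op, std::uint64_t latency_ns);
#else
// With monitoring compiled out, the call becomes an inline no-op, and the
// monitoring libraries are not needed at link time.
static inline void monitoring_record_request(const char* op,
                                             std::uint64_t latency_ns) {
  (void)op;
  (void)latency_ns;
}
#endif

// Hypothetical call site: the same source builds with or without monitoring.
int handle_request(const char* op) {
  monitoring_record_request(op, 1000);
  return 0;
}
```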

Signed-off-by: Bjorn Leffler <leffler@google.com>
Change-Id: I825cf2283f6e9e8c450cbf2923f92137fbee7e58
---
M src/CMakeLists.txt
M src/MainNFSD/CMakeLists.txt
M src/MainNFSD/nfs_main.c
M src/config_samples/config.txt
A src/config_samples/grafana.dashboard.json
A src/config_samples/prometheus.rules.yml
M src/include/config-h.in.cmake
M src/include/gsh_config.h
M src/include/log.h
A src/include/monitoring.h
M src/log/log_functions.c
A src/monitoring/CMakeLists.txt
A src/monitoring/monitoring.cc
A src/monitoring/monitoring_internal.h
M src/support/fridgethr.c
M src/support/nfs_read_conf.c
M src/support/server_stats.c
17 files changed, 4,228 insertions(+), 12 deletions(-)

git pull ssh://review.gerrithub.io:29418/ffilz/nfs-ganesha refs/changes/30/541330/1


Gerrit-Project: ffilz/nfs-ganesha
Gerrit-Branch: next
Gerrit-Change-Id: I825cf2283f6e9e8c450cbf2923f92137fbee7e58
Gerrit-Change-Number: 541330
Gerrit-PatchSet: 1
Gerrit-Owner: Bjorn Leffler <leffler@google.com>
Gerrit-MessageType: newchange