
Prometheus endpoint monitoring with Netdata

The generic Prometheus endpoint collector gathers metrics from Prometheus endpoints that use the OpenMetrics exposition format.

  • As of v1.24, Netdata can autodetect more than 600 Prometheus endpoints, including support for Windows 10 via windows_exporter, and instantly generate new charts with the same high-granularity, per-second frequency as you expect from other collectors.

  • The full list of endpoints is available in the collector's configuration file.

  • Collecting metrics from Prometheus endpoints in Kubernetes.

Charts

Netdata will produce one or more charts for every metric collected via a Prometheus endpoint. The number of charts depends entirely on the number of exposed metrics.

For example, scraping node_exporter produces 3000+ metrics.

Configuration

Edit the go.d/prometheus.conf configuration file using edit-config from the Netdata config directory, which is typically at /etc/netdata.

cd /etc/netdata # Replace this path with your Netdata config directory
sudo ./edit-config go.d/prometheus.conf

To add a new endpoint to collect metrics from, or to change the URL that Netdata looks for, add or configure the name and url values. Endpoints can be either local or remote, as long as they expose their metrics at the provided URL.

Here is an example with two endpoints:

jobs:
  - name: node_exporter_local
    url: http://127.0.0.1:9100/metrics
  - name: win10
    url: http://203.0.113.0:9182/metrics

Dimension algorithm

The incremental algorithm (values displayed as a rate) is used when:

  • the metric type is Counter, Histogram or Summary.
  • the metric name suffix is _total, _sum or _count.

The absolute algorithm (values displayed as is) is used in all other cases.

Use the force_absolute_algorithm configuration option to override this logic.

jobs:
  - name: node_exporter_local
    url: http://127.0.0.1:9100/metrics
    force_absolute_algorithm:
      - '*_sum'
      - '*_count'
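
The decision described above can be sketched in a few lines of Python. This is an illustration only, not the collector's code: the function name choose_algorithm is hypothetical, the patterns are assumed to behave like simple shell globs, and either condition (type or suffix) is assumed to be enough to select the incremental algorithm.

```python
from fnmatch import fnmatchcase

# Types and suffixes that the text above maps to the incremental algorithm.
INCREMENTAL_TYPES = {"counter", "histogram", "summary"}
INCREMENTAL_SUFFIXES = ("_total", "_sum", "_count")


def choose_algorithm(metric_name, metric_type, force_absolute=()):
    """Return "incremental" or "absolute" for one metric (hypothetical helper)."""
    # Assumption: force_absolute_algorithm patterns act like shell globs
    # and take priority over the built-in rules.
    if any(fnmatchcase(metric_name, pattern) for pattern in force_absolute):
        return "absolute"
    # Assumption: matching either the type rule or the suffix rule is enough.
    if metric_type.lower() in INCREMENTAL_TYPES:
        return "incremental"
    if metric_name.endswith(INCREMENTAL_SUFFIXES):
        return "incremental"
    return "absolute"


if __name__ == "__main__":
    overrides = ["*_sum", "*_count"]
    for name, kind in [
        ("node_cpu_seconds_total", "counter"),
        ("node_load1", "gauge"),
        ("example_duration_seconds_sum", "summary"),
    ]:
        print(name, "->", choose_algorithm(name, kind, overrides))
```

With the overrides from the configuration above, a Summary's _sum series would be charted with the absolute algorithm, while an ordinary counter would still be shown as a rate.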

Time Series Selector (filtering)

To filter out unwanted time series (metrics), use the selector configuration option.

Here is an example:

jobs:
  - name: node_exporter_local
    url: http://127.0.0.1:9100/metrics
    # (allow[0] || allow[1] || ...) && !(deny[0] || deny[1] || ...)
    selector:
      allow:
        - <PATTERN>
        - <PATTERN>
      deny:
        - <PATTERN>
        - <PATTERN>

For a description of the PATTERN syntax and more examples, see the selectors readme.
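
The allow/deny expression in the comment above can be sketched as follows. This is a rough illustration, not the collector's implementation: is_selected is a hypothetical name, patterns are matched against the metric name only and treated as simple globs (the real PATTERN syntax is richer), and an empty allow list is assumed to admit everything.

```python
from fnmatch import fnmatchcase


def is_selected(metric_name, allow=(), deny=()):
    """Evaluate (allow[0] || allow[1] || ...) && !(deny[0] || deny[1] || ...)."""
    # Assumption: with no allow patterns, every series passes the allow step.
    allowed = not allow or any(fnmatchcase(metric_name, p) for p in allow)
    denied = any(fnmatchcase(metric_name, p) for p in deny)
    return allowed and not denied


if __name__ == "__main__":
    allow = ["node_*"]
    deny = ["node_cpu_*"]
    for name in ["node_load1", "node_cpu_seconds_total", "go_goroutines"]:
        print(name, "->", is_selected(name, allow, deny))
```

A series is kept only if it matches at least one allow pattern and no deny pattern, so deny always wins when both match.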

Time Series Grouping

This module groups time series into charts. It has built-in grouping logic based on metric type, which you can extend with the group configuration option.

Gauge and Counter

  • One chart per metric.
  • Dimensions are label sets.
  • Each chart holds at most 50 dimensions. If there are more, the chart is split into several charts.
  • Values are shown as is.

For instance, the following time series produce 1 chart.

example_device_cur_state{name="0",type="Fan"} 0
example_device_cur_state{name="1",type="Fan"} 0
example_device_cur_state{name="10",type="Processor"} 0
example_device_cur_state{name="11",type="intel_powerclamp"} -1
example_device_cur_state{name="2",type="Fan"} 0
example_device_cur_state{name="3",type="Fan"} 0
example_device_cur_state{name="4",type="Fan"} 0
example_device_cur_state{name="5",type="Processor"} 0
example_device_cur_state{name="6",type="Processor"} 0
example_device_cur_state{name="7",type="Processor"} 0
example_device_cur_state{name="8",type="Processor"} 0
example_device_cur_state{name="9",type="Processor"} 0
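
To make the 50-dimension limit concrete, here is a minimal sketch of how a list of dimensions could be split across charts. How the collector actually orders and distributes dimensions is not described here, so sequential chunking and the name split_into_charts are assumptions.

```python
DIMENSION_LIMIT = 50  # per-chart limit stated in the list above


def split_into_charts(dimensions, limit=DIMENSION_LIMIT):
    """Split a list of dimensions into consecutive chunks of at most `limit`."""
    return [dimensions[i:i + limit] for i in range(0, len(dimensions), limit)]


if __name__ == "__main__":
    # The 12 label sets in the example above fit into a single chart.
    print(len(split_into_charts(list(range(12)))))
    # A hypothetical metric with 120 label sets would need three charts.
    print(len(split_into_charts(list(range(120)))))
```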

Custom Grouping (Gauge and Counter only)

To group time series, use the group configuration option.

Here is an example:

jobs:
  - name: node_exporter_local
    url: http://127.0.0.1:9100/metrics
    group:
      - selector: <PATTERN>
        by_label: <a space separated list of labels names>
      - selector: <PATTERN>
        by_label: <a space separated list of labels names>

For a description of the PATTERN syntax and more examples, see the selectors readme.

This example configuration groups all time series whose metric name is example_device_cur_state into multiple charts by the type label. The number of charts equals the number of distinct type label values.

jobs:
  - name: node_exporter_local
    url: http://127.0.0.1:9100/metrics
    group:
      - selector: example_device_cur_state
        by_label: type
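
To see what this grouping yields for the example_device_cur_state series listed earlier, the sketch below parses those lines and groups them by the type label. It is only an illustration: group_by_label and the two regular expressions are hypothetical helpers that handle this simple sample (no escaped quotes), not the collector's parser.

```python
import re
from collections import defaultdict

SAMPLE = """\
example_device_cur_state{name="0",type="Fan"} 0
example_device_cur_state{name="1",type="Fan"} 0
example_device_cur_state{name="10",type="Processor"} 0
example_device_cur_state{name="11",type="intel_powerclamp"} -1
example_device_cur_state{name="2",type="Fan"} 0
example_device_cur_state{name="3",type="Fan"} 0
example_device_cur_state{name="4",type="Fan"} 0
example_device_cur_state{name="5",type="Processor"} 0
example_device_cur_state{name="6",type="Processor"} 0
example_device_cur_state{name="7",type="Processor"} 0
example_device_cur_state{name="8",type="Processor"} 0
example_device_cur_state{name="9",type="Processor"} 0
"""

SERIES_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')  # name{labels} value
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')           # key="value"


def group_by_label(text, metric, by_label):
    """Group the series of `metric` by the space-separated labels in `by_label`."""
    charts = defaultdict(list)
    for line in text.splitlines():
        match = SERIES_RE.match(line.strip())
        if not match or match.group(1) != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group(2)))
        key = tuple(labels.get(name, "") for name in by_label.split())
        charts[key].append(labels)
    return dict(charts)


if __name__ == "__main__":
    charts = group_by_label(SAMPLE, "example_device_cur_state", "type")
    for key, members in charts.items():
        print(key, len(members))
```

For the sample above this gives three groups (Fan, Processor and intel_powerclamp), one per distinct value of the type label.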

Summary

  • One chart per time series (label set).
  • Dimensions are quantiles.
  • Values are shown as is.

For instance, the following time series produce 2 charts.

example_duration_seconds{interval="15s",quantile="0"} 4.693e-06
example_duration_seconds{interval="15s",quantile="0.25"} 2.4383e-05
example_duration_seconds{interval="15s",quantile="0.5"} 0.00013458
example_duration_seconds{interval="15s",quantile="0.75"} 0.000195183
example_duration_seconds{interval="15s",quantile="1"} 0.005386229
example_duration_seconds{interval="30s",quantile="0"} 4.693e-06
example_duration_seconds{interval="30s",quantile="0.25"} 2.4383e-05
example_duration_seconds{interval="30s",quantile="0.5"} 0.00013458
example_duration_seconds{interval="30s",quantile="0.75"} 0.000195183
example_duration_seconds{interval="30s",quantile="1"} 0.005386229
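
The Summary rules can be sketched as a grouping step: every label set, ignoring the quantile label, becomes a chart, and each quantile becomes a dimension. The helper name summary_charts and the pre-parsed input shape are assumptions made for this illustration.

```python
from collections import defaultdict


def summary_charts(series):
    """Map each label set (without `quantile`) to a {quantile: value} dict."""
    charts = defaultdict(dict)
    for labels, value in series:
        chart_key = tuple(sorted((k, v) for k, v in labels.items() if k != "quantile"))
        charts[chart_key][labels["quantile"]] = value
    return dict(charts)


# The quantile values from the example above, repeated for both intervals.
QUANTILES = {
    "0": 4.693e-06,
    "0.25": 2.4383e-05,
    "0.5": 0.00013458,
    "0.75": 0.000195183,
    "1": 0.005386229,
}
SERIES = [
    ({"interval": interval, "quantile": quantile}, value)
    for interval in ("15s", "30s")
    for quantile, value in QUANTILES.items()
]

if __name__ == "__main__":
    for chart_key, dimensions in summary_charts(SERIES).items():
        print(chart_key, list(dimensions))
```

The ten series collapse into two charts, one for interval="15s" and one for interval="30s", each with five quantile dimensions.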

Histogram

  • One chart per time series (label set).
  • Dimensions are le buckets.
  • Values are not shown as is, because histogram buckets are cumulative (le="1.2" includes everything counted in le="0.3"). We calculate the exact value of each bucket.

For instance, the following time series produce 2 charts.

example_seconds_bucket{interval="15s",le="0.1"} 0
example_seconds_bucket{interval="15s",le="0.25"} 0
example_seconds_bucket{interval="15s",le="0.5"} 0
example_seconds_bucket{interval="15s",le="1"} 0
example_seconds_bucket{interval="15s",le="2.5"} 0
example_seconds_bucket{interval="15s",le="5"} 0
example_seconds_bucket{interval="15s",le="+Inf"} 0
example_seconds_bucket{interval="30s",le="0.1"} 0
example_seconds_bucket{interval="30s",le="0.25"} 0
example_seconds_bucket{interval="30s",le="0.5"} 0
example_seconds_bucket{interval="30s",le="1"} 0
example_seconds_bucket{interval="30s",le="2.5"} 0
example_seconds_bucket{interval="30s",le="5"} 0
example_seconds_bucket{interval="30s",le="+Inf"} 0
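
Because the example buckets above are all zero, a sketch with made-up counts may show the idea more clearly. It assumes that the exact value of a bucket is its cumulative count minus the count of the next smaller bucket; the function name per_bucket_counts and the numbers are hypothetical.

```python
def per_bucket_counts(cumulative):
    """Turn (upper_bound, cumulative_count) pairs into per-bucket counts."""
    result = []
    previous = 0
    # Sort by upper bound so each bucket is compared with the next smaller one.
    for upper_bound, count in sorted(cumulative, key=lambda bucket: bucket[0]):
        result.append((upper_bound, count - previous))
        previous = count
    return result


if __name__ == "__main__":
    # Hypothetical cumulative counts; float("inf") stands for le="+Inf".
    buckets = [(0.1, 2), (0.25, 5), (0.5, 5), (float("inf"), 9)]
    for upper_bound, count in per_bucket_counts(buckets):
        print(f"le={upper_bound}: {count}")
```

Here the cumulative counts 2, 5, 5, 9 become per-bucket counts 2, 3, 0, 4, which sum back to the +Inf total.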

For all available options, see the Prometheus collector's configuration file.

Troubleshooting

To troubleshoot issues with the prometheus collector, run the go.d.plugin with the debug option enabled. The output should give you clues as to why the collector isn't working.

First, navigate to your plugins directory, usually /usr/libexec/netdata/plugins.d/. If that's not the case on your system, open netdata.conf and look for the plugins directory setting. Once you're in the plugins directory, switch to the netdata user.

cd /usr/libexec/netdata/plugins.d/
sudo -u netdata -s

You can now run the go.d.plugin to debug the collector:

./go.d.plugin -d -m prometheus

Reach out

If you need help after reading this doc, search our community forum for an answer. There's a good chance someone else has already found a solution to the same issue.
