Database
Netdata is fully capable of long-term metrics storage, at per-second granularity, via its default database engine (dbengine). To remain as flexible as possible, Netdata supports a number of types of metrics storage:
- dbengine (the default): data are in database files. The Database Engine works like a traditional database. Some RAM is dedicated to data caching and indexing, and the rest of the data reside compressed on disk. The number of history entries is not fixed in this case, but depends on the configured disk space and the effective compression ratio of the data stored. This is the only mode that supports changing the data collection update frequency (update_every) without losing the previously stored metrics. For more details, see the Database Engine documentation.
- ram: data are purely in memory. Data are never saved on disk. This mode uses mmap() and supports KSM.
- save: data are only in RAM while Netdata runs and are saved to / loaded from disk on Netdata restart. It also uses mmap() and supports KSM.
- map: data are in memory mapped files. This works like swap. Keep in mind, though, that this causes constant writes to your disk. When Netdata writes data to its memory, the Linux kernel marks the related memory pages as dirty and automatically starts updating them on disk. Netdata itself cannot control how frequently this happens. The Linux kernel uses exactly the same algorithm it uses for its swap memory. See the section on running a dedicated central Netdata server below for additional information. This mode uses mmap() but does not support KSM.
- none: without a database (collected metrics can only be streamed to another Netdata).
- alloc: like ram, but it uses calloc() and does not support KSM. This mode is the fallback for all others except none.
You can select the memory mode by editing netdata.conf and setting the memory mode option to one of the values listed above.
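As a minimal sketch, assuming the option lives in the [global] section of netdata.conf (the section name is an assumption, it is not stated in the text above), the setting could look like this:

```conf
[global]
    # one of: dbengine, ram, save, map, none, alloc
    memory mode = dbengine
```

Netdata normally needs a restart for a change like this to take effect.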
Running Netdata in embedded devices
Embedded devices usually have very limited RAM resources available.
There are 2 settings for you to tweak:
- update every, which controls the data collection frequency
- history, which controls the size of the database in RAM (except for memory mode = dbengine)
By default update every = 1 and history = 3600. This gives you an hour of data with per second updates.
If you set update every = 2 and history = 1800, you will still have an hour of data, but collected once every 2 seconds. This will cut in half both the CPU and RAM resources consumed by Netdata. Of course, experiment a bit. On very weak devices you might have to use update every = 5 and history = 720 (still 1 hour of data, but 1/5 of the CPU and RAM resources).
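The relationship between the two settings is simple multiplication: retention in seconds equals update every times history. The sketch below checks the three combinations mentioned above. The function name retention_seconds is a hypothetical helper, not part of Netdata.

```shell
#!/bin/sh
# retention_seconds UPDATE_EVERY HISTORY
# Prints how many seconds of data the in-RAM database holds.
# (Hypothetical helper for illustration; not a Netdata command.)
retention_seconds() {
    echo $(( $1 * $2 ))
}

retention_seconds 1 3600   # defaults: prints 3600
retention_seconds 2 1800   # half the resources: prints 3600
retention_seconds 5 720    # very weak devices: prints 3600
```

All three combinations keep one hour (3600 seconds) of data. Only the number of stored points, and so the RAM used, changes.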
You can also disable data collection plugins you don't need. Disabling such plugins will also free both CPU and RAM resources.
Running a dedicated central Netdata server
Netdata allows streaming data between Netdata nodes. This allows us to have a central Netdata server that will maintain the entire database for all nodes, and will also run health checks/alarms for all nodes.
For this central Netdata, memory size can be a problem. Fortunately, Netdata supports several memory modes. One interesting option for this setup is memory mode = map.
map
In this mode, the database of Netdata is stored in memory mapped files. Netdata continues to read and write the database in memory, but the kernel automatically loads and saves memory pages from/to disk.
We suggest not using this mode on nodes that run other applications. There will always be dirty memory to be synced, and this syncing process may influence the way other applications work. This mode is useful, however, when we need a central Netdata server that would normally need huge amounts of memory. Using memory mode map, the size of the database is no longer limited by the available RAM.
There are a few kernel options that provide finer control over the way this syncing works. But before explaining them, a brief introduction to how the Netdata database works is needed.
For each chart, Netdata maps the following files:
- chart/main.db, the file that maintains chart information. Every time data are collected for a chart, this is updated.
- chart/dimension_name.db, the file for each dimension. At its beginning there is a header, followed by the round robin database where metrics are stored.
So, every time Netdata collects data, the following pages will become dirty:
- the chart file
- the header part of all dimension files
- if the collected metrics are stored far enough in the dimension file, another page will become dirty, for each dimension
Each page in Linux is typically 4KB. So, with 200 charts and 1000 dimensions, there will be 1200 to 2200 dirty 4KB pages every second. Of these, 1200 will always be dirty (the chart files and the dimension headers), and 1000 more will each stay dirty for about 1000 seconds (at 4 bytes per metric, a 4KB page holds roughly 1000 metrics, so about 1000 seconds, or 16 minutes, per page).
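The arithmetic above can be reproduced with a short sketch. The function names dirty_page_range and seconds_per_page are hypothetical helpers, and the 4096-byte page and 4-byte metric sizes are the figures quoted in the text.

```shell
#!/bin/sh
# dirty_page_range CHARTS DIMENSIONS
# Prints the minimum and maximum number of pages dirtied per collection:
#   minimum = one page per chart file + one header page per dimension file
#   maximum = minimum + one extra data page per dimension
dirty_page_range() {
    echo "$(( $1 + $2 )) $(( $1 + 2 * $2 ))"
}

# seconds_per_page PAGE_BYTES BYTES_PER_METRIC
# Prints how many per-second metrics fit in one data page.
seconds_per_page() {
    echo $(( $1 / $2 ))
}

dirty_page_range 200 1000    # prints: 1200 2200
seconds_per_page 4096 4      # prints: 1024
```

The exact figure is 1024 seconds per page, which the text rounds to about 1000 seconds.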
Fortunately, the Linux kernel does not sync all these data every second. How often they are synced is controlled by /proc/sys/vm/dirty_expire_centisecs or the sysctl vm.dirty_expire_centisecs. The default on most systems is 3000 (30 seconds).
On a busy server centralizing metrics from 20+ servers, you will see considerable stress (iowait) every 30 seconds.
A simple solution is to increase this time to 10 minutes (60000). On the same system, this spaces the iowait spikes 10 minutes apart.
Of course, setting this to 10 minutes means that data on disk might be up to 10 minutes old if you get an abnormal shutdown.
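As a sketch of how the 60000 figure is derived and applied: the value is expressed in hundredths of a second. The helper minutes_to_centisecs is hypothetical. The snippet only prints the sysctl command instead of running it, because changing the value requires root.

```shell
#!/bin/sh
# minutes_to_centisecs MINUTES
# Converts minutes to centiseconds, the unit of vm.dirty_expire_centisecs.
# (Hypothetical helper for illustration.)
minutes_to_centisecs() {
    echo $(( $1 * 60 * 100 ))
}

# Show the current value, if the file is readable on this system.
echo "current: $(cat /proc/sys/vm/dirty_expire_centisecs 2>/dev/null || echo unknown)"

# Print the command that sets a 10 minute expiry (run it as root to apply).
echo "sysctl -w vm.dirty_expire_centisecs=$(minutes_to_centisecs 10)"
```

A value set this way lasts only until the next reboot.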
There are 2 more options to tweak:
- dirty_background_ratio, by default 10.
- dirty_ratio, by default 20.
These control the percentage of memory that must be dirty for disk syncing to be triggered. On dedicated Netdata servers, you can use 80 and 90 respectively, so that all RAM is given to Netdata.
With these settings, you can expect a small iowait spike once every 10 minutes and, in case of a system crash, data on disk will be up to 10 minutes old.
To have these settings automatically applied on boot, create the file /etc/sysctl.d/netdata-memory.conf containing the vm.dirty_expire_centisecs, vm.dirty_background_ratio and vm.dirty_ratio values chosen above.
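A minimal sketch of such a file, assuming only the three values discussed above (60000, 80 and 90). The function netdata_sysctl_conf is a hypothetical helper that prints the contents so they can be reviewed before being installed.

```shell
#!/bin/sh
# Print the sysctl settings discussed above, in sysctl.d file format.
# (Hypothetical helper; the values are the ones suggested in the text.)
netdata_sysctl_conf() {
    cat <<'EOF'
vm.dirty_expire_centisecs = 60000
vm.dirty_background_ratio = 80
vm.dirty_ratio = 90
EOF
}

# Preview the contents. To install them, run as root:
#   netdata_sysctl_conf > /etc/sysctl.d/netdata-memory.conf && sysctl --system
netdata_sysctl_conf
```

These values suit a dedicated Netdata server. On a machine shared with other applications they delay disk writes for everything, not only for Netdata.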
There is another memory mode that helps overcome the memory size problem. The most interesting one for this setup is memory mode = dbengine.
dbengine
In this mode, the database of Netdata is stored in database files. The Database Engine works like a traditional database. There is some amount of RAM dedicated to data caching and indexing and the rest of the data reside compressed on disk. The number of history entries is not fixed in this case, but depends on the configured disk space and the effective compression ratio of the data stored.
We suggest using this mode on nodes that also run other applications. The Database Engine uses direct I/O to avoid polluting the OS filesystem caches and does not generate excessive I/O traffic, so it interferes as little as possible with other applications. Using memory mode dbengine we can overcome most memory restrictions. For more details, see the Database Engine documentation.
KSM
Netdata offers all of its round robin database to the kernel for deduplication in the memory modes that support KSM (ram and save, but not map, alloc or dbengine).
In the past, KSM has been criticized for consuming a lot of CPU resources. Although this is true when KSM is used for deduplicating certain applications, it is not true with Netdata, since the Netdata memory is written very infrequently (if you have 24 hours of metrics in Netdata, each byte in the in-memory database will be updated just once per day).
KSM is a solution that will provide 60+% memory savings to Netdata.
Enable KSM in kernel
You need to run a kernel compiled with CONFIG_KSM=y.
When KSM is enabled in the kernel, it merely becomes available for the user to enable. So, if you build a kernel with CONFIG_KSM=y, you will just get a few files in /sys/kernel/mm/ksm. Nothing else happens. There is no performance penalty (apart from the memory this code occupies in the kernel).
The files that CONFIG_KSM=y offers include:
- /sys/kernel/mm/ksm/run, by default 0. You have to set this to 1 for the kernel to spawn ksmd.
- /sys/kernel/mm/ksm/sleep_millisecs, by default 20. How many milliseconds ksmd sleeps between runs, which sets how often it evaluates memory for deduplication.
- /sys/kernel/mm/ksm/pages_to_scan, by default 100. The number of pages ksmd will evaluate on each run.
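To see what the running kernel currently uses, the three files can simply be read. The sketch below does that. The function name ksm_show_settings and its optional directory argument are hypothetical, added so the helper can also be pointed at a different directory.

```shell
#!/bin/sh
# ksm_show_settings [KSM_DIR]
# Prints the current value of each KSM control file, or a note if the file
# is missing (for example on a kernel built without CONFIG_KSM=y).
# (Hypothetical helper; KSM_DIR defaults to the real sysfs path.)
ksm_show_settings() {
    ksm_dir="${1:-/sys/kernel/mm/ksm}"
    for name in run sleep_millisecs pages_to_scan; do
        if [ -r "$ksm_dir/$name" ]; then
            echo "$name = $(cat "$ksm_dir/$name")"
        else
            echo "$name = (not available)"
        fi
    done
}

ksm_show_settings
```

Reading these files does not change anything. Only writing to them, as root, alters the behavior of ksmd.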
So, by default, ksmd is simply disabled. It will not harm performance, and the administrator can control the CPU resources they are willing to let ksmd use.
Run ksmd kernel daemon
To activate / run ksmd, write 1 to /sys/kernel/mm/ksm/run and 1000 to /sys/kernel/mm/ksm/sleep_millisecs.
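A minimal sketch of those two writes, inferred from the once-per-second, 100-page behavior this section describes. The function ksm_enable and its optional directory argument are hypothetical. Against the real sysfs path it must be run as root.

```shell
#!/bin/sh
# ksm_enable [KSM_DIR]
# Starts ksmd and makes it wake up once per second (1000 ms).
# pages_to_scan is left at its default.
# (Hypothetical helper; KSM_DIR defaults to the real sysfs path.)
ksm_enable() {
    ksm_dir="${1:-/sys/kernel/mm/ksm}"
    echo 1 > "$ksm_dir/run" &&
    echo 1000 > "$ksm_dir/sleep_millisecs"
}

# Usage (as root):
#   ksm_enable
```

The function is only defined here, not called, so that sourcing the snippet has no side effects.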
With these settings ksmd does not even appear in the running process list (it will run once per second and evaluate 100 pages for de-duplication).
Apply the same settings in your boot sequence (/etc/rc.local or equivalent) to have ksmd run at boot.
Monitoring Kernel Memory de-duplication performance
Netdata will create charts for kernel memory de-duplication performance.