Prometheus Metrics for MooseFS
This guide explains how to integrate MooseFS with Prometheus for effective system monitoring. It covers how to configure Prometheus to collect metrics from MooseFS, provides an overview of the structure of these metrics, and explains how to customize metric collection to optimize monitoring performance.
Prometheus is an open-source monitoring and alerting toolkit designed to collect and analyze system metrics in real time. All the data available in the MooseFS GUI can also be accessed as Prometheus metrics, making it easier to interpret since the format aligns with what is already familiar in the GUI. Additionally, most metrics include descriptions available as tooltips in MooseFS GUI, Prometheus, and Grafana.
MooseFS exports a large number of metrics, and this guide outlines how to customize the selection to suit specific monitoring needs, helping to reduce unnecessary load on your monitoring system while ensuring critical data is captured.
Configuring Data for Metrics Export
The metrics are sourced from the MooseFS command-line interface (CLI) via the mfscli command, as well as from the MooseFS charts for Master Servers and Chunk Servers available in the GUI. Some metrics may be available in both the CLI output and the charts, so it's important to avoid exporting redundant data to prevent unnecessary load on your monitoring system and to ensure efficient performance.
Metrics from Command Line Interface (CLI) Data Sets
The following Data Sets are currently available, corresponding to the data presented via the MooseFS GUI and CLI: SIM,SLI,SIG,SMU,SIC,SIL,SCS,SMB,SHD,SSC,SMO,SQU. For a description of these data sets, refer to the output of mfscli -h or see the list below. The remaining data sets available in the CLI are not allowed for the reasons specified next to them.
- SIN: show full Master info - NOT allowed because it includes a possibly very long list of missing chunks/files (SMF)
- SIM: show only Masters states; metrics: mfs_info_masters_*
- SLI: show only licence info; metrics: mfs_info_licence_*
- SIG: show only general Master (Leader) info; metrics: mfs_info_general_*
- SMU: show only Master memory usage; metrics: mfs_info_memory_*
- SIC: show only chunks info (target/current redundancy level matrices); metrics: mfs_info_chunks_*
- SIL: show only loop info (with messages); metrics: mfs_info_*_loop_*
- SMF: show only missing chunks/files - NOT allowed because of the possibly very long list of missing chunks/files
- SCS: show connected Chunkservers; metrics: mfs_chunkservers_*
- SMB: show connected metadata backup servers
- SHD: show hdd data; metrics: mfs_disks_*
- SEX: show exports - NOT allowed because there is not much interesting numeric data here
- SMS: show active mounts - NOT allowed because there is not much interesting numeric data here
- SRS: show resources - NOT allowed because of the possibly long list of open files and locks
- SSC: show storage classes; metrics: mfs_storage_classes_*
- SPA: show patterns override data - NOT allowed because there is not much interesting numeric data here
- SOF: show only open files - NOT allowed because of the possibly long list of open files
- SAL: show only acquired locks - NOT allowed because of the possibly long list of acquired locks
- SMO: show operation counters; metrics: mfs_operations_*
- SQU: show quota info; metrics: mfs_quotas_*
- SMC: show Master charts data - NOT allowed as a scope because it is covered by the mastercharts parameter
- SCC: show Chunkserver charts data - NOT allowed as a scope because it is covered by the cscharts parameter
Metrics from Charts
MooseFS metrics also include data from the charts for Master Servers and Chunk Servers, which are available in the GUI. Metrics related to Master Servers and Chunk Servers contain the infixes mastercharts and cscharts, respectively.
Default set of metrics
By default, if no specific metrics are selected, the default mfscli Data Sets (SIM, SLI, SIG, SIC, SIL, SCS, SMB, SHD) and all chart metrics for Master Servers and Chunk Servers are enabled. This allows you to explore the full range of available data and decide which metrics are essential for your monitoring setup.
MooseFS can generate a large number of metrics, which may impact Prometheus performance, especially if you have many Chunk Servers and export all chart metrics. Keep in mind that the same information is always available in the MooseFS GUI, so if you don’t need all MooseFS metrics in Prometheus, you can use the GUI to explore data and investigate potential issues as needed.
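To gauge that volume before committing to a configuration, you can count how many metric samples the endpoint currently exposes. A quick check, assuming the GUI is reachable at mfsgui.my.lan:9425:
# Count exposed MooseFS metric samples (non-comment lines starting with mfs_)
curl -s 'http://mfsgui.my.lan:9425/metrics' | grep -c '^mfs_'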
Configuring Prometheus to Collect Metrics
To enable Prometheus to scrape metrics from MooseFS, you'll need to modify its configuration file (prometheus.yml). Below is a step-by-step guide:
Step 1: Identify the Host and Port for Metrics
In MooseFS, metrics are available on the same port as the GUI, which defaults to 9425. Usually, the GUI is hosted on the same server as the MooseFS Master. Metrics can typically be accessed at:
http://<mfsguihost>:9425/metrics
If the GUI is running on a different port, replace 9425 with the correct port in your Prometheus configuration.
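Before adding the scrape job, you can confirm the endpoint responds. A quick sanity check, assuming the GUI host is mfsgui.my.lan:
# Preview the first few exposed metrics
curl -s 'http://mfsgui.my.lan:9425/metrics' | head -n 20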
Step 2: Modify the prometheus.yml Configuration File
To configure Prometheus to scrape MooseFS metrics, add a new scrape job under the scrape_configs section of the prometheus.yml file.
A basic configuration example that collects the default data sets and all charts from the MooseFS GUI:
scrape_configs:
- job_name: MooseFS_metrics
scrape_interval: 60s
scrape_timeout: 20s
static_configs:
- targets: ['mfsgui.my.lan:9425']
This basic configuration always attempts to retrieve metrics from the Master at the default hostname mfsmaster. If your MooseFS Master has a DNS A-record name other than mfsmaster, you must specify it in an additional params section, as shown in the advanced example below.
Advanced Configuration Example
For more granular control, you can customize the configuration by specifying parameters such as the master host, port, scopes, and charts:
scrape_configs:
- job_name: JOB_NAME
scrape_interval: 60s
scrape_timeout: 20s
static_configs:
- targets: ['GUI_HOST:GUI_PORT']
params:
masterhost: ['MASTER_HOST'] # Default: mfsmaster
masterport: ['MASTER_PORT'] # Default: 9421
scope: ['SCOPE_LIST'] # Default: default scopes ('SIM,SLI,SIG,SIC,SIL,SCS,SMB,SHD')
mastercharts: ['MASTER_CHART_LIST'] # Default: all charts
cscharts: ['CS_CHART_LIST'] # Default: all charts
prefix_whitelist: ['WHITELISTED_PREFIXES_LIST'] # Default: no whitelisted prefixes
prefix_blacklist: ['BLACKLISTED_PREFIXES_LIST'] # Default: no blacklisted prefixes
Parameter Details:
- GUI_HOST - host under which the GUI is available
- GUI_PORT - port under which the GUI is available on GUI_HOST
- MASTER_HOST - fully qualified domain name of the MooseFS Master server (default: mfsmaster)
- MASTER_PORT - port number of the MooseFS Master server (default: 9421)
- SCOPE_LIST - a comma-separated list of data sets (scopes) to include in metrics, or use default or none. Supported scopes: SIM, SLI, SIG, SMU, SIC, SIL, SCS, SMB, SHD, SSC, SMO, SQU. Refer to mfscli -h for more details on the -S parameters.
- MASTER_CHART_LIST - a comma-separated list of Master server charts to include in metrics, or use all or none.
- CS_CHART_LIST - a comma-separated list of Chunk server charts to include in metrics, or use all or none.
- WHITELISTED_PREFIXES_LIST - a comma-separated list of metric name prefixes to include. Each prefix is treated as a wildcard, i.e. mfs_disks matches all metrics that begin with that string.
- BLACKLISTED_PREFIXES_LIST - a comma-separated list of metric name prefixes to exclude. Each prefix is treated as a wildcard, i.e. mfs_disks matches all metrics that begin with that string.
Note: when using both whitelisted and blacklisted prefixes, the whitelist is applied first, and the blacklist is then enforced on top of it.
Examples
- Retrieves all available MooseFS metrics, which may include hundreds of data points, potentially multiplied by the number of services of a particular type. This can result in thousands of records, so use caution to avoid overloading your Prometheus instance.
# Scrapes all possible MooseFS metrics, many hundreds of them
- job_name: "MooseFS_all_metrics"
scrape_interval: 60s
scrape_timeout: 20s
static_configs:
- targets: ['mfsgui.my.lan:9425']
params:
masterhost: ['mfsmaster.my.lan']
# scope: ['default'] # default: SIM, SLI, SIG, SIC, SIL, SCS, SMB, SHD; available: SIM, SLI, SIG, SMU, SIC, SIL, SCS, SMB, SHD, SSC, SMO, SQU
# mastercharts: ['all'] # default: include all available master server charts, see 'mfscli -SMC' help for a list of available master server charts
# cscharts: ['all'] # default: include all available chunk server charts, see 'mfscli -SCS' help for a list of available chunk server charts
# prefix_whitelist: [] # no need to whitelist anything
# prefix_blacklist: [] # no need to blacklist anything
- Collects only HDD usage metrics (total space, used space, and usage percentage) from all chunk servers, providing a focused view of storage utilization while minimizing data volume.
# Scrape only HDD usage (total, used and used percent) data from all chunk servers
- job_name: "MooseFS_hdd_only"
scrape_interval: 60s
scrape_timeout: 20s
static_configs:
- targets: ['mfsgui.my.lan:9425']
params:
masterhost: ['mfsmaster.my.lan']
scope: ['SHD'] # get only HDD data from mfscli tool
mastercharts: ['none'] # exclude any master server charts
cscharts: ['none'] # exclude any chunk servers charts
prefix_whitelist: ['mfs_disks_total,mfs_disks_used,mfs_disks_used_percent'] # filter metrics to include only those relating to disks total, used and used percent
# prefix_blacklist: [] # no need to blacklist anything
- Collects only the total cluster throughput metrics, specifically measuring the amount of data received and sent to/from all client-mounted volumes, providing insights into overall data flow.
# Scrape only total cluster throughput, namely how much data was received and sent from/to all client mounted volumes
- job_name: "MooseFS_throughput"
scrape_interval: 60s
scrape_timeout: 20s
static_configs:
- targets: ['mfsgui.my.lan:9425']
params:
masterhost: ['mfsmaster.my.lan']
scope: ['none'] # exclude any scope, because we use only data from master charts
mastercharts: ['mountbytrcvd,mountbytsent'] # include how much data was received and sent from/to all client mounted volumes
cscharts: ['none'] # no need for any chunk servers charts
# prefix_whitelist: [] # no need to whitelist anything
# prefix_blacklist: [] # no need to blacklist anything
- Collects resource usage metrics for both master and chunk servers, including CPU utilization, memory consumption, and other performance indicators, providing an overview of system health.
# Scrape master servers usage (CPU, memory, etc) and chunk servers usage (CPU, memory, etc)
- job_name: "MooseFS_servers_usage"
scrape_interval: 60s
scrape_timeout: 20s
static_configs:
- targets: ['mfsgui.my.lan:9425']
params:
masterhost: ['mfsmaster.my.lan']
scope: ['SIM,SIG,SMU'] # include master server usage data
mastercharts: ['none'] # no need for any master server charts
cscharts: ['cpu,mem,usedspace'] # include chunk server usage data (from charts): cpu, memory and hdd used space (excluding chunks marked for removal)
# prefix_whitelist: [] # no need to whitelist anything
prefix_blacklist: ['mfs_info_masters_misc,mfs_info_masters_localtime,mfs_info_masters_pro,mfs_info_masters_last,mfs_info_masters_metadata,mfs_info_general,mfs_info_chunks,mfs_info_memory,mfs_info_licence,mfs_info_chunk_loop,mfs_info_fs_loop'] # exclude many unnecessary mfs_info_* metrics
- Collects disk usage metrics from chunk servers while excluding space occupied by chunks marked for removal, providing a more accurate view of active storage utilization.
# Scrape chunk servers disk usage (but excluding chunks marked for removal)
- job_name: "MooseFS_regular_hdd_usage"
scrape_interval: 60s
scrape_timeout: 20s
static_configs:
- targets: ['mfsgui.my.lan:9425']
params:
masterhost: ['mfsmaster.my.lan']
scope: ['SCS'] # include chunk server usage data
mastercharts: ['none'] # no need for any master server charts
cscharts: ['none'] # no need for any chunk servers charts
prefix_whitelist: ['mfs_chunkservers_hdd'] # include only chunkservers hdd usage data
prefix_blacklist: ['mfs_chunkservers_hdd_removal'] # but exclude part related to 'marked for removal' chunks
- Collects chunk health status metrics (how many chunks are endangered, lost, etc.), summarized for regular chunks only, i.e. excluding chunks marked for removal.
# Scrape chunks health status (how many chunks are endangered, lost, etc)
- job_name: "MooseFS_chunks_health"
scrape_interval: 60s
scrape_timeout: 20s
static_configs:
- targets: ['mfsgui.my.lan:9425']
params:
masterhost: ['mfsmaster.my.lan']
scope: ['SIC'] # include chunks health status data
mastercharts: ['none'] # no need for any master server charts
cscharts: ['none'] # no need for any chunk servers charts
prefix_whitelist: ['mfs_info_chunks_summary_regularchunks'] # include only regular (exclude 'marked for removal') chunks health status data
prefix_blacklist: ['mfs_info_chunks_summary_regularchunks_copies,mfs_info_chunks_summary_regularchunks_ec'] # exclude detailed copies/ec4/ec8 data, leaving only summary for 'Copies and EC chunks'
Step 3: Restart Prometheus
After modifying the configuration file, reload Prometheus to apply the changes. This ensures that Prometheus begins scraping MooseFS metrics as per the new configuration.
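On a systemd-based installation, this typically looks as follows; the configuration path and service name are common defaults and may differ on your system:
# Validate the configuration before applying it
promtool check config /etc/prometheus/prometheus.yml
# Reload via systemd ...
sudo systemctl reload prometheus
# ... or via the HTTP API, if Prometheus runs with --web.enable-lifecycle
curl -X POST http://localhost:9090/-/reload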
Using the metrics
Metric names
MooseFS metrics are a flattened representation of the JSON output of the mfscli -j command, using the relevant data scopes mentioned above.
Metric names follow a structured format:
- Each metric starts with the prefix mfs_, indicating that it originates from MooseFS. This makes it easier to filter and explore metrics in Prometheus and reporting tools like Grafana.
- The name is then composed of segments, each separated by an underscore (_). These segments correspond to object names in the JSON structure, ending with the specific leaf key name.
For JSON objects that contain non-numerical values, a metric with the _misc suffix is exported. Such a metric carries all of the object's non-numerical values as label values, with each label name matching the corresponding key.
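As an illustration, a numerical leaf becomes a plain sample, while the non-numerical values of the same JSON object are collected into a _misc metric. The leaf name, label names, and sample values below are hypothetical:
# Hypothetical samples illustrating the naming scheme
mfs_info_memory_total 123456789
mfs_info_masters_misc{version="4.57.0",state="LEADER"} 1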
Every set of metrics additionally begins with two extra metrics:
- mfs_cli_execution_time - indicates the command used to obtain the metrics and the time required to retrieve them from the CLI
- mfs_cgi_info - provides information about the current version of the CGI/GUI that serves all the metrics in the first place
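They might look like this; the exact label names and values are purely illustrative:
mfs_cli_execution_time{command="mfscli -j -SIM ..."} 0.42
mfs_cgi_info{version="4.57.0"} 1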
You can verify the exact command and view its output by visiting:
http://<mfsguihost>:9425/metrics
For testing, you can adjust the above URL to generate different sets of metrics and preview both what Prometheus would ingest and the amount of data it will result in. You won't use this URL directly in production, but it is useful for inspecting what is available. You can build such a URL in the following way:
http://<mfsguihost>:9425/metrics?masterhost=MASTER_HOST&masterport=MASTER_PORT&scope=SCOPE_LIST&mastercharts=MASTER_CHART_LIST&cscharts=CS_CHART_LIST
For example:
http://localhost:9425/metrics?masterhost=my_masterhost&masterport=9421&scope=SIM,SLI&mastercharts=ucpu&cscharts=scpu,ucpu
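For instance, to count how many samples a narrowed-down configuration would produce, assuming the GUI runs locally:
curl -s 'http://localhost:9425/metrics?scope=SHD&mastercharts=none&cscharts=none' | grep -c '^mfs_'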
MooseFS CGI metrics on third-party http servers
We typically recommend using the MooseFS CGI server to run the web-based GUI and to expose metrics for Prometheus monitoring. However, in some cases you may require more advanced features from your HTTP server, such as custom access control, SSL support, or integration with other services.
In such situations, you can use third-party web servers like Apache or Nginx to serve MooseFS CGI scripts. Below, you’ll find configuration examples demonstrating how to run MooseFS CGI scripts using these alternative HTTP servers.
Nginx and mfsgui
You can run MooseFS CGI scripts using the Nginx server; however, this approach is not recommended, as it may introduce performance degradation.
For GUI access, consider using a reverse proxy instead, as sketched below. Doing so provides unified authentication for both the GUI and metrics endpoints.
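A minimal reverse-proxy sketch for Nginx, assuming the MooseFS GUI listens on 127.0.0.1:9425 and that you layer your own authentication (e.g. auth_basic) on top as needed:
# Reverse proxy in front of the MooseFS GUI (covers both the GUI and /metrics)
server {
    listen 80;
    server_name mfsgui.my.lan;

    location / {
        proxy_pass http://127.0.0.1:9425;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}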
However, if you want to run MooseFS CGI scripts using the Nginx server, you’ll need to install the following additional packages:
python3
moosefs-pro-gui
nginx
fcgiwrap
First, make sure that the FastCGI service and socket are running by checking their status:
systemctl status fcgiwrap.service
systemctl status fcgiwrap.socket
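If they are not active, you can typically enable and start the socket with:
sudo systemctl enable --now fcgiwrap.socket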
Once all required software is installed, you can begin configuring the Nginx server. Below is an example of a default Nginx configuration file, typically located at /etc/nginx/sites-available/default.
# MooseFS NGINX CGI GUI configuration
server {
listen 80 default_server;
listen [::]:80 default_server;
root /usr/share/mfscgi/;
index index.html;
server_name mfsgui.my.lan;
location / {
try_files $uri $uri/ =404;
}
location /metrics {
if ($is_args) {
rewrite ^/metrics$ /mfs.cgi?ajax=metrics&$args last;
}
rewrite ^/metrics$ /mfs.cgi?ajax=metrics last;
}
location ~ \.cgi$ {
gzip off;
fastcgi_pass unix:/run/fcgiwrap.socket;
include /etc/nginx/fastcgi_params;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
}
}
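After saving the configuration, test it, reload Nginx, and check the metrics endpoint:
sudo nginx -t && sudo systemctl reload nginx
curl -s http://mfsgui.my.lan/metrics | head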
Apache and mfsgui
You can run MooseFS CGI scripts using the Apache server; however, this approach is not recommended, as it may introduce performance degradation.
For GUI access, consider using a reverse proxy instead. Doing so provides unified authentication for both the GUI and metrics endpoints.
However, if you want to run MooseFS CGI scripts using the Apache server, you'll need to install the following packages:
python3
moosefs-pro-gui
apache2
Make sure to enable the required Apache modules for CGI and URL rewriting:
a2enmod rewrite
a2enmod cgi
Below is an example configuration file, typically located at /etc/apache2/sites-available/000-default.conf:
# MooseFS Apache2 CGI GUI configuration
<VirtualHost *:80>
ServerName mfsgui.my.lan
ServerAdmin email@domain.name
DocumentRoot /usr/share/mfscgi
<Directory />
Options +ExecCGI
AddHandler cgi-script .cgi
</Directory>
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/metrics$
RewriteCond %{QUERY_STRING} (.*)$
RewriteRule ^(.*)$ /mfs.cgi?ajax=metrics&%1 [L,QSA]
ErrorLog ${APACHE_LOG_DIR}/error-moosefs-cgi-monitoring-interface.log
CustomLog ${APACHE_LOG_DIR}/access-moosefs-cgi-monitoring-interface.log combined
</VirtualHost>
After configuring Apache, restart the service to apply the changes:
systemctl restart apache2.service
Once restarted, you should be able to access both the MooseFS metrics and the web GUI from your Apache server:
- Metrics: http://mfsgui.my.lan/metrics
- GUI: http://mfsgui.my.lan/