Prometheus Metrics for MooseFS
This guide explains how to integrate MooseFS with Prometheus for effective system monitoring. It covers how to configure Prometheus to collect metrics from MooseFS, provides an overview of the structure of these metrics, and explains how to customize metric collection to optimize monitoring performance.
Prometheus is an open-source monitoring and alerting toolkit designed to collect and analyze system metrics in real time. All the data available in the MooseFS GUI can also be accessed as Prometheus metrics, making it easier to interpret since the format aligns with what is already familiar in the GUI. Additionally, most metrics include descriptions available as tooltips in MooseFS GUI, Prometheus, and Grafana.
MooseFS exports a large number of metrics, and this guide outlines how to customize the selection to suit specific monitoring needs, helping to reduce unnecessary load on your monitoring system while ensuring critical data is captured.
Configuring Data for Metrics Export
The metrics are sourced from the MooseFS command-line interface (CLI) via the mfscli command, as well as from the MooseFS charts for Master Servers and Chunk Servers available in the GUI. Some metrics may be available in both the CLI output and the charts, so it's important to avoid exporting redundant data to prevent unnecessary load on your monitoring system and to ensure efficient performance.
Metrics from Command Line Interface (CLI) Data Sets
The following Data Sets are currently available, corresponding to the data presented via the MooseFS GUI and CLI: SIM,SLI,SIG,SMU,SIC,SIL,SCS,SMB,SHD,SSC,SMO,SQU. For a description of these data sets, refer to the output of mfscli -h or see the list below. The remaining data sets available in the CLI are not allowed for the reasons specified next to them.
- SIN: show full Master info - NOT allowed because it includes a possibly very long list of missing chunks/files (SMF)
- SIM: show only Masters states; metrics: mfs_info_masters_*
- SLI: show only licence info; metrics: mfs_info_licence_*
- SIG: show only general Master (Leader) info; metrics: mfs_info_general_*
- SMU: show only Master memory usage; metrics: mfs_info_memory_*
- SIC: show only chunks info (target/current redundancy level matrices); metrics: mfs_info_chunks_*
- SIL: show only loop info (with messages); metrics: mfs_info_*_loop_*
- SMF: show only missing chunks/files - NOT allowed because of the possibly very long list of missing chunks/files
- SCS: show connected Chunkservers; metrics: mfs_chunkservers_*
- SMB: show connected metadata backup servers
- SHD: show hdd data; metrics: mfs_disks_*
- SEX: show exports - NOT allowed because there is not much interesting numeric data here
- SMS: show active mounts - NOT allowed because there is not much interesting numeric data here
- SRS: show resources - NOT allowed because of the possibly long list of open files and locks
- SSC: show storage classes; metrics: mfs_storage_classes_*
- SPA: show patterns override data - NOT allowed because there is not much interesting numeric data here
- SOF: show only open files - NOT allowed because of the possibly long list of open files
- SAL: show only acquired locks - NOT allowed because of the possibly long list of acquired locks
- SMO: show operation counters; metrics: mfs_operations_*
- SQU: show quota info; metrics: mfs_quotas_*
- SMC: show Master charts data - NOT allowed as a scope because it is covered by the mastercharts parameter
- SCC: show Chunkserver charts data - NOT allowed as a scope because it is covered by the cscharts parameter
Metrics from Charts
MooseFS metrics also include data from the charts for Master Servers and Chunk Servers, which are available in the GUI. Metrics related to Master Servers and Chunk Servers contain the infixes mastercharts and cscharts, respectively.
Default set of metrics
By default, if no specific metrics are selected, the default mfscli Data Sets (SIM, SLI, SIG, SIC, SIL, SCS, SMB, SHD) and all chart metrics for Master Servers and Chunk Servers are enabled. This allows you to explore the full range of available data and decide which metrics are essential for your monitoring setup.
MooseFS can generate a large number of metrics, which may impact Prometheus performance, especially if you have many Chunk Servers and export all chart metrics. Keep in mind that the same information is always available in the MooseFS GUI, so if you don’t need all MooseFS metrics in Prometheus, you can use the GUI to explore data and investigate potential issues as needed.
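To gauge that volume before committing to a configuration, you can count how many metric samples the endpoint currently exposes. A quick check, assuming the GUI is reachable at mfsgui.my.lan:9425:
# Count exposed MooseFS metric samples (non-comment lines starting with mfs_)
curl -s 'http://mfsgui.my.lan:9425/metrics' | grep -c '^mfs_'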
Configuring Prometheus to Collect Metrics
To enable Prometheus to scrape metrics from MooseFS, you'll need to modify its configuration file (prometheus.yml). Below is a step-by-step guide:
Step 1: Identify the Host and Port for Metrics
In MooseFS, metrics are available on the same port as the GUI, which defaults to 9425. Usually, the GUI is hosted on the same server as the MooseFS Master. Metrics can typically be accessed at:
http://<mfsguihost>:9425/metrics
If the GUI is running on a different port, replace 9425 with the correct port in your Prometheus configuration.
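Before adding the scrape job, you can confirm the endpoint responds. A quick sanity check, assuming the GUI host is mfsgui.my.lan:
# Preview the first few exposed metrics
curl -s 'http://mfsgui.my.lan:9425/metrics' | head -n 20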
Step 2: Modify the prometheus.yml Configuration File
To configure Prometheus to scrape MooseFS metrics, add a new scrape job under the scrape_configs section of the prometheus.yml file.
A basic configuration example that collects the default data sets and all charts from the MooseFS GUI:
scrape_configs:
- job_name: MooseFS_metrics
scrape_interval: 60s
scrape_timeout: 20s
static_configs:
- targets: ['mfsgui.my.lan:9425']
This basic configuration always attempts to retrieve metrics from the Master at the default hostname mfsmaster. If your MooseFS Master has a DNS A-record name other than mfsmaster, you must specify it in an additional params section, as shown in the advanced example below.
Advanced Configuration Example
For more granular control, you can customize the configuration by specifying parameters such as the master host, port, scopes, and charts:
scrape_configs:
- job_name: JOB_NAME
scrape_interval: 60s
scrape_timeout: 20s
static_configs:
- targets: ['GUI_HOST:GUI_PORT']
params:
masterhost: ['MASTER_HOST'] # Default: mfsmaster
masterport: ['MASTER_PORT'] # Default: 9421
scope: ['SCOPE_LIST'] # Default: default scopes ('SIM,SLI,SIG,SIC,SIL,SCS,SMB,SHD')
mastercharts: ['MASTER_CHART_LIST'] # Default: all charts
cscharts: ['CS_CHART_LIST'] # Default: all charts
prefix_whitelist: ['WHITELISTED_PREFIXES_LIST'] # Default: no whitelisted prefixes
prefix_blacklist: ['BLACKLISTED_PREFIXES_LIST'] # Default: no blacklisted prefixes
Parameter Details:
- GUI_HOST - host under which the GUI is available
- GUI_PORT - port under which the GUI is available on GUI_HOST
- MASTER_HOST - fully qualified domain name of the MooseFS Master server (default: mfsmaster)
- MASTER_PORT - port number of the MooseFS Master server (default: 9421)
- SCOPE_LIST - a comma-separated list of data sets (scopes) to include in metrics, or use default or none. Supported scopes: SIM, SLI, SIG, SMU, SIC, SIL, SCS, SMB, SHD, SSC, SMO, SQU. Refer to mfscli -h for more details on the -S parameters.
- MASTER_CHART_LIST - a comma-separated list of Master server charts to include in metrics, or use all or none.
- CS_CHART_LIST - a comma-separated list of Chunk server charts to include in metrics, or use all or none.
- WHITELISTED_PREFIXES_LIST - a comma-separated list of metric name prefixes to include. Each prefix is treated as a wildcard, i.e. mfs_disks matches all metrics that begin with that string.
- BLACKLISTED_PREFIXES_LIST - a comma-separated list of metric name prefixes to exclude. Each prefix is treated as a wildcard, i.e. mfs_disks matches all metrics that begin with that string.
Note: when using both whitelisted and blacklisted prefixes, the whitelist is applied first, and the blacklist is then enforced on top of it.
Examples
- Retrieves all available MooseFS metrics, which may include hundreds of data points, potentially multiplied by the number of services of a particular type. This can result in thousands of records, so use caution to avoid overloading your Prometheus instance.
# Scrapes all possible MooseFS metrics, many hundreds of them
- job_name: "MooseFS_all_metrics"
scrape_interval: 60s
scrape_timeout: 20s
static_configs:
- targets: ['mfsgui.my.lan:9425']
params:
masterhost: ['mfsmaster.my.lan']
# scope: ['default'] # default: SIM, SLI, SIG, SIC, SIL, SCS, SMB, SHD; available: SIM, SLI, SIG, SMU, SIC, SIL, SCS, SMB, SHD, SSC, SMO, SQU
# mastercharts: ['all'] # default: include all available master server charts, see 'mfscli -SMC' help for a list of available master server charts
# cscharts: ['all'] # default: include all available chunk server charts, see 'mfscli -SCS' help for a list of available chunk server charts
# prefix_whitelist: [] # no need to whitelist anything
# prefix_blacklist: [] # no need to blacklist anything
- Collects only HDD usage metrics (total space, used space, and usage percentage) from all chunk servers, providing a focused view of storage utilization while minimizing data volume.
# Scrape only HDD usage (total, used and used percent) data from all chunk servers
- job_name: "MooseFS_hdd_only"
scrape_interval: 60s
scrape_timeout: 20s
static_configs:
- targets: ['mfsgui.my.lan:9425']
params:
masterhost: ['mfsmaster.my.lan']
scope: ['SHD'] # get only HDD data from mfscli tool
mastercharts: ['none'] # exclude any master server charts
cscharts: ['none'] # exclude any chunk servers charts
prefix_whitelist: ['mfs_disks_total,mfs_disks_used,mfs_disks_used_percent'] # filter metrics to include only those relating to disks total, used and used percent
# prefix_blacklist: [] # no need to blacklist anything
- Collects only the total cluster throughput metrics, specifically measuring the amount of data received and sent to/from all client-mounted volumes, providing insights into overall data flow.
# Scrape only total cluster throughput, namely how much data was received and sent from/to all client mounted volumes
- job_name: "MooseFS_throughput"
scrape_interval: 60s
scrape_timeout: 20s
static_configs:
- targets: ['mfsgui.my.lan:9425']
params:
masterhost: ['mfsmaster.my.lan']
scope: ['none'] # exclude any scope, because we use only data from master charts
mastercharts: ['mountbytrcvd,mountbytsent'] # include how much data was received and sent from/to all client mounted volumes
cscharts: ['none'] # no need for any chunk servers charts
# prefix_whitelist: [] # no need to whitelist anything
# prefix_blacklist: [] # no need to blacklist anything
- Collects resource usage metrics for both master and chunk servers, including CPU utilization, memory consumption, and other performance indicators, providing an overview of system health.
# Scrape master servers usage (CPU, memory, etc) and chunk servers usage (CPU, memory, etc)
- job_name: "MooseFS_servers_usage"
scrape_interval: 60s
scrape_timeout: 20s
static_configs:
- targets: ['mfsgui.my.lan:9425']
params:
masterhost: ['mfsmaster.my.lan']
scope: ['SIM,SIG,SMU'] # include master server usage data
mastercharts: ['none'] # no need for any master server charts
cscharts: ['cpu,mem,usedspace'] # include chunk server usage data (from charts): cpu, memory and hdd used space (excluding chunks marked for removal)
# prefix_whitelist: [] # no need to whitelist anything
prefix_blacklist: ['mfs_info_masters_misc,mfs_info_masters_localtime,mfs_info_masters_pro,mfs_info_masters_last,mfs_info_masters_metadata,mfs_info_general,mfs_info_chunks,mfs_info_memory,mfs_info_licence,mfs_info_chunk_loop,mfs_info_fs_loop'] # exclude many unnecessary mfs_info_* metrics
- Collects disk usage metrics from chunk servers while excluding space occupied by chunks marked for removal, providing a more accurate view of active storage utilization.
# Scrape chunk servers disk usage (but excluding chunks marked for removal)
- job_name: "MooseFS_regular_hdd_usage"
scrape_interval: 60s
scrape_timeout: 20s
static_configs:
- targets: ['mfsgui.my.lan:9425']
params:
masterhost: ['mfsmaster.my.lan']
scope: ['SCS'] # include chunk server usage data
mastercharts: ['none'] # no need for any master server charts
cscharts: ['none'] # no need for any chunk servers charts
prefix_whitelist: ['mfs_chunkservers_hdd'] # include only chunkservers hdd usage data
prefix_blacklist: ['mfs_chunkservers_hdd_removal'] # but exclude part related to 'marked for removal' chunks
- Collects chunk health status metrics (how many chunks are endangered, lost, etc.), summarized for regular chunks only, i.e. excluding chunks marked for removal.
# Scrape chunks health status (how many chunks are endangered, lost, etc)
- job_name: "MooseFS_chunks_health"
scrape_interval: 60s
scrape_timeout: 20s
static_configs:
- targets: ['mfsgui.my.lan:9425']
params:
masterhost: ['mfsmaster.my.lan']
scope: ['SIC'] # include chunks health status data
mastercharts: ['none'] # no need for any master server charts
cscharts: ['none'] # no need for any chunk servers charts
prefix_whitelist: ['mfs_info_chunks_summary_regularchunks'] # include only regular (exclude 'marked for removal') chunks health status data
prefix_blacklist: ['mfs_info_chunks_summary_regularchunks_copies,mfs_info_chunks_summary_regularchunks_ec'] # exclude detailed copies/ec4/ec8 data, leaving only summary for 'Copies and EC chunks'
Step 3: Restart Prometheus
After modifying the configuration file, reload Prometheus to apply the changes. This ensures that Prometheus begins scraping MooseFS metrics as per the new configuration.
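On a systemd-based installation, this typically looks as follows; the configuration path and service name are common defaults and may differ on your system:
# Validate the configuration before applying it
promtool check config /etc/prometheus/prometheus.yml
# Reload via systemd ...
sudo systemctl reload prometheus
# ... or via the HTTP API, if Prometheus runs with --web.enable-lifecycle
curl -X POST http://localhost:9090/-/reload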
Using the metrics
Metric names
MooseFS metrics are a flattened representation of the JSON output of the mfscli -j command, using the relevant data scopes mentioned above.
Metric names follow a structured format:
- Each metric starts with the prefix mfs_, indicating that it originates from MooseFS. This makes it easier to filter and explore metrics in Prometheus and reporting tools like Grafana.
- The name is then composed of segments, each separated by an underscore (_). These segments correspond to object names in the JSON structure, ending with the specific leaf key name.
For JSON objects that contain non-numerical values, a metric with the _misc suffix is exported. Such a metric carries all of the object's non-numerical values as label values, with each label name matching the corresponding key.
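As an illustration, a numerical leaf becomes a plain sample, while the non-numerical values of the same JSON object are collected into a _misc metric. The leaf name, label names, and sample values below are hypothetical:
# Hypothetical samples illustrating the naming scheme
mfs_info_memory_total 123456789
mfs_info_masters_misc{version="4.57.0",state="LEADER"} 1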
Every set of metrics additionally begins with two extra metrics:
- mfs_cli_execution_time - indicates the command used to obtain the metrics and the time required to retrieve them from the CLI
- mfs_cgi_info - provides information about the current version of the CGI/GUI that serves all the metrics in the first place
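They might look like this; the exact label names and values are purely illustrative:
mfs_cli_execution_time{command="mfscli -j -SIM ..."} 0.42
mfs_cgi_info{version="4.57.0"} 1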
You can verify the exact command and view its output by visiting:
http://<mfsguihost>:9425/metrics
For testing, you can adjust the above URL to generate different sets of metrics and preview both what Prometheus would ingest and the amount of data it will result in. You won't use this URL directly in production, but it is useful for inspecting what is available. You can build such a URL in the following way:
http://<mfsguihost>:9425/metrics?masterhost=MASTER_HOST&masterport=MASTER_PORT&scope=SCOPE_LIST&mastercharts=MASTER_CHART_LIST&cscharts=CS_CHART_LIST
For example:
http://localhost:9425/metrics?masterhost=my_masterhost&masterport=9421&scope=SIM,SLI&mastercharts=ucpu&cscharts=scpu,ucpu
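For instance, to count how many samples a narrowed-down configuration would produce, assuming the GUI runs locally:
curl -s 'http://localhost:9425/metrics?scope=SHD&mastercharts=none&cscharts=none' | grep -c '^mfs_'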
MooseFS CGI metrics on third-party http servers
We typically recommend using the MooseFS CGI server to run the web-based GUI and to expose metrics for Prometheus monitoring. However, in some cases you may require more advanced features from your HTTP server, such as custom access control, SSL support, or integration with other services.
In such situations, you can use third-party web servers like Apache or Nginx to serve MooseFS CGI scripts. Below, you’ll find configuration examples demonstrating how to run MooseFS CGI scripts using these alternative HTTP servers.
Nginx and mfsgui
You can run MooseFS CGI scripts using the Nginx server; however, this approach is not recommended, as it may introduce performance degradation.
For GUI access, consider using a reverse proxy instead, as sketched below. Doing so provides unified authentication for both the GUI and metrics endpoints.
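A minimal reverse-proxy sketch for Nginx, assuming the MooseFS GUI listens on 127.0.0.1:9425 and that you layer your own authentication (e.g. auth_basic) on top as needed:
# Reverse proxy in front of the MooseFS GUI (covers both the GUI and /metrics)
server {
    listen 80;
    server_name mfsgui.my.lan;

    location / {
        proxy_pass http://127.0.0.1:9425;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}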
However, if you want to run MooseFS CGI scripts using the Nginx server, you’ll need to install the following additional packages:
python3
moosefs-pro-gui
nginx
fcgiwrap
First, make sure that the FastCGI service and socket are running by checking their status:
systemctl status fcgiwrap.service
systemctl status fcgiwrap.socket
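If they are not active, you can typically enable and start the socket with:
sudo systemctl enable --now fcgiwrap.socket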
Once all required software is installed, you can begin configuring the Nginx server. Below is an example of a default Nginx configuration file, typically located at /etc/nginx/sites-available/default.
# MooseFS NGINX CGI GUI configuration
server {
listen 80 default_server;
listen [::]:80 default_server;
root /usr/share/mfscgi/;
index index.html;
server_name mfsgui.my.lan;
location / {
try_files $uri $uri/ =404;
}
location /metrics {
if ($is_args) {
rewrite ^/metrics$ /mfs.cgi?ajax=metrics&$args last;
}
rewrite ^/metrics$ /mfs.cgi?ajax=metrics last;
}
location ~ \.cgi$ {
gzip off;
fastcgi_pass unix:/run/fcgiwrap.socket;
include /etc/nginx/fastcgi_params;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
}
}
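After saving the configuration, test it, reload Nginx, and check the metrics endpoint:
sudo nginx -t && sudo systemctl reload nginx
curl -s http://mfsgui.my.lan/metrics | head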
Apache and mfsgui
You can run MooseFS CGI scripts using the Apache server; however, this approach is not recommended, as it may introduce performance degradation.
For GUI access, consider using a reverse proxy instead. Doing so provides unified authentication for both the GUI and metrics endpoints.
However, if you want to run MooseFS CGI scripts using the Apache server, you'll need to install the following packages:
python3
moosefs-pro-gui
apache2
Make sure to enable the required Apache modules for CGI and URL rewriting:
a2enmod rewrite
a2enmod cgi
Below is an example configuration file, typically located at /etc/apache2/sites-available/000-default.conf:
# MooseFS Apache2 CGI GUI configuration
<VirtualHost *:80>
ServerName mfsgui.my.lan
ServerAdmin email@domain.name
DocumentRoot /usr/share/mfscgi
<Directory />
Options +ExecCGI
AddHandler cgi-script .cgi
</Directory>
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/metrics$
RewriteCond %{QUERY_STRING} (.*)$
RewriteRule ^(.*)$ /mfs.cgi?ajax=metrics&%1 [L,QSA]
ErrorLog ${APACHE_LOG_DIR}/error-moosefs-cgi-monitoring-interface.log
CustomLog ${APACHE_LOG_DIR}/access-moosefs-cgi-monitoring-interface.log combined
</VirtualHost>
After configuring Apache, restart the service to apply the changes:
systemctl restart apache2.service
Once restarted, you should be able to access both the MooseFS metrics and the web GUI from your Apache server:
- Metrics: http://mfsgui.my.lan/metrics
- GUI: http://mfsgui.my.lan/