process-exporter
Prometheus exporter that mines /proc to report on selected processes
Top Related Projects
- node_exporter: Exporter for machine metrics
- cAdvisor: Analyzes resource usage and performance characteristics of running containers.
- Telegraf: Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
- netdata: Architected for speed. Automated for easy. Monitoring and troubleshooting, transformed!
- Glances: An eye on your system. A top/htop alternative for GNU/Linux, BSD, Mac OS and Windows operating systems.
- Sentry: Developer-first error tracking and performance monitoring
Quick Overview
Process-exporter is a Prometheus exporter that collects and exposes detailed metrics about running processes on a system. It allows for fine-grained monitoring of specific processes or groups of processes, providing valuable insights into resource usage and performance.
Pros
- Highly configurable, allowing users to define custom process groupings and metrics
- Provides detailed process-level metrics not available in standard node exporters
- Works on any Linux system that exposes /proc, with no changes to the monitored applications
- Lightweight and efficient, with minimal impact on system resources
Cons
- Requires manual configuration for optimal use, which can be complex for large-scale deployments
- Limited documentation for advanced use cases and troubleshooting
- May require frequent updates to configuration as processes change or new applications are deployed
- Not as widely adopted as some other Prometheus exporters, potentially leading to less community support
Getting Started
- Download the latest release from the GitHub repository.
- Create a configuration file process-exporter.yaml:

process_names:
  - name: "{{.Comm}}"
    cmdline:
      - '.+'

- Run process-exporter:

./process-exporter -config.path process-exporter.yaml

- Configure Prometheus to scrape metrics from process-exporter:

scrape_configs:
  - job_name: 'process-exporter'
    static_configs:
      - targets: ['localhost:9256']

- Access metrics at http://localhost:9256/metrics.
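To verify the exporter is emitting per-group series (the metric names are described under Group Metrics below), a quick check might be:

curl -s http://localhost:9256/metrics | grep namedprocess_namegroup_num_procs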
Competitor Comparisons
node_exporter: Exporter for machine metrics
Pros of node_exporter
- Broader system metrics coverage, including CPU, memory, disk, and network
- Official Prometheus project with extensive community support and regular updates
- Supports a wide range of operating systems and architectures
Cons of node_exporter
- Lacks detailed process-specific metrics
- Higher resource usage due to comprehensive metric collection
- More complex configuration for custom metrics
Code Comparison
node_exporter:
func (c *cpuCollector) Update(ch chan<- prometheus.Metric) error {
    stats, err := cpu.Get()
    if err != nil {
        return err
    }
    for cpuID, cpuStat := range stats {
        ch <- prometheus.MustNewConstMetric(c.cpu.Desc(), prometheus.GaugeValue, cpuStat.Usage, cpuID)
    }
    return nil
}
process-exporter:
func (p *Proc) GetProcInfo() (ProcInfo, error) {
    stat, err := p.Stat()
    if err != nil {
        return ProcInfo{}, err
    }
    return ProcInfo{
        PID:     p.PID,
        Name:    stat.Comm,
        Cmdline: p.Cmdline(),
    }, nil
}
process-exporter focuses on detailed process-level metrics, while node_exporter provides a broader range of system-wide metrics. process-exporter is more lightweight and specific to process monitoring, whereas node_exporter offers a comprehensive solution for overall system monitoring. The code snippets demonstrate the different approaches: node_exporter collects general CPU metrics, while process-exporter retrieves detailed information about individual processes.
cAdvisor: Analyzes resource usage and performance characteristics of running containers.
Pros of cAdvisor
- Provides comprehensive container metrics, including CPU, memory, network, and filesystem usage
- Supports multiple container runtimes (Docker, containerd, cri-o)
- Offers a built-in web UI for easy visualization of metrics
Cons of cAdvisor
- Focuses primarily on container-level metrics, less granular for individual processes
- Can be resource-intensive, especially in large-scale environments
- May require additional configuration for custom metrics or specific use cases
Code Comparison
process-exporter:
func (p *Proc) GetProcInfo() (procInfo ProcInfo, err error) {
    procInfo.Name = p.Name
    procInfo.Cmdline = p.Cmdline
    procInfo.CmdlineSlice = p.CmdlineSlice
    procInfo.Username = p.Username
    return procInfo, nil
}
cAdvisor:
func (self *containerData) GetStats() (*info.ContainerStats, error) {
    stats, err := self.handler.GetStats()
    if err != nil {
        return nil, err
    }
    return stats, nil
}
Summary
process-exporter focuses on detailed process-level metrics, making it ideal for monitoring specific applications or services. cAdvisor, on the other hand, provides a broader view of container-level metrics, making it more suitable for overall container monitoring in orchestrated environments. The choice between the two depends on the specific monitoring requirements and the infrastructure setup.
Telegraf: Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
Pros of Telegraf
- Broader scope: Collects metrics from various systems and services, not just processes
- Extensive plugin ecosystem: Supports a wide range of input, output, and processing plugins
- Built-in support for multiple output formats and databases
Cons of Telegraf
- More complex setup and configuration due to its broader scope
- Higher resource usage, especially for large-scale deployments
- Steeper learning curve for users who only need process monitoring
Code Comparison
process-exporter:
func (p *Proc) GetProcInfo() (procInfo ProcInfo, err error) {
    procInfo.Name = p.Name
    procInfo.Pid = p.Pid
    procInfo.Ppid = p.Ppid
    return procInfo, nil
}
Telegraf:
func (p *ProcessStats) Gather(acc telegraf.Accumulator) error {
    procs, err := p.getProcesses()
    if err != nil {
        return err
    }
    for _, proc := range procs {
        p.addMetrics(proc, acc)
    }
    return nil
}
Both projects aim to collect process-related metrics, but Telegraf offers a more comprehensive solution for overall system monitoring. process-exporter is more focused and lightweight, making it ideal for specific process monitoring needs. Telegraf's flexibility comes at the cost of increased complexity, while process-exporter provides a simpler, more targeted approach to process monitoring.
netdata: Architected for speed. Automated for easy. Monitoring and troubleshooting, transformed!
Pros of netdata
- Comprehensive system monitoring with a wide range of metrics
- Real-time, interactive web dashboard for easy visualization
- Extensive plugin system for custom data collection
Cons of netdata
- Higher resource usage due to its comprehensive nature
- Steeper learning curve for configuration and customization
- May be overkill for simple process monitoring needs
Code Comparison
netdata:
static void rrdset_done(RRDSET *st) {
    if(unlikely(!st->rrd_memory_mode))
        return;
    RRDDIM *rd;
process-exporter:
func (p *Proc) GetProcInfo() (ProcInfo, error) {
    stat, err := p.Stat()
    if err != nil {
        return ProcInfo{}, err
    }
Key Differences
- netdata is a full-featured monitoring solution, while process-exporter focuses specifically on process metrics
- netdata provides a built-in web interface, whereas process-exporter exports metrics for consumption by other tools
- netdata is written primarily in C, while process-exporter is written in Go
- netdata offers a broader range of metrics and plugins, but process-exporter is more lightweight and focused
Use Cases
- Choose netdata for comprehensive system monitoring with a user-friendly interface
- Opt for process-exporter when you need lightweight, Prometheus-compatible process metrics collection
Glances: An eye on your system. A top/htop alternative for GNU/Linux, BSD, Mac OS and Windows operating systems.
Pros of Glances
- More comprehensive system monitoring, including CPU, memory, disk, network, and processes
- Cross-platform support (Linux, macOS, Windows)
- Web-based interface and REST API for easy integration
Cons of Glances
- Higher resource usage due to its comprehensive monitoring capabilities
- May be overkill for users only interested in process-specific metrics
- Steeper learning curve due to its extensive feature set
Code Comparison
Glances (Python):
from glances_api import Glances
glances = Glances()
cpu_percent = glances.cpu.percent
memory_percent = glances.mem.percent
Process-exporter (Go):
import "github.com/ncabatoff/process-exporter/proc"
processes, err := proc.AllProcs()
for _, p := range processes {
// Process metrics available here
}
Summary
Glances is a more feature-rich system monitoring tool with a broader scope, while Process-exporter focuses specifically on exporting process metrics for Prometheus. Glances offers a user-friendly interface and cross-platform support but may consume more resources. Process-exporter is lightweight and tailored for Prometheus integration but has a narrower focus on process metrics. The choice between the two depends on the specific monitoring requirements and infrastructure setup of the user.
Sentry: Developer-first error tracking and performance monitoring
Pros of Sentry
- Comprehensive error tracking and monitoring solution for multiple programming languages and platforms
- Robust features including real-time alerts, release tracking, and performance monitoring
- Large and active community with extensive documentation and integrations
Cons of Sentry
- More complex setup and configuration compared to process-exporter
- Higher resource usage due to its extensive feature set
- Potential for information overload with numerous alerts and notifications
Code Comparison
process-exporter (Go):
func (p *Proc) GetProcInfo() (ProcInfo, error) {
    stat, err := p.Stat()
    if err != nil {
        return ProcInfo{}, err
    }
    return ProcInfo{
        PID:  p.PID,
        Name: stat.Comm,
    }, nil
}
Sentry (Python):
def capture_exception(self, exc_info=None, **kwargs):
    if exc_info is None:
        exc_info = sys.exc_info()
    return self.captureException(
        exc_info=exc_info,
        **kwargs
    )
While process-exporter focuses on exporting process metrics for Prometheus, Sentry provides a more comprehensive error tracking and monitoring solution. The code snippets highlight their different purposes, with process-exporter handling process information and Sentry capturing exceptions for analysis.
README
process-exporter
Prometheus exporter that mines /proc to report on selected processes.
Some apps are impractical to instrument directly, either because you don't control the code or they're written in a language that isn't easy to instrument with Prometheus. We must instead resort to mining /proc.
Installation
Either grab a package for your OS from the Releases page, or install via docker.
Running
Usage:
process-exporter [options] -config.path filename.yml
or via docker:
docker run -d --rm -p 9256:9256 --privileged -v /proc:/host/proc -v `pwd`:/config ncabatoff/process-exporter --procfs /host/proc -config.path /config/filename.yml
Important options (run process-exporter --help for full list):
- -children (default: true) makes it so that any process that otherwise isn't part of its own group becomes part of the first group found (if any) when walking the process tree upwards. In other words, resource usage of subprocesses is added to their parent's usage unless the subprocess identifies as a different group name.
- -threads (default: true) means that metrics will be broken down by thread name as well as group name.
- -recheck (default: false) means that on each scrape the process names are re-evaluated. This is disabled by default as an optimization, but since processes can choose to change their names, this may result in a process falling into the wrong group if we happen to see it for the first time before it has assumed its proper name. You can use -recheck-with-time-limit to enable this feature only for a specific duration after a process starts.
- -procnames is intended as a quick alternative to using a config file. Details in the following section.
To disable any of these options, use the -option=false form.
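For example, a hypothetical invocation that turns off child aggregation and per-thread metrics:

./process-exporter -config.path config.yml -children=false -threads=false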
Configuration and group naming
To select and group the processes to monitor, either provide command-line arguments or use a YAML configuration file.
The recommended option is to use a config file via -config.path, but for convenience and backwards compatibility the -procnames/-namemapping options exist as an alternative.
Using a config file
The general format of the -config.path YAML file is a top-level process_names section, containing a list of name matchers:

process_names:
  - matcher1
  - matcher2
  ...
  - matcherN
The default config shipped with the deb/rpm packages is:
process_names:
  - name: "{{.Comm}}"
    cmdline:
      - '.+'
A process may only belong to one group: even if multiple items would match, the first one listed in the file wins.
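For instance, given the following illustrative config (the group name "shells" is hypothetical), a bash process is assigned to the "shells" group even though the catch-all second matcher would also match it:

process_names:
  - name: "shells"
    comm:
      - bash
      - zsh
  - name: "{{.Comm}}"
    cmdline:
      - '.+'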
(Side note: to avoid confusion with the cmdline YAML element, we'll refer to the command-line arguments of a process, /proc/<pid>/cmdline, as the array argv[].)
Using a config file: group name
Each item in process_names gives a recipe for identifying and naming processes. The optional name tag defines a template to use to name matching processes; if not specified, name defaults to {{.ExeBase}}.
Template variables available:

- {{.Comm}} contains the basename of the original executable, i.e. the 2nd field in /proc/<pid>/stat
- {{.ExeBase}} contains the basename of the executable
- {{.ExeFull}} contains the fully qualified path of the executable
- {{.Username}} contains the username of the effective user
- {{.Matches}} map contains all the matches resulting from applying cmdline regexps
- {{.PID}} contains the PID of the process. Note that using PID means the group will only contain a single process.
- {{.StartTime}} contains the start time of the process. This can be useful in conjunction with PID because PIDs get reused over time.
- {{.Cgroups}} contains (if supported) the cgroups of the process (/proc/self/cgroup). This is particularly useful for identifying to which container a process belongs.
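For example, a matcher (illustrative, not part of any shipped config) that breaks java processes out by effective user would produce group names like java:tomcat:

process_names:
  - name: "{{.ExeBase}}:{{.Username}}"
    comm:
      - java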
Using PID or StartTime is discouraged: this is almost never what you want, and is likely to result in high-cardinality metrics which Prometheus will have trouble with.
Using a config file: process selectors
Each item in process_names must contain one or more selectors (comm, exe or cmdline); if more than one selector is present, they must all match. Each selector is a list of strings to match against a process's comm, argv[0], or, in the case of cmdline, a regexp to apply to the command line. The cmdline regexp uses the Go syntax.

For comm and exe, the list of strings is an OR, meaning any process matching any of the strings will be added to the item's group.

For cmdline, the list of regexps is an AND, meaning they all must match. Any capturing groups in a regexp must use the ?P<name> option to assign a name to the capture, which is used to populate .Matches.
Performance tip: give an exe or comm clause in addition to any cmdline clause, so you avoid executing the regexp when the executable name doesn't match.
process_names:
  # comm is the second field of /proc/<pid>/stat minus parens.
  # It is the base executable name, truncated at 15 chars.
  # It cannot be modified by the program, unlike exe.
  - comm:
      - bash

  # exe is argv[0]. If no slashes, only basename of argv[0] need match.
  # If exe contains slashes, argv[0] must match exactly.
  - exe:
      - postgres
      - /usr/local/bin/prometheus

  # cmdline is a list of regexps applied to argv.
  # Each must match, and any captures are added to the .Matches map.
  - name: "{{.ExeFull}}:{{.Matches.Cfgfile}}"
    exe:
      - /usr/local/bin/process-exporter
    cmdline:
      - -config.path\s+(?P<Cfgfile>\S+)
Here's the config I use on my home machine:
process_names:
  - comm:
      - chromium-browse
      - bash
      - prometheus
      - gvim
  - exe:
      - /sbin/upstart
    cmdline:
      - --user
    name: upstart:-user
Using -procnames/-namemapping instead of config.path
Every name in the procnames list becomes a process group. The default name of a process is the value found in the second field of /proc/<pid>/stat.

If -namemapping isn't provided, every process with a comm value present in -procnames is assigned to a group based on that name, and any other processes are ignored.
The -namemapping option is a comma-separated list of alternating name,regexp values. It allows assigning a name to a process based on a combination of the process name and command line. For example, using

-namemapping "python2,([^/]+).py,java,-jar\s+([^/]+).jar"

will make it so that each different python2 and java -jar invocation is tracked with distinct metrics. Processes whose remapped name is absent from the procnames list will be ignored. On an Ubuntu Xenial machine being used as a workstation, here's a good way of tracking resource usage for a few different key user apps:
process-exporter -namemapping "upstart,(--user)" \
  -procnames chromium-browse,bash,gvim,prometheus,process-exporter,upstart:-user
Since upstart --user is the parent process of the X11 session, this will make all apps started by the user fall into the group named "upstart:-user", unless they're one of the others named explicitly with -procnames, like gvim.
Group Metrics
There's no meaningful way to name a process that will only ever name a single process, so process-exporter assumes that every metric will be attached to a group of processes - not a process group in the technical sense, just one or more processes that meet a configuration's specification of what should be monitored and how to name it.
All these metrics start with namedprocess_namegroup_ and have at minimum the label groupname.
num_procs gauge
Number of processes in this group.
cpu_seconds_total counter
CPU usage based on /proc/[pid]/stat fields utime(14) and stime(15), i.e. user and system time. This is similar to the node_exporter's node_cpu_seconds_total.
read_bytes_total counter
Bytes read based on /proc/[pid]/io field read_bytes. The man page says "Attempt to count the number of bytes which this process really did cause to be fetched from the storage layer. This is accurate for block-backed filesystems.", but I would take it with a grain of salt.
As /proc/[pid]/io is readable only by the process's user (see #137), you should run process-exporter either as that user or as root to get these values. Otherwise, we can't read them and you'll get a constant 0 in the metric.
write_bytes_total counter
Bytes written based on /proc/[pid]/io field write_bytes. As with read_bytes, somewhat dubious. May be useful for isolating which processes are doing the most I/O, but probably not measuring just how much I/O is happening.
major_page_faults_total counter
Number of major page faults based on /proc/[pid]/stat field majflt(12).
minor_page_faults_total counter
Number of minor page faults based on /proc/[pid]/stat field minflt(10).
context_switches_total counter
Number of context switches based on /proc/[pid]/status fields voluntary_ctxt_switches and nonvoluntary_ctxt_switches. The extra label ctxswitchtype can have two values: voluntary and nonvoluntary.
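As an illustration (the sample values are invented), the exposed series for a group named bash would look like:

namedprocess_namegroup_context_switches_total{groupname="bash",ctxswitchtype="voluntary"} 4093
namedprocess_namegroup_context_switches_total{groupname="bash",ctxswitchtype="nonvoluntary"} 126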
memory_bytes gauge
Number of bytes of memory used. The extra label memtype can have three values:

- resident: Field rss(24) from /proc/[pid]/stat, whose doc says: "This is just the pages which count toward text, data, or stack space. This does not include pages which have not been demand-loaded in, or which are swapped out."
- virtual: Field vsize(23) from /proc/[pid]/stat, virtual memory size.
- swapped: Field VmSwap from /proc/[pid]/status, translated from KB to bytes.

If gathering of the smaps file is enabled, two additional values for memtype are added:

- proportionalResident: Sum of "Pss" fields from /proc/[pid]/smaps, whose doc says: "The 'proportional set size' (PSS) of a process is the count of pages it has in memory, where each page is divided by the number of processes sharing it."
- proportionalSwapped: Sum of "SwapPss" fields from /proc/[pid]/smaps
open_filedesc gauge
Number of file descriptors, based on counting how many entries are in the directory /proc/[pid]/fd.
worst_fd_ratio gauge
Worst ratio of open filedescs to filedesc limit, amongst all the procs in the group. The limit is the fd soft limit based on /proc/[pid]/limits.
Normally Prometheus metrics ought to be as "basic" as possible (i.e. the raw values rather than a derived ratio), but we use a ratio here because nothing else makes sense. Suppose there are 10 procs in a given group, each with a soft limit of 4096; one of them has 4000 open fds and the others all have 40. Their total fd count is 4360 against a total soft limit of 40960, a ratio of about 1:10, yet one of the procs is about to run out of fds. With worst_fd_ratio we're able to know this: in the above example it would be about 0.97, rather than the 0.10 you'd see if you computed sum(open_filedesc) / sum(limit_filedesc).
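Here is a quick sketch of that arithmetic, using the hypothetical numbers from the paragraph above:

package main

import "fmt"

func main() {
    // Ten procs, each with a soft fd limit of 4096; one has 4000 open
    // fds, the other nine have 40 each.
    open := []float64{4000, 40, 40, 40, 40, 40, 40, 40, 40, 40}
    const limit = 4096.0

    var sumOpen, worst float64
    for _, n := range open {
        sumOpen += n
        if r := n / limit; r > worst {
            worst = r
        }
    }
    fmt.Printf("aggregate ratio: %.3f\n", sumOpen/(limit*float64(len(open)))) // 0.106
    fmt.Printf("worst_fd_ratio:  %.3f\n", worst)                              // 0.977
}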
oldest_start_time_seconds gauge
Epoch time (seconds since 1970/1/1) at which the oldest process in the group started. This is derived from field starttime(22) from /proc/[pid]/stat, added to boot time to make it relative to epoch.
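A sketch of that derivation (assuming USER_HZ of 100, the typical value on Linux; the input values here are hypothetical):

package main

import "fmt"

// startTimeEpoch converts starttime(22) from /proc/[pid]/stat, which is
// measured in clock ticks since boot, into seconds since the epoch by
// adding the kernel boot time (the btime line of /proc/stat).
func startTimeEpoch(starttimeTicks, btimeSeconds uint64) uint64 {
    const userHZ = 100 // assumed; query sysconf(_SC_CLK_TCK) to be sure
    return btimeSeconds + starttimeTicks/userHZ
}

func main() {
    // Boot at epoch 1700000000; process started 12345 ticks (123s) later.
    fmt.Println(startTimeEpoch(12345, 1700000000)) // 1700000123
}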
num_threads gauge
Sum of the number of threads of all processes in the group. Based on field num_threads(20) from /proc/[pid]/stat.
states gauge
Number of threads in the group in each of various states, based on the field state(3) from /proc/[pid]/stat. The extra label state can have these values: Running, Sleeping, Waiting, Zombie, Other.
Group Thread Metrics
Since publishing thread metrics adds a lot of overhead, use -threads=false on the command line to disable them if necessary.
All these metrics start with namedprocess_namegroup_ and have at minimum the labels groupname and threadname. threadname is field comm(2) from /proc/[pid]/stat. Just as groupname breaks the set of processes down into groups, threadname breaks a given process group down into subgroups.
thread_count gauge
Number of threads in this thread subgroup.
thread_cpu_seconds_total counter
Same as cpu_user_seconds_total and cpu_system_seconds_total, but broken down per-thread subgroup. Unlike cpu_user_seconds_total/cpu_system_seconds_total, the label cpumode is used to distinguish between user and system time.
thread_io_bytes_total counter
Same as read_bytes_total and write_bytes_total, but broken down per-thread subgroup. Unlike read_bytes_total/write_bytes_total, the label iomode is used to distinguish between read and write bytes.
thread_major_page_faults_total counter
Same as major_page_faults_total, but broken down per-thread subgroup.
thread_minor_page_faults_total counter
Same as minor_page_faults_total, but broken down per-thread subgroup.
thread_context_switches_total counter
Same as context_switches_total, but broken down per-thread subgroup.
Instrumentation cost
process-exporter will consume CPU in proportion to the number of processes in the system and the rate at which new ones are created. The most expensive parts - applying regexps and executing templates - are only applied once per process seen, unless the command-line option -recheck is provided.
If you have mostly long-running processes, process-exporter overhead should be minimal: each time a scrape occurs, it will parse /proc/$pid/stat and /proc/$pid/cmdline for every process being monitored and add a few numbers.
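For a sense of what that per-scrape work involves, here is a minimal sketch (not process-exporter's actual code) of pulling the comm field out of /proc/[pid]/stat; comm is wrapped in parentheses and may itself contain spaces, so the scan looks for the last ')':

package main

import (
    "fmt"
    "os"
    "strings"
)

// readComm extracts comm (field 2 of /proc/<pid>/stat, in parentheses).
// comm may contain spaces and parentheses, so rather than splitting on
// whitespace we take everything between the first '(' and the last ')'.
func readComm(pid int) (string, error) {
    data, err := os.ReadFile(fmt.Sprintf("/proc/%d/stat", pid))
    if err != nil {
        return "", err
    }
    s := string(data)
    start := strings.IndexByte(s, '(')
    end := strings.LastIndexByte(s, ')')
    if start < 0 || end < start {
        return "", fmt.Errorf("malformed stat for pid %d", pid)
    }
    return s[start+1 : end], nil
}

func main() {
    comm, err := readComm(os.Getpid())
    if err != nil {
        panic(err)
    }
    fmt.Println(comm) // this process's own comm value
}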
Dashboards
An example Grafana dashboard to view the metrics is available at https://grafana.net/dashboards/249
Building
Requires Go 1.21 or later.
make
Exposing metrics through HTTPS
web-config.yml:

# Minimal TLS configuration example. Additionally, a certificate and a key file
# are needed.
tls_server_config:
  cert_file: server.crt
  key_file: server.key
Running
$ ./process-exporter -web.config.file web-config.yml &
$ curl -sk https://localhost:9256/metrics | grep process
# HELP namedprocess_scrape_errors general scrape errors: no proc metrics collected during a cycle
# TYPE namedprocess_scrape_errors counter
namedprocess_scrape_errors 0
# HELP namedprocess_scrape_partial_errors incremented each time a tracked proc's metrics collection fails partially, e.g. unreadable I/O stats
# TYPE namedprocess_scrape_partial_errors counter
namedprocess_scrape_partial_errors 0
# HELP namedprocess_scrape_procread_errors incremented each time a proc's metrics collection fails
# TYPE namedprocess_scrape_procread_errors counter
namedprocess_scrape_procread_errors 0
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.21
# HELP process_exporter_build_info A metric with a constant '1' value labeled by version, revision, branch, and goversion from which process_exporter was built.
# TYPE process_exporter_build_info gauge
process_exporter_build_info{branch="",goversion="go1.17.3",revision="",version=""} 1
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 10
For further information about TLS configuration, please visit: exporter-toolkit