process-exporter
Prometheus exporter that mines /proc to report on selected processes
Top Related Projects
- node_exporter: Exporter for machine metrics
- cAdvisor: Analyzes resource usage and performance characteristics of running containers.
- Telegraf: Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
- netdata: Architected for speed. Automated for easy. Monitoring and troubleshooting, transformed!
- Glances: An eye on your system. A top/htop alternative for GNU/Linux, BSD, Mac OS and Windows operating systems.
- Sentry: Developer-first error tracking and performance monitoring
Quick Overview
Process-exporter is a Prometheus exporter that collects and exposes detailed metrics about running processes on a system. It allows for fine-grained monitoring of specific processes or groups of processes, providing valuable insights into resource usage and performance.
Pros
- Highly configurable, allowing users to define custom process groupings and metrics
- Provides detailed process-level metrics not available in standard node exporters
- Works on any Linux system that exposes /proc, with no changes to the monitored applications
- Lightweight and efficient, with minimal impact on system resources
Cons
- Requires manual configuration for optimal use, which can be complex for large-scale deployments
- Limited documentation for advanced use cases and troubleshooting
- May require frequent updates to configuration as processes change or new applications are deployed
- Not as widely adopted as some other Prometheus exporters, potentially leading to less community support
Getting Started
- Download the latest release from the GitHub repository.
- Create a configuration file process-exporter.yaml:

process_names:
  - name: "{{.Comm}}"
    cmdline:
      - '.+'

- Run process-exporter:

./process-exporter -config.path process-exporter.yaml

- Configure Prometheus to scrape metrics from process-exporter:

scrape_configs:
  - job_name: 'process-exporter'
    static_configs:
      - targets: ['localhost:9256']

- Access metrics at http://localhost:9256/metrics.
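To verify the exporter is emitting per-group series (the metric names are described under Group Metrics below), a quick check might be:

curl -s http://localhost:9256/metrics | grep namedprocess_namegroup_num_procs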
Competitor Comparisons
node_exporter: Exporter for machine metrics
Pros of node_exporter
- Broader system metrics coverage, including CPU, memory, disk, and network
- Official Prometheus project with extensive community support and regular updates
- Supports a wide range of operating systems and architectures
Cons of node_exporter
- Lacks detailed process-specific metrics
- Higher resource usage due to comprehensive metric collection
- More complex configuration for custom metrics
Code Comparison
node_exporter:
func (c *cpuCollector) Update(ch chan<- prometheus.Metric) error {
    stats, err := cpu.Get()
    if err != nil {
        return err
    }
    for cpuID, cpuStat := range stats {
        ch <- prometheus.MustNewConstMetric(c.cpu.Desc(), prometheus.GaugeValue, cpuStat.Usage, cpuID)
    }
    return nil
}
process-exporter:
func (p *Proc) GetProcInfo() (ProcInfo, error) {
    stat, err := p.Stat()
    if err != nil {
        return ProcInfo{}, err
    }
    return ProcInfo{
        PID:     p.PID,
        Name:    stat.Comm,
        Cmdline: p.Cmdline(),
    }, nil
}
process-exporter focuses on detailed process-level metrics, while node_exporter provides a broader range of system-wide metrics. process-exporter is more lightweight and specific to process monitoring, whereas node_exporter offers a comprehensive solution for overall system monitoring. The code snippets demonstrate the different approaches: node_exporter collects general CPU metrics, while process-exporter retrieves detailed information about individual processes.
cAdvisor: Analyzes resource usage and performance characteristics of running containers.
Pros of cAdvisor
- Provides comprehensive container metrics, including CPU, memory, network, and filesystem usage
- Supports multiple container runtimes (Docker, containerd, cri-o)
- Offers a built-in web UI for easy visualization of metrics
Cons of cAdvisor
- Focuses primarily on container-level metrics, less granular for individual processes
- Can be resource-intensive, especially in large-scale environments
- May require additional configuration for custom metrics or specific use cases
Code Comparison
process-exporter:
func (p *Proc) GetProcInfo() (procInfo ProcInfo, err error) {
    procInfo.Name = p.Name
    procInfo.Cmdline = p.Cmdline
    procInfo.CmdlineSlice = p.CmdlineSlice
    procInfo.Username = p.Username
    return procInfo, nil
}
cAdvisor:
func (self *containerData) GetStats() (*info.ContainerStats, error) {
    stats, err := self.handler.GetStats()
    if err != nil {
        return nil, err
    }
    return stats, nil
}
Summary
process-exporter focuses on detailed process-level metrics, making it ideal for monitoring specific applications or services. cAdvisor, on the other hand, provides a broader view of container-level metrics, making it more suitable for overall container monitoring in orchestrated environments. The choice between the two depends on the specific monitoring requirements and the infrastructure setup.
Telegraf: Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
Pros of Telegraf
- Broader scope: Collects metrics from various systems and services, not just processes
- Extensive plugin ecosystem: Supports a wide range of input, output, and processing plugins
- Built-in support for multiple output formats and databases
Cons of Telegraf
- More complex setup and configuration due to its broader scope
- Higher resource usage, especially for large-scale deployments
- Steeper learning curve for users who only need process monitoring
Code Comparison
process-exporter:
func (p *Proc) GetProcInfo() (procInfo ProcInfo, err error) {
    procInfo.Name = p.Name
    procInfo.Pid = p.Pid
    procInfo.Ppid = p.Ppid
    return procInfo, nil
}
Telegraf:
func (p *ProcessStats) Gather(acc telegraf.Accumulator) error {
    procs, err := p.getProcesses()
    if err != nil {
        return err
    }
    for _, proc := range procs {
        p.addMetrics(proc, acc)
    }
    return nil
}
Both projects aim to collect process-related metrics, but Telegraf offers a more comprehensive solution for overall system monitoring. process-exporter is more focused and lightweight, making it ideal for specific process monitoring needs. Telegraf's flexibility comes at the cost of increased complexity, while process-exporter provides a simpler, more targeted approach to process monitoring.
netdata: Architected for speed. Automated for easy. Monitoring and troubleshooting, transformed!
Pros of netdata
- Comprehensive system monitoring with a wide range of metrics
- Real-time, interactive web dashboard for easy visualization
- Extensive plugin system for custom data collection
Cons of netdata
- Higher resource usage due to its comprehensive nature
- Steeper learning curve for configuration and customization
- May be overkill for simple process monitoring needs
Code Comparison
netdata:
static void rrdset_done(RRDSET *st) {
    if(unlikely(!st->rrd_memory_mode))
        return;
    RRDDIM *rd;
process-exporter:
func (p *Proc) GetProcInfo() (ProcInfo, error) {
    stat, err := p.Stat()
    if err != nil {
        return ProcInfo{}, err
    }
Key Differences
- netdata is a full-featured monitoring solution, while process-exporter focuses specifically on process metrics
- netdata provides a built-in web interface, whereas process-exporter exports metrics for consumption by other tools
- netdata is written primarily in C, while process-exporter is written in Go
- netdata offers a broader range of metrics and plugins, but process-exporter is more lightweight and focused
Use Cases
- Choose netdata for comprehensive system monitoring with a user-friendly interface
- Opt for process-exporter when you need lightweight, Prometheus-compatible process metrics collection
Glances: An eye on your system. A top/htop alternative for GNU/Linux, BSD, Mac OS and Windows operating systems.
Pros of Glances
- More comprehensive system monitoring, including CPU, memory, disk, network, and processes
- Cross-platform support (Linux, macOS, Windows)
- Web-based interface and REST API for easy integration
Cons of Glances
- Higher resource usage due to its comprehensive monitoring capabilities
- May be overkill for users only interested in process-specific metrics
- Steeper learning curve due to its extensive feature set
Code Comparison
Glances (Python):
from glances_api import Glances
glances = Glances()
cpu_percent = glances.cpu.percent
memory_percent = glances.mem.percent
Process-exporter (Go):
import "github.com/ncabatoff/process-exporter/proc"
processes, err := proc.AllProcs()
for _, p := range processes {
// Process metrics available here
}
Summary
Glances is a more feature-rich system monitoring tool with a broader scope, while Process-exporter focuses specifically on exporting process metrics for Prometheus. Glances offers a user-friendly interface and cross-platform support but may consume more resources. Process-exporter is lightweight and tailored for Prometheus integration but has a narrower focus on process metrics. The choice between the two depends on the specific monitoring requirements and infrastructure setup of the user.
Sentry: Developer-first error tracking and performance monitoring
Pros of Sentry
- Comprehensive error tracking and monitoring solution for multiple programming languages and platforms
- Robust features including real-time alerts, release tracking, and performance monitoring
- Large and active community with extensive documentation and integrations
Cons of Sentry
- More complex setup and configuration compared to process-exporter
- Higher resource usage due to its extensive feature set
- Potential for information overload with numerous alerts and notifications
Code Comparison
process-exporter (Go):
func (p *Proc) GetProcInfo() (ProcInfo, error) {
    stat, err := p.Stat()
    if err != nil {
        return ProcInfo{}, err
    }
    return ProcInfo{
        PID:  p.PID,
        Name: stat.Comm,
    }, nil
}
Sentry (Python):
def capture_exception(self, exc_info=None, **kwargs):
    if exc_info is None:
        exc_info = sys.exc_info()
    return self.captureException(
        exc_info=exc_info,
        **kwargs
    )
While process-exporter focuses on exporting process metrics for Prometheus, Sentry provides a more comprehensive error tracking and monitoring solution. The code snippets highlight their different purposes, with process-exporter handling process information and Sentry capturing exceptions for analysis.
README
process-exporter
Prometheus exporter that mines /proc to report on selected processes.
Some apps are impractical to instrument directly, either because you don't control the code or they're written in a language that isn't easy to instrument with Prometheus. We must instead resort to mining /proc.
Installation
Either grab a package for your OS from the Releases page, or install via docker.
Running
Usage:
process-exporter [options] -config.path filename.yml
or via docker:
docker run -d --rm -p 9256:9256 --privileged -v /proc:/host/proc -v `pwd`:/config ncabatoff/process-exporter --procfs /host/proc -config.path /config/filename.yml
Important options (run process-exporter --help for full list):
- -children (default: true) makes it so that any process that otherwise isn't part of its own group becomes part of the first group found (if any) when walking the process tree upwards. In other words, resource usage of subprocesses is added to their parent's usage unless the subprocess identifies as a different group name.
- -threads (default: true) means that metrics will be broken down by thread name as well as group name.
- -recheck (default: false) means that on each scrape the process names are re-evaluated. This is disabled by default as an optimization, but since processes can choose to change their names, this may result in a process falling into the wrong group if we happen to see it for the first time before it has assumed its proper name. You can use -recheck-with-time-limit to enable this feature only for a specific duration after a process starts.
- -procnames is intended as a quick alternative to using a config file. Details in the following section.
To disable any of these options, use the -option=false form.
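For example, a hypothetical invocation that turns off child aggregation and per-thread metrics:

./process-exporter -config.path config.yml -children=false -threads=false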
Configuration and group naming
To select and group the processes to monitor, either provide command-line arguments or use a YAML configuration file.
The recommended option is to use a config file via -config.path, but for convenience and backwards compatibility the -procnames/-namemapping options exist as an alternative.
Using a config file
The general format of the -config.path YAML file is a top-level process_names section, containing a list of name matchers:

process_names:
  - matcher1
  - matcher2
  ...
  - matcherN
The default config shipped with the deb/rpm packages is:
process_names:
  - name: "{{.Comm}}"
    cmdline:
      - '.+'
A process may only belong to one group: even if multiple items would match, the first one listed in the file wins.
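For instance, given the following illustrative config (the group name "shells" is hypothetical), a bash process is assigned to the "shells" group even though the catch-all second matcher would also match it:

process_names:
  - name: "shells"
    comm:
      - bash
      - zsh
  - name: "{{.Comm}}"
    cmdline:
      - '.+'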
(Side note: to avoid confusion with the cmdline YAML element, we'll refer to the command-line arguments of a process, /proc/<pid>/cmdline, as the array argv[].)
Using a config file: group name
Each item in process_names gives a recipe for identifying and naming processes. The optional name tag defines a template to use to name matching processes; if not specified, name defaults to {{.ExeBase}}.
Template variables available:

- {{.Comm}} contains the basename of the original executable, i.e. the 2nd field in /proc/<pid>/stat
- {{.ExeBase}} contains the basename of the executable
- {{.ExeFull}} contains the fully qualified path of the executable
- {{.Username}} contains the username of the effective user
- {{.Matches}} map contains all the matches resulting from applying cmdline regexps
- {{.PID}} contains the PID of the process. Note that using PID means the group will only contain a single process.
- {{.StartTime}} contains the start time of the process. This can be useful in conjunction with PID because PIDs get reused over time.
- {{.Cgroups}} contains (if supported) the cgroups of the process (/proc/self/cgroup). This is particularly useful for identifying to which container a process belongs.
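For example, a matcher (illustrative, not part of any shipped config) that breaks java processes out by effective user would produce group names like java:tomcat:

process_names:
  - name: "{{.ExeBase}}:{{.Username}}"
    comm:
      - java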
Using PID or StartTime is discouraged: this is almost never what you want, and is likely to result in high-cardinality metrics which Prometheus will have trouble with.
Using a config file: process selectors
Each item in process_names must contain one or more selectors (comm, exe or cmdline); if more than one selector is present, they must all match. Each selector is a list of strings to match against a process's comm, argv[0], or, in the case of cmdline, a regexp to apply to the command line. The cmdline regexp uses the Go syntax.

For comm and exe, the list of strings is an OR, meaning any process matching any of the strings will be added to the item's group.

For cmdline, the list of regexps is an AND, meaning they all must match. Any capturing groups in a regexp must use the ?P<name> option to assign a name to the capture, which is used to populate .Matches.
Performance tip: give an exe or comm clause in addition to any cmdline clause, so you avoid executing the regexp when the executable name doesn't match.
process_names:
  # comm is the second field of /proc/<pid>/stat minus parens.
  # It is the base executable name, truncated at 15 chars.
  # It cannot be modified by the program, unlike exe.
  - comm:
      - bash

  # exe is argv[0]. If no slashes, only basename of argv[0] need match.
  # If exe contains slashes, argv[0] must match exactly.
  - exe:
      - postgres
      - /usr/local/bin/prometheus

  # cmdline is a list of regexps applied to argv.
  # Each must match, and any captures are added to the .Matches map.
  - name: "{{.ExeFull}}:{{.Matches.Cfgfile}}"
    exe:
      - /usr/local/bin/process-exporter
    cmdline:
      - -config.path\s+(?P<Cfgfile>\S+)
Here's the config I use on my home machine:
process_names:
  - comm:
      - chromium-browse
      - bash
      - prometheus
      - gvim
  - exe:
      - /sbin/upstart
    cmdline:
      - --user
    name: upstart:-user
Using -procnames/-namemapping instead of config.path
Every name in the procnames list becomes a process group. The default name of a process is the value found in the second field of /proc/<pid>/stat.

If -namemapping isn't provided, every process with a comm value present in -procnames is assigned to a group based on that name, and any other processes are ignored.
The -namemapping option is a comma-separated list of alternating name,regexp values. It allows assigning a name to a process based on a combination of the process name and command line. For example, using

-namemapping "python2,([^/]+).py,java,-jar\s+([^/]+).jar"

will make it so that each different python2 and java -jar invocation is tracked with distinct metrics. Processes whose remapped name is absent from the procnames list will be ignored. On an Ubuntu Xenial machine being used as a workstation, here's a good way of tracking resource usage for a few different key user apps:
process-exporter -namemapping "upstart,(--user)" \
  -procnames chromium-browse,bash,gvim,prometheus,process-exporter,upstart:-user
Since upstart --user is the parent process of the X11 session, this will make all apps started by the user fall into the group named "upstart:-user", unless they're one of the others named explicitly with -procnames, like gvim.
Group Metrics
There's no meaningful way to name a process that will only ever name a single process, so process-exporter assumes that every metric will be attached to a group of processes - not a process group in the technical sense, just one or more processes that meet a configuration's specification of what should be monitored and how to name it.
All these metrics start with namedprocess_namegroup_ and have at minimum the label groupname.
num_procs gauge
Number of processes in this group.
cpu_seconds_total counter
CPU usage based on /proc/[pid]/stat fields utime(14) and stime(15), i.e. user and system time. This is similar to the node_exporter's node_cpu_seconds_total.
read_bytes_total counter
Bytes read based on /proc/[pid]/io field read_bytes. The man page says "Attempt to count the number of bytes which this process really did cause to be fetched from the storage layer. This is accurate for block-backed filesystems.", but I would take it with a grain of salt.
As /proc/[pid]/io is readable only by the process's user (see #137), you should run process-exporter either as that user or as root to get these values. Otherwise, we can't read them and you'll get a constant 0 in the metric.
write_bytes_total counter
Bytes written based on /proc/[pid]/io field write_bytes. As with read_bytes, somewhat dubious. May be useful for isolating which processes are doing the most I/O, but probably not measuring just how much I/O is happening.
major_page_faults_total counter
Number of major page faults based on /proc/[pid]/stat field majflt(12).
minor_page_faults_total counter
Number of minor page faults based on /proc/[pid]/stat field minflt(10).
context_switches_total counter
Number of context switches based on /proc/[pid]/status fields voluntary_ctxt_switches and nonvoluntary_ctxt_switches. The extra label ctxswitchtype can have two values: voluntary and nonvoluntary.
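As an illustration (the sample values are invented), the exposed series for a group named bash would look like:

namedprocess_namegroup_context_switches_total{groupname="bash",ctxswitchtype="voluntary"} 4093
namedprocess_namegroup_context_switches_total{groupname="bash",ctxswitchtype="nonvoluntary"} 126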
memory_bytes gauge
Number of bytes of memory used. The extra label memtype can have three values:

- resident: Field rss(24) from /proc/[pid]/stat, whose doc says: "This is just the pages which count toward text, data, or stack space. This does not include pages which have not been demand-loaded in, or which are swapped out."
- virtual: Field vsize(23) from /proc/[pid]/stat, virtual memory size.
- swapped: Field VmSwap from /proc/[pid]/status, translated from KB to bytes.

If gathering of the smaps file is enabled, two additional values for memtype are added:

- proportionalResident: Sum of "Pss" fields from /proc/[pid]/smaps, whose doc says: "The 'proportional set size' (PSS) of a process is the count of pages it has in memory, where each page is divided by the number of processes sharing it."
- proportionalSwapped: Sum of "SwapPss" fields from /proc/[pid]/smaps
open_filedesc gauge
Number of file descriptors, based on counting how many entries are in the directory /proc/[pid]/fd.
worst_fd_ratio gauge
Worst ratio of open filedescs to filedesc limit, amongst all the procs in the group. The limit is the fd soft limit based on /proc/[pid]/limits.
Normally Prometheus metrics ought to be as "basic" as possible (i.e. the raw values rather than a derived ratio), but we use a ratio here because nothing else makes sense. Suppose there are 10 procs in a given group, each with a soft limit of 4096; one of them has 4000 open fds and the others all have 40. Their total fd count is 4360 against a total soft limit of 40960, a ratio of about 1:10, yet one of the procs is about to run out of fds. With worst_fd_ratio we're able to know this: in the above example it would be about 0.97, rather than the 0.10 you'd see if you computed sum(open_filedesc) / sum(limit_filedesc).
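Here is a quick sketch of that arithmetic, using the hypothetical numbers from the paragraph above:

package main

import "fmt"

func main() {
    // Ten procs, each with a soft fd limit of 4096; one has 4000 open
    // fds, the other nine have 40 each.
    open := []float64{4000, 40, 40, 40, 40, 40, 40, 40, 40, 40}
    const limit = 4096.0

    var sumOpen, worst float64
    for _, n := range open {
        sumOpen += n
        if r := n / limit; r > worst {
            worst = r
        }
    }
    fmt.Printf("aggregate ratio: %.3f\n", sumOpen/(limit*float64(len(open)))) // 0.106
    fmt.Printf("worst_fd_ratio:  %.3f\n", worst)                              // 0.977
}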
oldest_start_time_seconds gauge
Epoch time (seconds since 1970/1/1) at which the oldest process in the group started. This is derived from field starttime(22) from /proc/[pid]/stat, added to boot time to make it relative to epoch.
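A sketch of that derivation (assuming USER_HZ of 100, the typical value on Linux; the input values here are hypothetical):

package main

import "fmt"

// startTimeEpoch converts starttime(22) from /proc/[pid]/stat, which is
// measured in clock ticks since boot, into seconds since the epoch by
// adding the kernel boot time (the btime line of /proc/stat).
func startTimeEpoch(starttimeTicks, btimeSeconds uint64) uint64 {
    const userHZ = 100 // assumed; query sysconf(_SC_CLK_TCK) to be sure
    return btimeSeconds + starttimeTicks/userHZ
}

func main() {
    // Boot at epoch 1700000000; process started 12345 ticks (123s) later.
    fmt.Println(startTimeEpoch(12345, 1700000000)) // 1700000123
}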
num_threads gauge
Sum of the number of threads of all processes in the group. Based on field num_threads(20) from /proc/[pid]/stat.
states gauge
Number of threads in the group in each of various states, based on the field state(3) from /proc/[pid]/stat. The extra label state can have these values: Running, Sleeping, Waiting, Zombie, Other.
Group Thread Metrics
Since publishing thread metrics adds a lot of overhead, use -threads=false on the command line to disable them if necessary.
All these metrics start with namedprocess_namegroup_ and have at minimum the labels groupname and threadname. threadname is field comm(2) from /proc/[pid]/stat. Just as groupname breaks the set of processes down into groups, threadname breaks a given process group down into subgroups.
thread_count gauge
Number of threads in this thread subgroup.
thread_cpu_seconds_total counter
Same as cpu_user_seconds_total and cpu_system_seconds_total, but broken down per-thread subgroup. Unlike cpu_user_seconds_total/cpu_system_seconds_total, the label cpumode is used to distinguish between user and system time.
thread_io_bytes_total counter
Same as read_bytes_total and write_bytes_total, but broken down per-thread subgroup. Unlike read_bytes_total/write_bytes_total, the label iomode is used to distinguish between read and write bytes.
thread_major_page_faults_total counter
Same as major_page_faults_total, but broken down per-thread subgroup.
thread_minor_page_faults_total counter
Same as minor_page_faults_total, but broken down per-thread subgroup.
thread_context_switches_total counter
Same as context_switches_total, but broken down per-thread subgroup.
Instrumentation cost
process-exporter will consume CPU in proportion to the number of processes in the system and the rate at which new ones are created. The most expensive parts - applying regexps and executing templates - are only applied once per process seen, unless the command-line option -recheck is provided.
If you have mostly long-running processes, process-exporter overhead should be minimal: each time a scrape occurs, it will parse /proc/$pid/stat and /proc/$pid/cmdline for every process being monitored and add a few numbers.
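For a sense of what that per-scrape work involves, here is a minimal sketch (not process-exporter's actual code) of pulling the comm field out of /proc/[pid]/stat; comm is wrapped in parentheses and may itself contain spaces, so the scan looks for the last ')':

package main

import (
    "fmt"
    "os"
    "strings"
)

// readComm extracts comm (field 2 of /proc/<pid>/stat, in parentheses).
// comm may contain spaces and parentheses, so rather than splitting on
// whitespace we take everything between the first '(' and the last ')'.
func readComm(pid int) (string, error) {
    data, err := os.ReadFile(fmt.Sprintf("/proc/%d/stat", pid))
    if err != nil {
        return "", err
    }
    s := string(data)
    start := strings.IndexByte(s, '(')
    end := strings.LastIndexByte(s, ')')
    if start < 0 || end < start {
        return "", fmt.Errorf("malformed stat for pid %d", pid)
    }
    return s[start+1 : end], nil
}

func main() {
    comm, err := readComm(os.Getpid())
    if err != nil {
        panic(err)
    }
    fmt.Println(comm) // this process's own comm value
}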
Dashboards
An example Grafana dashboard to view the metrics is available at https://grafana.net/dashboards/249
Building
Requires Go 1.21 or later.
make
Exposing metrics through HTTPS
web-config.yml:

# Minimal TLS configuration example. Additionally, a certificate and a key file
# are needed.
tls_server_config:
  cert_file: server.crt
  key_file: server.key
Running
$ ./process-exporter -web.config.file web-config.yml &
$ curl -sk https://localhost:9256/metrics | grep process
# HELP namedprocess_scrape_errors general scrape errors: no proc metrics collected during a cycle
# TYPE namedprocess_scrape_errors counter
namedprocess_scrape_errors 0
# HELP namedprocess_scrape_partial_errors incremented each time a tracked proc's metrics collection fails partially, e.g. unreadable I/O stats
# TYPE namedprocess_scrape_partial_errors counter
namedprocess_scrape_partial_errors 0
# HELP namedprocess_scrape_procread_errors incremented each time a proc's metrics collection fails
# TYPE namedprocess_scrape_procread_errors counter
namedprocess_scrape_procread_errors 0
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.21
# HELP process_exporter_build_info A metric with a constant '1' value labeled by version, revision, branch, and goversion from which process_exporter was built.
# TYPE process_exporter_build_info gauge
process_exporter_build_info{branch="",goversion="go1.17.3",revision="",version=""} 1
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 10
For further information about TLS configuration, please visit: exporter-toolkit