Top Related Projects
Quick Overview
The cloudflare/ebpf_exporter is an eBPF-based exporter for Prometheus metrics. It allows users to efficiently collect and expose low-level system and application metrics using eBPF technology, providing deep insights into system performance and behavior.
Pros
- Lightweight and efficient metric collection using eBPF technology
- Highly customizable with support for various eBPF programs and metrics
- Integration with Prometheus ecosystem for easy monitoring and alerting
- Ability to gather detailed system and application metrics with minimal overhead
Cons
- Requires kernel support for eBPF, limiting compatibility with older systems
- Steep learning curve for users unfamiliar with eBPF programming
- Limited documentation for advanced use cases and custom program development
- Potential security implications if not properly configured and managed
Code Examples
- Basic configuration example:
```yaml
programs:
  - name: counters
    metrics:
      counters:
        - name: http_requests_total
          help: Total number of HTTP requests
          labels:
            - name: method
              size: 8
            - name: path
              size: 128
```
This example configures a basic eBPF program to count HTTP requests with method and path labels.
- Custom eBPF program example:
```c
#include <uapi/linux/ptrace.h>
#include <net/sock.h>
#include <bcc/proto.h>

int kprobe__tcp_connect(struct pt_regs *ctx, struct sock *sk)
{
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    bpf_trace_printk("TCP connect: PID %d\n", pid);
    return 0;
}
```
This example shows a custom eBPF program that traces TCP connect events and prints the process ID.
- Metric declaration example:
```yaml
metrics:
  histograms:
    - name: http_request_duration_seconds
      help: HTTP request duration in seconds
      bucket_type: exp2
      bucket_min: 0.0001
      bucket_max: 1
      bucket_multiplier: 0.0001
```
This example declares a histogram metric for measuring HTTP request durations with exponential buckets.
Getting Started
- Install dependencies: `go get -u github.com/cloudflare/ebpf_exporter`
- Create a configuration file (e.g., `config.yaml`) with your desired eBPF programs and metrics.
- Run the exporter: `ebpf_exporter --config.file config.yaml`
- Access metrics at `http://localhost:9435/metrics` (default port).
Competitor Comparisons
BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more
Pros of bcc
- More comprehensive toolkit with a wide range of tools and utilities for eBPF development
- Supports multiple programming languages (Python, Lua, C++) for writing eBPF programs
- Extensive documentation and examples for various use cases
Cons of bcc
- Steeper learning curve due to its broader scope and complexity
- Requires more system resources and dependencies
- May be overkill for simple eBPF monitoring tasks
Code Comparison
bcc example (Python):
```python
from bcc import BPF

program = """
int hello(void *ctx) {
    bpf_trace_printk("Hello, World!\\n");
    return 0;
}
"""

b = BPF(text=program)
b.attach_kprobe(event="sys_clone", fn_name="hello")
b.trace_print()
```
ebpf_exporter example (YAML config):
```yaml
programs:
  - name: hello
    metrics:
      - name: hello_count
        help: Number of times "Hello, World!" was printed
        type: counter
    code: |
      BPF_HASH(counts, u32);
      int hello(void *ctx) {
          u32 key = 0;
          counts.increment(key);
          return 0;
      }
    attach: kprobe:sys_clone
```
The bcc example demonstrates its flexibility in writing eBPF programs directly in Python, while ebpf_exporter uses a YAML configuration to define metrics and eBPF code, making it more focused on exporting metrics for monitoring systems.
eBPF-based Networking, Security, and Observability
Pros of Cilium
- Comprehensive network security and visibility solution for Kubernetes and container environments
- Implements advanced networking features like load balancing, network policies, and service mesh
- Actively maintained with frequent updates and a large community
Cons of Cilium
- More complex setup and configuration compared to simpler eBPF exporters
- Higher resource requirements due to its extensive feature set
- Steeper learning curve for users new to eBPF and container networking
Code Comparison
ebpf_exporter:
```go
func (e *Exporter) Export(w http.ResponseWriter, r *http.Request) {
	e.mu.Lock()
	defer e.mu.Unlock()
	promhttp.HandlerFor(e.registry, promhttp.HandlerOpts{}).ServeHTTP(w, r)
}
```
Cilium:
```go
func (d *Daemon) compileBase() error {
	opts := make([]string, 0, 20)
	if d.conf.DevicePreFilter != "undefined" {
		opts = append(opts, "-DENABLE_PREFILTER")
	}
	if d.conf.EnableIPv4 {
		opts = append(opts, "-DENABLE_IPV4")
	}
	// ... (additional options)
}
```
The code snippets highlight the difference in complexity between the two projects. ebpf_exporter focuses on exporting eBPF metrics, while Cilium's code demonstrates its more comprehensive approach to network management and security.
Cloud Native Runtime Security
Pros of Falco
- More comprehensive security monitoring and threat detection capabilities
- Broader scope, covering system calls, container activities, and network events
- Extensive rule set and customizable policies for various security scenarios
Cons of Falco
- Higher resource consumption due to its broader monitoring scope
- Steeper learning curve for configuration and rule customization
- May require more fine-tuning to reduce false positives in complex environments
Code Comparison
Falco rule example:
```yaml
- rule: Unauthorized Process
  desc: Detect unauthorized process execution
  condition: spawned_process and not proc.name in (allowed_processes)
  output: "Unauthorized process started (user=%user.name command=%proc.cmdline)"
  priority: WARNING
```
ebpf_exporter configuration example:
```yaml
programs:
  - name: syscalls
    metrics:
      - name: syscalls_total
        help: Number of syscalls
        type: counter
        labels:
          - name: syscall
            size: 8
            decoders:
              - name: uint
```
While both projects utilize eBPF, Falco offers a more security-focused approach with predefined rules, whereas ebpf_exporter provides a flexible framework for custom eBPF-based metrics collection. Falco's rules are more expressive and tailored for security monitoring, while ebpf_exporter's configuration is geared towards performance metrics and observability.
Linux Runtime Security and Forensics using eBPF
Pros of Tracee
- More comprehensive security-focused tracing and monitoring capabilities
- Broader system-wide visibility, including runtime security events
- Active development with frequent updates and community support
Cons of Tracee
- Higher resource overhead due to extensive tracing capabilities
- Steeper learning curve for configuration and customization
- May require more fine-tuning for specific use cases
Code Comparison
Tracee (Go):
```go
func main() {
	cfg := tracee.Config{
		Filters: []string{"container.id!="},
		Output:  "json",
	}
	err := tracee.Run(cfg)
	if err != nil {
		log.Fatal(err)
	}
}
```
ebpf_exporter (Go):
```go
func main() {
	c, err := config.Load("config.yaml")
	if err != nil {
		log.Fatalf("Error loading config: %v", err)
	}
	e := exporter.New(c)
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9435", nil))
}
```
Both projects utilize eBPF technology, but Tracee focuses on comprehensive security tracing, while ebpf_exporter is primarily designed for exporting eBPF metrics to Prometheus. Tracee offers more extensive system monitoring capabilities, while ebpf_exporter provides a simpler setup for metric collection and export.
High-level tracing language for Linux
Pros of bpftrace
- More flexible and powerful for ad-hoc tracing and debugging
- Supports a wider range of tracing scenarios and kernel events
- Has a simpler, more expressive language for writing tracing scripts
Cons of bpftrace
- Steeper learning curve for users unfamiliar with BPF concepts
- Less focused on metrics collection and export for monitoring systems
- May require more system resources for complex tracing scenarios
Code Comparison
bpftrace example:
```
#!/usr/bin/env bpftrace

BEGIN {
    printf("Tracing TCP connects... Hit Ctrl-C to end.\n");
}

kprobe:tcp_connect {
    printf("TCP connect: %s\n", comm);
}
```
ebpf_exporter example:
```yaml
programs:
  - name: tcp_connects
    metrics:
      - name: tcp_connects_total
        help: TCP connect calls
        type: counter
        labels:
          - name: pid
            size: 4
            decoders:
              - name: uint
```
The bpftrace example shows a simple script to trace TCP connects, while the ebpf_exporter example demonstrates how to configure a metric for counting TCP connects. bpftrace offers more flexibility in tracing and output, while ebpf_exporter is more focused on exporting metrics in a Prometheus-compatible format.
README
ebpf_exporter
Prometheus exporter for custom eBPF metrics and OpenTelemetry traces.
The motivation of this exporter is to allow you to write eBPF code and export metrics that are not otherwise accessible from the Linux kernel.
ebpf.io describes eBPF:
eBPF is a revolutionary technology with origins in the Linux kernel that can run sandboxed programs in a privileged context such as the operating system kernel. It is used to safely and efficiently extend the capabilities of the kernel without requiring to change kernel source code or load kernel modules.
An easy way of thinking about this exporter is bcc tools as prometheus metrics:
We use libbpf rather than legacy bcc driven code, so it's more like libbpf-tools:
Producing OpenTelemetry compatible traces is also supported, see Tracing docs for more information on that.
Reading material
- https://www.brendangregg.com/ebpf.html
- https://nakryiko.com/posts/bpf-core-reference-guide/
- https://nakryiko.com/posts/bpf-portability-and-co-re/
- https://nakryiko.com/posts/bcc-to-libbpf-howto-guide/
- https://libbpf.readthedocs.io/en/latest/program_types.html
Building and running
Actual building
To build a binary, clone the repo and run:
```
make build
```
The default `build` target makes a static binary, but you could also use the `build-dynamic` target if you'd like a dynamically linked binary. In either case `libbpf` is built from source, but you could override this behavior with `BUILD_LIBBPF=0` if you want to use your system `libbpf`.
If you're having trouble building on the host, you can try building in Docker:
```
docker build --tag ebpf_exporter --target ebpf_exporter .
docker cp $(docker create ebpf_exporter):/ebpf_exporter ./
```
To build examples (see building examples section):
```
make -C examples clean build
```
To run with `biolatency` config:

```
sudo ./ebpf_exporter --config.dir=examples --config.names=biolatency
```
If you pass `--debug`, you can see raw maps at the `/maps` endpoint and see debug output from `libbpf` itself.
Docker image
A docker image can be built from this repo. A prebuilt image with examples included is also available for download from GitHub Container Registry:
To build the image with just the exporter binary, run the following:
```
docker build --tag ebpf_exporter --target ebpf_exporter .
```
To run it with the examples, you need to build them first (see above). Then you can run a privileged container, bind-mounting:

- `$(pwd)/examples:/examples:ro` to allow access to examples on the host
- `/sys/fs/cgroup:/sys/fs/cgroup:ro` to allow resolving cgroups
You might have to bind-mount additional directories depending on your needs. You might also not need to bind-mount anything for simple kprobe examples.
The actual command to run the docker container (from the repo directory):
```
docker run --rm -it --privileged -p 9435:9435 \
  -v $(pwd)/examples:/examples \
  -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
  ebpf_exporter --config.dir=examples --config.names=timers
```
For production use you would either bind-mount your own config and compiled bpf programs corresponding to it, or build your own image based on ours with your own config baked in.
For development use when you don't want or have any dev tools on the host, you can build the docker image with examples bundled:
```
docker build --tag ebpf_exporter --target ebpf_exporter_with_examples .
```
Some examples then can run without any bind mounts:
```
docker run --rm -it --privileged -p 9435:9435 \
  ebpf_exporter --config.dir=examples --config.names=timers
```
Or with the publicly available prebuilt image:
```
docker run --rm -it --privileged -p 9435:9435 \
  ghcr.io/cloudflare/ebpf_exporter --config.dir=examples --config.names=timers
```
Kubernetes Helm chart
A third party helm chart is available here:
Please note that the helm chart is not provided or supported by Cloudflare, so do your own due diligence and use it at your own risk.
Benchmarking overhead
See benchmark directory to get an idea of how low ebpf overhead is.
Required capabilities
While you can run `ebpf_exporter` as `root`, it is not strictly necessary. Only the following two capabilities are necessary for normal operation:

- `CAP_BPF`: required for privileged bpf operations and for reading memory
- `CAP_PERFMON`: required to attach bpf programs to kprobes and tracepoints
If you are using `systemd`, you can use the following configuration to run as an otherwise unprivileged dynamic user with the needed capabilities:

```
DynamicUser=true
AmbientCapabilities=CAP_BPF CAP_PERFMON
CapabilityBoundingSet=CAP_BPF CAP_PERFMON
```
Prior to Linux v5.8 there was no dedicated `CAP_BPF` and `CAP_PERFMON`, but you can use `CAP_SYS_ADMIN` instead if your kernel is older.
If you pass the `--capabilities.keep=none` flag to `ebpf_exporter`, then it drops all capabilities after attaching the probes, leaving it fully unprivileged.
The following additional capabilities might be needed:

- `CAP_SYSLOG`: if you use the `ksym` decoder to have access to `/proc/kallsyms`. Note that you must keep this capability: `--capabilities.keep=cap_syslog`. See: https://elixir.bootlin.com/linux/v6.4/source/kernel/kallsyms.c#L982
- `CAP_IPC_LOCK`: if you use `perf_event_array` for reading from the kernel. Note that you must keep it: `--capabilities.keep=cap_perfmon,cap_ipc_lock`.
- `CAP_SYS_ADMIN`: if you want BTF information from modules. See: https://github.com/libbpf/libbpf/blob/v1.2.0/src/libbpf.c#L8654-L8666 and https://elixir.bootlin.com/linux/v6.5-rc1/source/kernel/bpf/syscall.c#L3789
- `CAP_NET_ADMIN`: if you use net admin related programs like xdp. See: https://elixir.bootlin.com/linux/v6.4/source/kernel/bpf/syscall.c#L3787
- `CAP_SYS_RESOURCE`: if you run an older kernel without memcg accounting for bpf memory. Upstream Linux kernel added support for this in v5.11. See: https://github.com/libbpf/libbpf/blob/v1.2.0/src/bpf.c#L98-L106
- `CAP_DAC_READ_SEARCH`: if you want to use `fanotify` to monitor cgroup changes, which is the preferred way, but only available since Linux v6.6. See: https://github.com/torvalds/linux/commit/0ce7c12e88cf
External BTF Support
Execution of eBPF programs requires kernel data types normally available in `/sys/kernel/btf/vmlinux`, which is created during the kernel build process. However, on some older kernel configurations, this file might not be available. If that's the case, an external BTF file can be supplied with `--btf.path`.

An archive of BTFs for some older distros and kernel versions can be found here.
Supported scenarios
Currently the only supported way of getting data out of the kernel is via maps.
See examples section for real world examples.
If you have examples you want to share, please feel free to open a PR.
Configuration
Skip to format to see the full specification.
Examples
You can find additional examples in examples directory.
Unless otherwise specified, all examples are expected to work on Linux 5.15, which is the latest LTS release at the time of writing. Thanks to CO-RE, examples are also supposed to work on any modern kernel with BTF enabled.
You can find the list of supported distros in the `libbpf` README:
Building examples
To build examples, run:
```
make -C examples clean build
```
This will use `clang` to build examples with the `vmlinux.h` we provide in this repo (see include for more on `vmlinux.h`).
Examples need to be compiled before they can be used.
Note that compiled examples can be used as is on any BTF enabled kernel with no runtime dependencies. Most modern Linux distributions have it enabled.
Timers via tracepoints (counters)
This config attaches to kernel tracepoints for timers subsystem and counts timers that fire with breakdown by timer name.
Resulting metrics:

```
# HELP ebpf_exporter_timer_starts_total Timers fired in the kernel
# TYPE ebpf_exporter_timer_starts_total counter
ebpf_exporter_timer_starts_total{function="blk_stat_timer_fn"} 10
ebpf_exporter_timer_starts_total{function="commit_timeout [jbd2]"} 1
ebpf_exporter_timer_starts_total{function="delayed_work_timer_fn"} 25
ebpf_exporter_timer_starts_total{function="dev_watchdog"} 1
ebpf_exporter_timer_starts_total{function="mix_interrupt_randomness"} 3
ebpf_exporter_timer_starts_total{function="neigh_timer_handler"} 1
ebpf_exporter_timer_starts_total{function="process_timeout"} 49
ebpf_exporter_timer_starts_total{function="reqsk_timer_handler"} 2
ebpf_exporter_timer_starts_total{function="tcp_delack_timer"} 5
ebpf_exporter_timer_starts_total{function="tcp_keepalive_timer"} 6
ebpf_exporter_timer_starts_total{function="tcp_orphan_update"} 16
ebpf_exporter_timer_starts_total{function="tcp_write_timer"} 12
ebpf_exporter_timer_starts_total{function="tw_timer_handler"} 1
ebpf_exporter_timer_starts_total{function="writeout_period"} 5
There's a config file for it:
```yaml
metrics:
  counters:
    - name: timer_starts_total
      help: Timers fired in the kernel
      labels:
        - name: function
          size: 8
          decoders:
            - name: ksym
```
And corresponding C code that compiles into an ELF file with eBPF bytecode:
```c
#include <vmlinux.h>
#include <bpf/bpf_tracing.h>
#include "maps.bpf.h"

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);
    __type(key, u64);
    __type(value, u64);
} timer_starts_total SEC(".maps");

SEC("tp_btf/timer_start")
int BPF_PROG(timer_start, struct timer_list *timer)
{
    u64 function = (u64) timer->function;
    increment_map(&timer_starts_total, &function, 1);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```
Block IO histograms (histograms)
This config attaches to block io subsystem and reports disk latency as a prometheus histogram, allowing you to compute percentiles.
The following tools are working with similar concepts:
- https://github.com/iovisor/bcc/blob/master/tools/biosnoop_example.txt
- https://github.com/iovisor/bcc/blob/master/tools/biolatency_example.txt
- https://github.com/iovisor/bcc/blob/master/tools/bitesize_example.txt
This program was the initial reason for the exporter and was heavily influenced by the experimental exporter from Daniel Swarbrick:
Resulting metrics:

```
# HELP ebpf_exporter_bio_latency_seconds Block IO latency histogram
# TYPE ebpf_exporter_bio_latency_seconds histogram
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="1e-06"} 0
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="2e-06"} 0
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="4e-06"} 0
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="8e-06"} 0
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="1.6e-05"} 0
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="3.2e-05"} 0
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="6.4e-05"} 0
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="0.000128"} 22
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="0.000256"} 36
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="0.000512"} 40
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="0.001024"} 48
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="0.002048"} 48
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="0.004096"} 48
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="0.008192"} 48
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="0.016384"} 48
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="0.032768"} 48
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="0.065536"} 48
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="0.131072"} 48
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="0.262144"} 48
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="0.524288"} 48
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="1.048576"} 48
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="2.097152"} 48
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="4.194304"} 48
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="8.388608"} 48
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="16.777216"} 48
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="33.554432"} 48
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="67.108864"} 48
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="134.217728"} 48
ebpf_exporter_bio_latency_seconds_bucket{device="nvme0n1",operation="write",le="+Inf"} 48
ebpf_exporter_bio_latency_seconds_sum{device="nvme0n1",operation="write"} 0.021772
ebpf_exporter_bio_latency_seconds_count{device="nvme0n1",operation="write"} 48
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="1e-06"} 0
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="2e-06"} 0
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="4e-06"} 0
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="8e-06"} 0
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="1.6e-05"} 0
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="3.2e-05"} 0
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="6.4e-05"} 0
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="0.000128"} 0
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="0.000256"} 0
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="0.000512"} 0
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="0.001024"} 1
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="0.002048"} 1
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="0.004096"} 1
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="0.008192"} 1
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="0.016384"} 1
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="0.032768"} 1
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="0.065536"} 1
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="0.131072"} 1
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="0.262144"} 1
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="0.524288"} 1
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="1.048576"} 1
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="2.097152"} 1
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="4.194304"} 1
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="8.388608"} 1
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="16.777216"} 1
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="33.554432"} 1
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="67.108864"} 1
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="134.217728"} 1
ebpf_exporter_bio_latency_seconds_bucket{device="nvme1n1",operation="write",le="+Inf"} 1
ebpf_exporter_bio_latency_seconds_sum{device="nvme1n1",operation="write"} 0.0018239999999999999
ebpf_exporter_bio_latency_seconds_count{device="nvme1n1",operation="write"} 1
You can nicely plot this with Grafana:
Configuration concepts
The following concepts exist within `ebpf_exporter`.
Configs
Configs describe how to extract metrics from the kernel. Each config has corresponding eBPF code that runs in the kernel to produce these metrics.
Multiple configs can be loaded at the same time.
Metrics
Metrics define what values we get from the eBPF program running in the kernel.
Counters
Counters from maps are direct transformations: you pull data out of the kernel, transform map keys into sets of labels and export them as prometheus counters.
Histograms
Histograms from maps are a bit more complex than counters. Maps in the kernel cannot be nested, so we need to pack keys in the kernel and unpack in user space.
We get from this:

```
sda, read, 1ms -> 10 ops
sda, read, 2ms -> 25 ops
sda, read, 4ms -> 51 ops
```

To this:

```
sda, read -> [1ms -> 10 ops, 2ms -> 25 ops, 4ms -> 51 ops]
```
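As a minimal sketch of this regrouping step (illustrative only, not the exporter's actual code; `packedKey` is a hypothetical stand-in for whatever struct key the kernel program uses):

```go
// Regroup flat kernel map entries keyed by (device, operation, bucket)
// into one bucket series per (device, operation) pair.
type packedKey struct {
	Device, Operation string
	Bucket            uint64
}

func regroup(kernelMap map[packedKey]uint64) map[[2]string]map[uint64]uint64 {
	series := map[[2]string]map[uint64]uint64{}
	for k, count := range kernelMap {
		labels := [2]string{k.Device, k.Operation}
		if series[labels] == nil {
			series[labels] = map[uint64]uint64{}
		}
		series[labels][k.Bucket] = count
	}
	return series
}
```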
Prometheus histograms expect to have all buckets when we report a metric, but the kernel creates keys as events occur, which means we need to backfill the missing data.
That's why for histogram configuration we have the following keys:

- `bucket_type`: can be either `exp2`, `exp2zero`, `linear`, or `fixed`
- `bucket_min`: minimum bucket key (`exp2`, `exp2zero` and `linear` only)
- `bucket_max`: maximum bucket key (`exp2`, `exp2zero` and `linear` only)
- `bucket_keys`: bucket keys (`fixed` only)
- `bucket_multiplier`: multiplier for bucket keys (default is `1`)
exp2 histograms
For `exp2` histograms we expect the kernel to provide a map with linear keys that are log2 of actual values. We then go from `bucket_min` to `bucket_max` in user space and remap keys by exponentiating them:
```
count = 0
for i = bucket_min; i < bucket_max; i++ {
  count += map.get(i, 0)
  result[exp2(i) * bucket_multiplier] = count
}
```
Here `map` is the map from the kernel and `result` is what goes to prometheus. We take cumulative `count`, because this is what prometheus expects.
exp2zero histograms
These are the same as `exp2` histograms, except:

- The first key is for the value `0`
- All other keys are `1` larger than they should be
This is useful if your actual observed value can be zero, as regular `exp2` histograms cannot express this due to the fact that `log2(0)` is invalid; in fact BPF treats `log2(0)` as `0`, and `exp2(0)` is `1`, not `0`.
See `tcp-syn-backlog-exp2zero.bpf.c` for an example of a config that makes use of this.
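For illustration, a user-space reconstruction of an `exp2zero` map might look like the sketch below. This is not the exporter's actual implementation; `mapGet`, `bucketMin`, `bucketMax` and `bucketMultiplier` are hypothetical stand-ins for the kernel map lookup and the config values described above.

```go
import "math"

// rebuildExp2Zero turns an exp2zero map back into cumulative buckets:
// key 0 holds observations of the literal value 0, and every other key
// is shifted up by one relative to a plain exp2 histogram.
func rebuildExp2Zero(mapGet func(uint64) uint64, bucketMin, bucketMax uint64, bucketMultiplier float64) map[float64]uint64 {
	result := map[float64]uint64{}
	count := mapGet(0) // the dedicated zero bucket
	result[0] = count
	for i := bucketMin; i < bucketMax; i++ {
		count += mapGet(i + 1) // remaining keys are one larger than their exp2 slot
		result[math.Exp2(float64(i))*bucketMultiplier] = count
	}
	return result
}
```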
linear histograms
For `linear` histograms we expect the kernel to provide a map with linear keys that are results of integer division of the original value by `bucket_multiplier`.

To reconstruct the histogram in user space we do the following:
```
count = 0
for i = bucket_min; i < bucket_max; i++ {
  count += map.get(i, 0)
  result[i * bucket_multiplier] = count
}
```
fixed histograms
For `fixed` histograms we expect the kernel to provide a map with fixed keys defined by the user.
```
count = 0
for i = 0; i < len(bucket_keys); i++ {
  count += map.get(bucket_keys[i], 0)
  result[bucket_keys[i] * multiplier] = count
}
```
sum keys
For `exp2` and `linear` histograms, if `bucket_max + 1` contains a non-zero value, it will be used as the `sum` key in the histogram, providing additional information and allowing richer metrics.
For `fixed` histograms, if `bucket_keys[len(bucket_keys) - 1] + 1` contains a non-zero value, it will be used as the `sum` key.
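As a hedged user-space sketch of that lookup (again not the exporter's actual code; `mapGet`, `bucketMax` and `bucketMultiplier` are hypothetical stand-ins, and the sum is assumed to be stored in the same raw units as the bucket keys):

```go
// histogramSum reads the optional sum from the slot right after the last
// bucket key and scales it the same way bucket keys are scaled.
func histogramSum(mapGet func(uint64) uint64, bucketMax uint64, bucketMultiplier float64) (float64, bool) {
	if raw := mapGet(bucketMax + 1); raw != 0 {
		return float64(raw) * bucketMultiplier, true
	}
	return 0, false
}
```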
Advice on values outside of [bucket_min, bucket_max]
For both `exp2` and `linear` histograms it is important that the kernel does not count events into buckets outside of the `[bucket_min, bucket_max]` range. If you encounter a value above your range, truncate it to be in it. You're losing the `+Inf` bucket, but usually it's not that big of a deal.

Each kernel map key must count values under that key's value to match the behavior of prometheus. For example, `exp2` histogram key `3` should count values for the `(exp2(2), exp2(3)]` interval: `(4, 8]`. To put it simply: use `log2l` or integer division and you'll be good.
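For illustration only, here is the same bucketing arithmetic in Go (the real thing happens in BPF C, for example via `log2l`); `bucketMin` and `bucketMax` are stand-ins for the config values:

```go
import "math"

// exp2Key maps an observed value to an exp2 bucket key with the semantics
// described above: key k counts values in (exp2(k-1), exp2(k)], and
// out-of-range values are truncated into [bucketMin, bucketMax].
func exp2Key(value, bucketMin, bucketMax uint64) uint64 {
	var key uint64
	if value > 1 {
		key = uint64(math.Ceil(math.Log2(float64(value))))
	}
	if key < bucketMin {
		key = bucketMin
	}
	if key > bucketMax {
		key = bucketMax // losing +Inf, as suggested above
	}
	return key
}
```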
Labels
Labels transform kernel map keys into prometheus labels.
Maps coming from the kernel are binary encoded. Values are always `u64`, but keys can be either primitive types like `u64` or complex `struct`s.
Each label can be transformed with decoders (see below) according to metric configuration. Generally the number of labels matches the number of elements in the kernel map key.
For map keys that are represented as `struct`s, alignment rules apply:

- `u64` must be aligned at an 8 byte boundary
- `u32` must be aligned at a 4 byte boundary
- `u16` must be aligned at a 2 byte boundary
This means that the following struct:

```c
struct disk_latency_key_t {
    u32 dev;
    u8 op;
    u64 slot;
};
```
Is represented as:

- 4 byte `dev` integer
- 1 byte `op` integer
- 3 byte padding to align `slot`
- 8 byte `slot` integer
When decoding, either specify the padding explicitly with the key `padding`, or include it in the label size (see the byte-level sketch after this list):

- 4 for `dev`
- 4 for `op` (1 byte value + 3 byte padding)
- 8 for `slot`
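As a byte-level sketch (illustrative only, not the exporter's decoding code, and assuming a little-endian machine), the 16 raw key bytes of `disk_latency_key_t` split like this:

```go
import "encoding/binary"

// splitDiskLatencyKey shows how the raw key bytes map onto the label
// sizes above: 4 for dev, 4 for op (1 byte value + 3 bytes padding),
// and 8 for slot.
func splitDiskLatencyKey(raw [16]byte) (dev uint32, op uint8, slot uint64) {
	dev = binary.LittleEndian.Uint32(raw[0:4]) // 4 byte dev
	op = raw[4]                                // 1 byte op, raw[5:8] is padding
	slot = binary.LittleEndian.Uint64(raw[8:16])
	return dev, op, slot
}
```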
Decoders
Decoders take a byte slice input of requested length and transform it into a byte slice representing a string. That byte slice can either be consumed by another decoder (for example `string` -> `regexp`) or used as the final label value exported to Prometheus.
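Conceptually, the chaining works like this hedged sketch (not the exporter's actual decoder interface):

```go
// decoder is an illustrative interface: each decoder consumes bytes and
// produces bytes for the next one in the chain.
type decoder interface {
	Decode(in []byte) ([]byte, error)
}

// decodeLabel runs the raw map key bytes through the configured chain and
// returns the final label value.
func decodeLabel(in []byte, chain []decoder) (string, error) {
	var err error
	for _, d := range chain {
		if in, err = d.Decode(in); err != nil {
			return "", err
		}
	}
	return string(in), nil
}
```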
Below are decoders we have built in.
cgroup
With the `cgroup` decoder you can turn the u64 from `bpf_get_current_cgroup_id` into a human readable string representing the cgroup path, like `/sys/fs/cgroup/system.slice/ssh.service`.
ifname
The `ifname` decoder takes a network interface index and converts it into its name, like `eth0`.
dname
The `dname` decoder reads a DNS qname in wire format and decodes it into '.' notation. For example, `\x07example\x03com\x00` becomes `example.com`. It is meant to be used after the `string` decoder, like in the following example:
```yaml
- name: qname
  decoders:
    - name: string
    - name: dname
```
errno
The `errno` decoder converts an `errno` number into a string representation like `EPIPE`. It is normally paired with the `uint` decoder as the first step.
hex
Hex decoder turns bytes into their hex representation.
inet_ip
The network IP decoder can turn byte encoded IPv4 and IPv6 addresses that the kernel operates on into a human readable form like `1.1.1.1`.
ksym
The `ksym` decoder takes a kernel address and converts it to the function name. In your eBPF program you can use `PT_REGS_IP_CORE(ctx)` to get the address of the function you attached to as a `u64` variable. Note that for kprobes you need to wrap it with `KPROBE_REGS_IP_FIX()` from `regs-ip.bpf.h`.
majorminor
With the `majorminor` decoder you can turn the kernel's combined u32 view of major and minor device numbers into a device name in `/dev`.
pci_vendor
With the `pci_vendor` decoder you can transform PCI vendor IDs like `0x8086` into human readable vendor names like `Intel Corporation`.
pci_device
With the `pci_device` decoder you can transform combined PCI vendor and device IDs like `0x80861000` into human readable device names like `82542 Gigabit Ethernet Controller (Fiber)`. Note that you need to concatenate the vendor and device IDs together for this.
pci_class
With the `pci_class` decoder you can transform a PCI class ID (the lowest byte) into the class name like `Network controller`.
pci_subclass
With the `pci_subclass` decoder you can transform a PCI subclass (the two lowest bytes) into the subclass name like `Ethernet controller`.
regexp
The `regexp` decoder takes a list of strings from the `regexps` configuration key of the decoder and tries to use each as a pattern in golang.org/pkg/regexp. If the decoder input matches any of the patterns, it is permitted. Otherwise, the whole metric label set is dropped.
An example to report metrics only for `systemd-journal` and `syslog-ng`:
```yaml
- name: command
  decoders:
    - name: string
    - name: regexp
      regexps:
        - ^(kswapd).*$ # if sub-matches are present, the first one is used for the value
        - ^systemd-journal$
        - ^syslog-ng$
```
static_map
The `static_map` decoder takes an input and maps it to another value via the `static_map` configuration key of the decoder. Values are expected as strings.
An example to match `1` to `read` and `2` to `write`:
```yaml
- name: operation
  decoders:
    - name: static_map
      static_map:
        1: read
        2: write
```
Unknown keys will be replaced by `unknown:key_name` unless `allow_unknown: true` is specified in the decoder. For example, the above will decode `3` to `unknown:3`, and the below example will decode `3` to `3`:
```yaml
- name: operation
  decoders:
    - name: static_map
      allow_unknown: true
      static_map:
        1: read
        2: write
```
string
The `string` decoder transforms possibly null terminated strings coming from the kernel into strings usable for prometheus metrics.
syscall
Syscall decoder transforms syscall numbers into syscall names.
The tables can be regenerated by `make syscalls`. See `scripts/mksyscalls`.
uint
The `uint` decoder transforms hex encoded `uint` values from the kernel into regular base10 numbers. For example: `0xe` -> `14`.
Per CPU map support
Per CPU map reading is fully supported. If the last decoder for a percpu map is called `cpu` (use a 2 byte `uint` decoder), then a `cpu` label is added automatically. If it's not present, then the percpu counters are aggregated into one global counter.
There is a percpu-softirq config in examples. See #226 for examples of different modes of operation for it.
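Conceptually, the two modes might look like the illustrative Go sketch below (not the exporter's actual code; `flattenPerCPU` and its arguments are hypothetical):

```go
// flattenPerCPU shows the two per-CPU modes described above: perCPU holds
// one counter per possible CPU as read from a per-CPU map.
func flattenPerCPU(perCPU []uint64, hasCPULabel bool) map[int]uint64 {
	out := map[int]uint64{}
	if hasCPULabel {
		// A trailing "cpu" decoder keeps one series per CPU.
		for cpu, v := range perCPU {
			out[cpu] = v
		}
		return out
	}
	// Without it, the per-CPU counters collapse into a single global value.
	var total uint64
	for _, v := range perCPU {
		total += v
	}
	out[-1] = total
	return out
}
```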
Configuration file format
The configuration file is defined like this:

```yaml
# Metrics attached to the program
[ metrics: metrics ]

# Kernel symbol addresses to define as kaddr_{symbol} from /proc/kallsyms (consider CONFIG_KALLSYMS_ALL)
kaddrs:
  [ - symbol_to_resolve ]
```
metrics
See Metrics section for more details.
```yaml
counters:
  [ - counter ]

histograms:
  [ - histogram ]
```
counter
See Counters section for more details.
```yaml
name: <prometheus counter name>
help: <prometheus metric help>
perf_event_array: <whether map is a BPF_MAP_TYPE_PERF_EVENT_ARRAY map: bool>
flush_interval: <how often should we flush metrics from the perf_event_array: time.Duration>
labels:
  [ - label ]
```
An example of `perf_map` can be found here.
histogram
See Histograms section for more details.
```yaml
name: <prometheus histogram name>
help: <prometheus metric help>
bucket_type: <map bucket type: exp2 or linear>
bucket_multiplier: <map bucket multiplier: float64>
bucket_min: <min bucket value: int>
bucket_max: <max bucket value: int>
labels:
  [ - label ]
```
label
See Labels section for more details.
```yaml
name: <prometheus label name>
size: <field size>
padding: <padding size>
decoders:
  [ - decoder ]
```
decoder
See Decoders section for more details.
```yaml
name: <decoder name>
# ... decoder specific configuration
```
Built-in metrics
ebpf_exporter_enabled_configs
This gauge reports a timeseries for every loaded config:
```
# HELP ebpf_exporter_enabled_configs The set of enabled configs
# TYPE ebpf_exporter_enabled_configs gauge
ebpf_exporter_enabled_configs{name="cachestat"} 1
```
ebpf_exporter_ebpf_program_info
This gauge reports information available for every ebpf program:
```
# HELP ebpf_exporter_ebpf_programs Info about ebpf programs
# TYPE ebpf_exporter_ebpf_programs gauge
ebpf_exporter_ebpf_program_info{config="cachestat",id="545",program="add_to_page_cache_lru",tag="6c007da3187b5b32"} 1
ebpf_exporter_ebpf_program_info{config="cachestat",id="546",program="mark_page_accessed",tag="6c007da3187b5b32"} 1
ebpf_exporter_ebpf_program_info{config="cachestat",id="547",program="folio_account_dirtied",tag="6c007da3187b5b32"} 1
ebpf_exporter_ebpf_program_info{config="cachestat",id="548",program="mark_buffer_dirty",tag="6c007da3187b5b32"} 1
```
Here `tag` can be used for tracing and performance analysis with two conditions:

- the `net.core.bpf_jit_kallsyms=1` sysctl is set
- `--kallsyms=/proc/kallsyms` is passed to `perf record`

Newer kernels allow passing `--kallsyms` to `perf top` as well; in the future it may not be required at all:
ebpf_exporter_ebpf_program_attached
This gauge reports whether individual programs were successfully attached.
```
# HELP ebpf_exporter_ebpf_program_attached Whether a program is attached
# TYPE ebpf_exporter_ebpf_program_attached gauge
ebpf_exporter_ebpf_program_attached{id="247"} 1
ebpf_exporter_ebpf_program_attached{id="248"} 1
ebpf_exporter_ebpf_program_attached{id="249"} 0
ebpf_exporter_ebpf_program_attached{id="250"} 1
```
It needs to be joined by the `id` label with `ebpf_exporter_ebpf_program_info` to get more information about the program.
ebpf_exporter_ebpf_program_run_time_seconds
This counter reports how much time individual programs spent running.
```
# HELP ebpf_exporter_ebpf_program_run_time_seconds How long has the program been executing
# TYPE ebpf_exporter_ebpf_program_run_time_seconds counter
ebpf_exporter_ebpf_program_run_time_seconds{id="247"} 0
ebpf_exporter_ebpf_program_run_time_seconds{id="248"} 0.001252621
ebpf_exporter_ebpf_program_run_time_seconds{id="249"} 0
ebpf_exporter_ebpf_program_run_time_seconds{id="250"} 3.6668e-05
```
It requires the `kernel.bpf_stats_enabled` sysctl to be enabled. It needs to be joined by the `id` label with `ebpf_exporter_ebpf_program_info` to get more information about the program.
ebpf_exporter_ebpf_program_run_count_total
This counter reports how many times individual programs ran.
```
# HELP ebpf_exporter_ebpf_program_run_count_total How many times has the program been executed
# TYPE ebpf_exporter_ebpf_program_run_count_total counter
ebpf_exporter_ebpf_program_run_count_total{id="247"} 0
ebpf_exporter_ebpf_program_run_count_total{id="248"} 11336
ebpf_exporter_ebpf_program_run_count_total{id="249"} 0
ebpf_exporter_ebpf_program_run_count_total{id="250"} 69
```
It requires the `kernel.bpf_stats_enabled` sysctl to be enabled. It needs to be joined by the `id` label with `ebpf_exporter_ebpf_program_info` to get more information about the program.
ebpf_exporter_decoder_errors_total
This counter reports the number of times labels failed to be decoded by config.
```
# HELP ebpf_exporter_decoder_errors_total How many times has decoders encountered errors
# TYPE ebpf_exporter_decoder_errors_total counter
ebpf_exporter_decoder_errors_total{config="kstack"} 0
ebpf_exporter_decoder_errors_total{config="sock-trace"} 4
```
License
MIT