vigil

🚦 Microservices Status Page. Monitors a distributed infrastructure and sends alerts (Slack, SMS, etc.).

1,823

Top Related Projects

grafana

68,692

The open and composable observability and data visualization platform. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elasticsearch, InfluxDB, Postgres and many more.

prometheus

59,181

The Prometheus monitoring system and time series database.

healthchecks

9,166

Open-source cron job and background task monitoring service, written in Python & Django

uptime-kuma

71,964

A fancy self-hosted monitoring tool

statping

7,211

Status Page for monitoring your websites and applications with beautiful graphs, analytics, and plugins. Run on any type of environment.

Quick Overview

Vigil is an open-source status page and monitoring system for your infrastructure. It provides real-time monitoring of your services, alerting capabilities, and a public status page to keep your users informed about the health of your systems.

Pros

Easy to set up and configure with a single TOML configuration file
Supports HTTP, TCP and ICMP polling, plus push (Vigil Reporter), script and local (Vigil Local) probes
Integrates with popular notification services like Slack, Twilio, and email
Provides a clean and customizable public status page

Cons

Limited advanced monitoring features compared to more complex solutions
Requires self-hosting, which may not be suitable for all users
Documentation could be more comprehensive for advanced use cases
Limited community support compared to more popular monitoring solutions

Getting Started

Download the latest Vigil release for your platform from the GitHub releases page.
Create a configuration file named config.cfg in the same directory as the Vigil binary:

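# Placeholder values; key names follow the configuration reference in the README
[server]
log_level = "error"
inet = "0.0.0.0:8080"
manager_token = "REPLACE_THIS_WITH_A_SECRET_KEY"
reporter_token = "REPLACE_THIS_WITH_A_SECRET_KEY"

[assets]
path = "./res/assets/"

[branding]
page_title = "Status Page"
page_url = "https://status.example.com"
company_name = "My Company"
icon_color = "#1972F5"
icon_url = "https://example.com/icon.png"
logo_color = "#1972F5"
logo_url = "https://example.com/logo.svg"
website_url = "https://example.com/"
support_url = "mailto:support@example.com"

[metrics]
# Poll every 30 seconds (applies to all nodes in poll mode)
poll_interval = 30

[[probe.service]]
id = "web"
label = "Website"

[[probe.service.node]]
id = "homepage"
label = "Our company website"
mode = "poll"
replicas = ["https://example.com/"]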

Run Vigil:

./vigil -c ./config.cfg

Access the status page at http://localhost:8080 (or the configured host and port).

Competitor Comparisons

grafana

68,692

Pros of Grafana

More comprehensive and feature-rich monitoring solution
Supports a wide range of data sources and visualization types
Large community and extensive plugin ecosystem

Cons of Grafana

Steeper learning curve and more complex setup
Requires more resources to run and maintain
May be overkill for simple monitoring needs

Code Comparison

Vigil configuration (TOML):

[server]
log_level = "info"
inet = "[::]:8080"
workers = 4

Grafana configuration (INI):

[server]
http_addr = 0.0.0.0
http_port = 3000

Summary

Grafana is a powerful and versatile monitoring and visualization platform, offering a wide range of features and integrations. It's ideal for complex monitoring scenarios and organizations with diverse data sources. However, this power comes at the cost of increased complexity and resource requirements.

Vigil, on the other hand, is a simpler and more focused monitoring solution, specifically designed for uptime monitoring and status page generation. It's easier to set up and use, making it a good choice for smaller projects or teams that need straightforward uptime monitoring without the overhead of a full-fledged monitoring suite.

The choice between the two depends on the specific needs of the project, the scale of monitoring required, and the available resources for setup and maintenance.

prometheus

59,181

The Prometheus monitoring system and time series database.

Pros of Prometheus

More comprehensive monitoring solution with a powerful query language (PromQL)
Extensive ecosystem with many integrations and exporters
Highly scalable and suitable for large-scale deployments

Cons of Prometheus

Steeper learning curve and more complex setup
Requires more resources to run effectively
May be overkill for simple monitoring needs

Code Comparison

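Vigil configuration (TOML; the poll interval is set globally through metrics.poll_interval):

[[probe.service]]
id = "example"
label = "Example"

[[probe.service.node]]
id = "example-tcp"
label = "Example TCP"
mode = "poll"
replicas = ["tcp://example.com:80"]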

Prometheus configuration (YAML):

scrape_configs:
  - job_name: 'example'
    static_configs:
      - targets: ['example.com:80']
    scrape_interval: 30s

Summary

Prometheus is a more powerful and feature-rich monitoring system, suitable for complex environments and large-scale deployments. It offers advanced querying capabilities and a vast ecosystem of integrations.

Vigil, on the other hand, is a simpler and more lightweight solution, focused on uptime monitoring and status page generation. It's easier to set up and use for basic monitoring needs but lacks the advanced features and scalability of Prometheus.

Choose Prometheus for comprehensive monitoring in larger, more complex environments. Opt for Vigil if you need a straightforward uptime monitoring solution with minimal setup and resource requirements.

healthchecks

9,166

Open-source cron job and background task monitoring service, written in Python & Django

Pros of Healthchecks

More comprehensive monitoring features, including ping-based checks and integration with various services
Offers a hosted solution, making it easier for users who don't want to self-host
Provides a web interface for managing checks and viewing reports

Cons of Healthchecks

More complex setup and configuration compared to Vigil's simplicity
Requires a database backend, which may increase resource usage and maintenance overhead
Less focused on minimal resource usage compared to Vigil's lightweight approach

Code Comparison

Healthchecks (Python):

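from django.db.models import F
from django.http import HttpResponse
from django.shortcuts import get_object_or_404
from django.utils import timezone
from django.views.decorators.csrf import csrf_exempt

from hc.api.models import Check  # Healthchecks' check model

@csrf_exempt
def ping(request, code):
    check = get_object_or_404(Check, code=code)
    # F() makes the database perform the increment, avoiding lost updates
    check.n_pings = F("n_pings") + 1
    check.last_ping = timezone.now()
    check.save()
    return HttpResponse("OK")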

Vigil (Rust):

use std::time::Duration;

pub fn check_http(target: &str, timeout: u64) -> bool {
    // Blocking client: `send()` returns a Result rather than a future
    let client = reqwest::blocking::Client::new();
    let response = client.get(target).timeout(Duration::from_secs(timeout)).send();
    response.map(|r| r.status().is_success()).unwrap_or(false)
}

These simplified, illustrative snippets show different approaches: Healthchecks uses a Django view for handling pings, while the Vigil-style example is a simple HTTP check function in Rust.

uptime-kuma

71,964

A fancy self-hosted monitoring tool

Pros of Uptime Kuma

User-friendly web interface with a modern design
Supports a wide range of monitoring types (HTTP, TCP, DNS, etc.)
Easy to set up and deploy, with Docker support

Cons of Uptime Kuma

Less focus on enterprise-level features
May consume more resources for large-scale monitoring

Code Comparison

Uptime Kuma (JavaScript):

async function ping(monitor) {
  const startTime = Date.now();
  try {
    const res = await axios.get(monitor.url);
    const responseTime = Date.now() - startTime;
    return { status: 'up', responseTime };
  } catch (error) {
    return { status: 'down', error: error.message };
  }
}

Vigil (Rust):

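use std::time::{Duration, Instant};

pub fn check_http(url: &str, timeout: Duration) -> Result<Duration, reqwest::Error> {
    // Blocking client, so `send()` can be used with `?` outside an async context
    let client = reqwest::blocking::Client::new();
    let start = Instant::now();
    // Treat non-2xx responses as errors, like transport failures
    client.get(url).timeout(timeout).send()?.error_for_status()?;
    Ok(start.elapsed())
}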

Summary

Uptime Kuma offers a more user-friendly approach with its modern web interface and easy setup, making it suitable for small to medium-scale monitoring needs. It supports various monitoring types and is easy to deploy.

Vigil, on the other hand, is written in Rust and is set up through a configuration file rather than a web interface. It focuses on a lightweight status page with alerting, which suits teams that want a small, self-hosted service and do not need the broader feature set of a larger monitoring tool.

The code comparison shows that Uptime Kuma uses JavaScript with async/await for HTTP checks, while Vigil utilizes Rust's strong typing and error handling for potentially more efficient execution.

statping

7,211

Status Page for monitoring your websites and applications with beautiful graphs, analytics, and plugins. Run on any type of environment.

Pros of Statping

More comprehensive feature set, including detailed metrics and analytics
Supports multiple notification channels (e.g., Slack, Discord, Telegram)
Offers a user-friendly web interface for easy management

Cons of Statping

More complex setup and configuration process
Requires a database for storing data, which may increase resource usage
Potentially higher learning curve for new users

Code Comparison

Vigil configuration (TOML):

[[probe.service]]
id = "website"
label = "My Website"

[[probe.service.node]]
id = "website-home"
label = "Homepage"
mode = "poll"
replicas = ["https://example.com/"]

Statping configuration (JSON):

{
  "name": "My Website",
  "domain": "https://example.com",
  "check_interval": 30,
  "type": "http"
}

Both projects use similar configuration structures, but Vigil uses TOML while Statping uses JSON. Statping's configuration allows for more detailed options, reflecting its broader feature set.

Vigil focuses on simplicity and lightweight monitoring, making it easier to set up and manage for basic use cases. Statping offers more advanced features and customization options, but at the cost of increased complexity and resource requirements. The choice between the two depends on the specific monitoring needs and available resources of the user.

README

Vigil

Microservices Status Page. Monitors a distributed infrastructure and sends alerts (Slack, SMS, etc.).

Vigil is an open-source Status Page you can host on your infrastructure, used to monitor all your servers and apps, and visible to your users (on a domain of your choice, eg. status.example.com).

It is useful in microservices contexts to monitor both apps and backends. If a node goes down in your infrastructure, you receive a status change notification in a Slack channel, by Email, Twilio SMS and/or XMPP.

Tested at Rust version: rustc 1.71.1 (eb26296b5 2023-08-03)

🇭🇺 Crafted in Budapest, Hungary.

See a live demo of Vigil on Crisp Status Page.

📰 The Vigil project was announced in a post on my personal journal.

Who uses it?

Crisp, Meilisearch, miragespace, Redsmin, Image-Charts, Pikomit, Notice, Bareconnect

You use Vigil and you want to be listed there? Contact me.

Features

Monitors your infrastructure services automatically
Notifies you when a service goes down or comes back up via a configured channel:
- Email
- Twilio (SMS)
- Slack
- Zulip
- Telegram
- Pushover
- Gotify
- XMPP
- Matrix
- Cisco Webex
- Webhook
Generates a status page that you can host on your domain for your public users (eg. https://status.example.com)
Allows publishing announcements, eg. let your users know that a planned maintenance is upcoming

How does it work?

Vigil monitors all your infrastructure services. You first need to configure target services to be monitored, and then Vigil does the rest for you.

There are three kinds of services Vigil can monitor:

HTTP / TCP / ICMP services: Vigil frequently probes an HTTP, TCP or ICMP target and checks for reachability
Application services: Install the Vigil Reporter library eg. on your NodeJS app and get reports when your app goes down, as well as when the host server system is overloaded
Local services: Install a slave Vigil Local daemon to monitor services that cannot be reached by the Vigil master server (eg. services that are on a different LAN)

It is recommended to configure Vigil, Vigil Reporter or Vigil Local to send frequent probe checks, to ensure you are quickly notified when a service goes down (and thus reduce unexpected downtime on your services).

Hosted alternative to Vigil

Vigil needs to be hosted on your own systems, and maintained on your end. If you do not feel like managing yet another service, you may use Crisp Status instead.

Crisp Status is a direct port of Vigil to the Crisp customer support platform.

Crisp Status hosts your status page on Crisp systems, and is able to do what Vigil does (and even more!). Crisp Status is integrated with other Crisp products (eg. Crisp Chatbox & Crisp Helpdesk). It warns your users over chatbox and helpdesk if your status page reports as dead for an extended period of time.

As an example of a status page running Crisp Status, check out Enrich Status Page.

How to use it?

Installation

Vigil is built in Rust. To install it, either download a version from the Vigil releases page, use cargo install or pull the source code from master.

Each release binary comes with an .asc signature file, which can be verified using the @valeriansaliou GPG public key: valeriansaliou.gpg.pub.asc.

Install from packages:

Vigil provides pre-built packages for Debian-based systems (Debian, Ubuntu, etc.).

Important: Vigil only provides 64-bit packages targeting Debian 10, 11 & 12 for now (codenames: buster, bullseye & bookworm). You will still be able to use them on other Debian versions, as well as Ubuntu.

First, add the Vigil APT repository (eg. for Debian bookworm):

echo "deb [signed-by=/usr/share/keyrings/valeriansaliou_vigil.gpg] https://packagecloud.io/valeriansaliou/vigil/debian/ bookworm main" > /etc/apt/sources.list.d/valeriansaliou_vigil.list

curl -fsSL https://packagecloud.io/valeriansaliou/vigil/gpgkey | gpg --dearmor -o /usr/share/keyrings/valeriansaliou_vigil.gpg

apt-get update

Then, install the Vigil package:

apt-get install vigil

Then, edit the pre-filled Vigil configuration file:

nano /etc/vigil/vigil.cfg

Finally, restart Vigil:

service vigil restart

Install from Cargo:

If you prefer managing Vigil via Rust's Cargo, install it directly via cargo install:

cargo install vigil-server

Ensure that your $PATH is properly configured to source the Crates binaries, and then run Vigil using the vigil command.

Install from source:

The last option is to pull the source code from Git and compile Vigil via cargo:

cargo build --release

You can find the built binaries in the ./target/release directory.

Install libssl-dev (ie. OpenSSL headers) and libstrophe-dev (ie. XMPP library headers; only if you need the XMPP notifier) before you compile Vigil. SSL dependencies are required for the HTTPS probes and email notifications.

Install from Docker Hub:

You might find it convenient to run Vigil via Docker. You can find the pre-built Vigil image on Docker Hub as valeriansaliou/vigil.

The pre-built Docker version may not be the latest version of Vigil available.

First, pull the valeriansaliou/vigil image:

docker pull valeriansaliou/vigil:v1.27.0

Then, provide it with a configuration file and run it (replace /path/to/your/vigil/config.cfg with the path to your configuration file):

docker run -p 8080:8080 -v /path/to/your/vigil/config.cfg:/etc/vigil.cfg valeriansaliou/vigil:v1.27.0

In the configuration file, ensure that:

server.inet is set to 0.0.0.0:8080 (this lets Vigil be reached from outside the container)
assets.path is set to ./res/assets/ (this refers to an internal path in the container, as the assets are contained there)

Vigil will be reachable from http://localhost:8080.

Configuration

Use the sample config.cfg configuration file and adjust it to your own environment.

You can also use environment variables with string interpolation in your configuration file, eg. manager_token = ${VIGIL_MANAGER_TOKEN}.
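
The interpolation described above can be pictured with a short Python sketch. It only illustrates the ${NAME} substitution idea and is not Vigil's actual implementation (Vigil is written in Rust); the interpolate helper and its behavior for unset variables (raising an error) are assumptions made for this example.

```python
import os
import re

# Matches ${NAME} placeholders, where NAME looks like an environment variable name
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(text, env=None):
    """Replace every ${NAME} in `text` with the value of NAME taken from `env`.

    Hypothetical helper: raising KeyError on an unset variable is a choice
    made for this sketch, not documented Vigil behavior.
    """
    env = os.environ if env is None else env

    def substitute(match):
        return env[match.group(1)]

    return _PLACEHOLDER.sub(substitute, text)


if __name__ == "__main__":
    line = 'manager_token = "${VIGIL_MANAGER_TOKEN}"'
    print(interpolate(line, {"VIGIL_MANAGER_TOKEN": "s3cret"}))
```

Keeping secrets such as manager_token in the environment means the configuration file itself can be committed or shared without exposing them.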

Available configuration options are commented below, with allowed values:

[server]

log_level (type: string, allowed: debug, info, warn, error, default: error) - Verbosity of logging, set it to error in production
inet (type: string, allowed: IPv4 / IPv6 + port, default: [::1]:8080) - Host and TCP port the Vigil public status page should listen on
workers (type: integer, allowed: any number, default: 4) - Number of workers for the Vigil public status page to run on
manager_token (type: string, allowed: secret token, default: no default) - Manager secret token (ie. secret password)
reporter_token (type: string, allowed: secret token, default: no default) - Reporter secret token (ie. secret password)

[assets]

path (type: string, allowed: UNIX path, default: ./res/assets/) - Path to Vigil assets directory

[branding]

page_title (type: string, allowed: any string, default: Status Page) - Status page title
page_url (type: string, allowed: URL, no default) - Status page URL
company_name (type: string, allowed: any string, no default) - Company name (ie. your company)
icon_color (type: string, allowed: hexadecimal color code, no default) - Icon color (ie. your icon background color)
icon_url (type: string, allowed: URL, no default) - Icon URL, the icon should be your squared logo, used as status page favicon (PNG format recommended)
logo_color (type: string, allowed: hexadecimal color code, no default) - Logo color (ie. your logo primary color)
logo_url (type: string, allowed: URL, no default) - Logo URL, the logo should be your full-width logo, used as status page header logo (SVG format recommended)
website_url (type: string, allowed: URL, no default) - Website URL to be used in status page header
support_url (type: string, allowed: URL, no default) - Support URL to be used in status page header (ie. where users can contact you if something is wrong)
custom_html (type: string, allowed: HTML, default: empty) - Custom HTML to include in status page head (optional)

[metrics]

poll_interval (type: integer, allowed: seconds, default: 120) - Interval at which to probe nodes in poll mode
poll_retry (type: integer, allowed: seconds, default: 2) - Interval after which to probe nodes in poll mode a second time (only when the first check fails)
poll_http_status_healthy_above (type: integer, allowed: HTTP status code, default: 200) - HTTP status above which poll checks to HTTP replicas report as healthy
poll_http_status_healthy_below (type: integer, allowed: HTTP status code, default: 400) - HTTP status under which poll checks to HTTP replicas report as healthy
poll_delay_dead (type: integer, allowed: seconds, default: 10) - Delay after which a node in poll mode is to be considered dead (ie. check response delay)
poll_delay_sick (type: integer, allowed: seconds, default: 5) - Delay after which a node in poll mode is to be considered sick (ie. check response delay)
poll_parallelism (type: integer, allowed: any number, default: 4) - Maximum number of poll threads to be run simultaneously (in case you are monitoring a lot of nodes and/or slow-replying nodes, increasing parallelism will help)
push_delay_dead (type: integer, allowed: seconds, default: 20) - Delay after which a node in push mode is to be considered dead (ie. time after which the node did not report)
push_system_cpu_sick_above (type: float, allowed: system CPU loads, default: 0.90) - System load index for CPU above which to consider a node in push mode sick (ie. UNIX system load)
push_system_ram_sick_above (type: float, allowed: system RAM loads, default: 0.90) - System load index for RAM above which to consider a node in push mode sick (ie. percent RAM used)
script_interval (type: integer, allowed: seconds, default: 300) - Interval at which to probe nodes in script mode
script_parallelism (type: integer, allowed: any number, default: 2) - Maximum number of script executor threads to be run simultaneously (in case you are running a lot of scripts and/or long-running scripts, increasing parallelism will help)
local_delay_dead (type: integer, allowed: seconds, default: 40) - Delay after which a node in local mode is to be considered dead (ie. time after which the node did not report)
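
As a rough illustration of how the poll thresholds above could combine into a healthy, sick or dead verdict, here is a minimal Python sketch. It is not Vigil's code: the PollMetrics class and classify_poll function are hypothetical names, and the sketch assumes that the lower status bound is inclusive, the upper bound is exclusive, and a missing response counts as dead.

```python
from dataclasses import dataclass


@dataclass
class PollMetrics:
    # Defaults copied from the [metrics] option list
    poll_http_status_healthy_above: int = 200
    poll_http_status_healthy_below: int = 400
    poll_delay_dead: float = 10.0
    poll_delay_sick: float = 5.0


def classify_poll(http_status, delay_seconds, metrics=PollMetrics()):
    """Return "healthy", "sick" or "dead" for one HTTP poll result.

    Assumptions of this sketch: `http_status` is None when no response
    arrived, and the healthy range is [above, below).
    """
    if http_status is None or delay_seconds >= metrics.poll_delay_dead:
        return "dead"
    in_range = (
        metrics.poll_http_status_healthy_above
        <= http_status
        < metrics.poll_http_status_healthy_below
    )
    if not in_range:
        return "dead"
    if delay_seconds >= metrics.poll_delay_sick:
        return "sick"
    return "healthy"
```

With the defaults, a 200 response received after 6 seconds would be classified as sick, and any response slower than 10 seconds as dead.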

[plugins]

[plugins.rabbitmq]

api_url (type: string, allowed: URL, no default) - RabbitMQ API URL (ie. http://127.0.0.1:15672)
auth_username (type: string, allowed: username, no default) - RabbitMQ API authentication username
auth_password (type: string, allowed: password, no default) - RabbitMQ API authentication password
virtualhost (type: string, allowed: virtual host, no default) - RabbitMQ virtual host hosting the queues to be monitored
queue_ready_healthy_below (type: integer, allowed: any number, no default) - Maximum number of payloads in RabbitMQ queue with status ready to consider node healthy.
queue_nack_healthy_below (type: integer, allowed: any number, no default) - Maximum number of payloads in RabbitMQ queue with status nack to consider node healthy.
queue_ready_dead_above (type: integer, allowed: any number, no default) - Threshold on the number of payloads in RabbitMQ queue with status ready above which node should be considered dead (stalled queue)
queue_nack_dead_above (type: integer, allowed: any number, no default) - Threshold on the number of payloads in RabbitMQ queue with status nack above which node should be considered dead (stalled queue)
queue_loaded_retry_delay (type: integer, allowed: milliseconds, no default) - Re-check queue if it reports as loaded after delay; this avoids false-positives if your systems usually take a bit of time to process pending queue payloads (if any)
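
One way to read the four queue thresholds above is sketched below in Python. This is an interpretation, not Vigil's implementation: the classify_queue function is a hypothetical name, and the sketch assumes strict comparisons and that a queue which is neither healthy nor dead is reported as sick.

```python
def classify_queue(ready, nack, thresholds):
    """Classify a RabbitMQ queue from its ready and nack payload counts.

    `thresholds` is a dict keyed by the option names from [plugins.rabbitmq].
    Assumption of this sketch: anything between "healthy" and "dead" is "sick".
    """
    if (
        ready > thresholds["queue_ready_dead_above"]
        or nack > thresholds["queue_nack_dead_above"]
    ):
        return "dead"
    if (
        ready < thresholds["queue_ready_healthy_below"]
        and nack < thresholds["queue_nack_healthy_below"]
    ):
        return "healthy"
    return "sick"


if __name__ == "__main__":
    # Example threshold values, chosen only for illustration
    example = {
        "queue_ready_healthy_below": 500,
        "queue_nack_healthy_below": 100,
        "queue_ready_dead_above": 20000,
        "queue_nack_dead_above": 5000,
    }
    print(classify_queue(ready=1200, nack=10, thresholds=example))
```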

[notify]

startup_notification (type: boolean, allowed: true, false, default: true) - Whether to send startup notification or not (stating that systems are healthy)
reminder_interval (type: integer, allowed: seconds, no default) - Interval at which downtime reminder notifications should be sent (if any)
reminder_backoff_function (type: string, allowed: none, linear, square, cubic, default: none) - If enabled, the downtime reminder interval will get larger as reminders are sent. The value will be reminder_interval × pow(N, x) with N being the number of reminders sent since the service went down, and x being the specified growth factor.
reminder_backoff_limit (type: integer, allowed: any number, default: 3) - Maximum value for the downtime reminder backoff counter (if a backoff function is enabled).
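
The reminder backoff formula can be worked through with a small Python sketch. It is illustrative only: the reminder_delay function is a hypothetical name, the mapping of none, linear, square and cubic to the exponents 0, 1, 2 and 3 is an assumption, and so is the choice to start the counter N at 1 and cap it at reminder_backoff_limit.

```python
# Assumed mapping from the configured function name to the exponent x
BACKOFF_EXPONENTS = {"none": 0, "linear": 1, "square": 2, "cubic": 3}


def reminder_delay(reminder_interval, reminders_sent, function="none", limit=3):
    """Compute reminder_interval * pow(N, x) for the next downtime reminder.

    Assumptions of this sketch: N starts at 1 (so the first delay equals
    reminder_interval) and never exceeds `limit` (reminder_backoff_limit).
    """
    n = max(1, min(reminders_sent, limit))
    return reminder_interval * pow(n, BACKOFF_EXPONENTS[function])


if __name__ == "__main__":
    for sent in range(1, 6):
        print(sent, reminder_delay(600, sent, function="square", limit=3))
```

Under these assumptions, an interval of 600 seconds with the square function yields delays of 600, 2400 and 5400 seconds, and then stays at 5400 because the counter is capped at 3.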

[notify.email]

to (type: string, allowed: email address, no default) - Email address to which to send emails
from (type: string, allowed: email address, no default) - Email address from which to send emails
smtp_host (type: string, allowed: hostname, IPv4, IPv6, default: localhost) - SMTP host to connect to
smtp_port (type: integer, allowed: TCP port, default: 587) - SMTP TCP port to connect to
smtp_username (type: string, allowed: any string, no default) - SMTP username to use for authentication (if any)
smtp_password (type: string, allowed: any string, no default) - SMTP password to use for authentication (if any)
smtp_encrypt (type: boolean, allowed: true, false, default: true) - Whether to encrypt SMTP connection with STARTTLS or not
reminders_only (type: boolean, allowed: true, false, default: false) - Whether to send emails only for downtime reminders or every time

[notify.twilio]

to (type: array[string], allowed: phone numbers, no default) - List of phone numbers to which to send text messages
service_sid (type: string, allowed: any string, no default) - Twilio service identifier (ie. Service Sid)
account_sid (type: string, allowed: any string, no default) - Twilio account identifier (ie. Account Sid)
auth_token (type: string, allowed: any string, no default) - Twilio authentication token (ie. Auth Token)
reminders_only (type: boolean, allowed: true, false, default: false) - Whether to send text messages only for downtime reminders or every time

[notify.slack]

hook_url (type: string, allowed: URL, no default) - Slack hook URL (ie. https://hooks.slack.com/[..])
mention_channel (type: boolean, allowed: true, false, default: false) - Whether to mention channel when sending Slack messages (using @channel, which is handy to receive a high-priority notification)
reminders_only (type: boolean, allowed: true, false, default: false) - Whether to send Slack messages only for downtime reminders or every time

[notify.zulip]

bot_email (type: string, allowed: any string, no default) - The bot mail address as given by the Zulip interface
bot_api_key (type: string, allowed: any string, no default) - The bot API key as given by the Zulip interface
channel (type: string, allowed: any string, no default) - The name of the channel to send notifications to
api_url (type: string, allowed: URL, no default) - The API endpoint URL (eg. https://domain.zulipchat.com/api/v1/)
reminders_only (type: boolean, allowed: true, false, default: false) - Whether to send messages only for downtime reminders or every time

[notify.telegram]

bot_token (type: string, allowed: any string, no default) - Telegram bot token
chat_id (type: string, allowed: any string, no default) - Chat identifier where you want Vigil to send messages. Can be a group chat identifier (eg. "@foo") or a user chat identifier (eg. "123456789")

[notify.pushover]

app_token (type: string, allowed: any string, no default) - Pushover application token (you need to create a dedicated Pushover application to get one)
user_keys (type: array[string], allowed: any strings, no default) - List of Pushover user keys (ie. the keys of your Pushover target users for notifications)
reminders_only (type: boolean, allowed: true, false, default: false) - Whether to send Pushover notifications only for downtime reminders or every time

[notify.gotify]

app_url (type: string, allowed: URL, no default) - Gotify endpoint without trailing slash (eg. https://push.gotify.net)
app_token (type: string, allowed: any string, no default) - Gotify application token
reminders_only (type: boolean, allowed: true, false, default: false) - Whether to send Gotify notifications only for downtime reminders or every time

[notify.xmpp]

Notice: the XMPP notifier requires libstrophe (libstrophe-dev package on Debian) to be available when compiling Vigil, with the feature notifier-xmpp enabled upon Cargo build.

to (type: string, allowed: Jabber ID, no default) - Jabber ID (JID) to which to send messages
from (type: string, allowed: Jabber ID, no default) - Jabber ID (JID) from which to send messages
xmpp_password (type: string, allowed: any string, no default) - XMPP account password to use for authentication
reminders_only (type: boolean, allowed: true, false, default: false) - Whether to send messages only for downtime reminders or every time

[notify.matrix]

homeserver_url (type: string, allowed: URL, no default) - Matrix server where the account has been created (eg. https://matrix.org)
access_token (type: string, allowed: any string, no default) - Matrix access token from a previously created session (eg. Element Web access token)
room_id (type: string, allowed: any string, no default) - Matrix room ID to which to send messages (eg. !abc123:matrix.org)
reminders_only (type: boolean, allowed: true, false, default: false) - Whether to send messages only for downtime reminders or every time

[notify.webex]

endpoint_url (type: string, allowed: URL, no default) - Webex endpoint URL (eg. https://webexapis.com/v1/messages)
token (type: string, allowed: any string, no default) - Webex access token
room_id (type: string, allowed: any string, no default) - Webex room ID to which to send messages (eg. Y2lzY29zcGFyazovL3VzL1JPT00vMmJmOD)
reminders_only (type: boolean, allowed: true, false, default: false) - Whether to send messages only for downtime reminders or every time

[notify.webhook]

hook_url (type: string, allowed: URL, no default) - Web Hook URL (eg. https://domain.com/webhooks/[..])

[probe]

[[probe.service]]

id (type: string, allowed: any unique lowercase string, no default) - Unique identifier of the probed service (not visible on the status page)
label (type: string, allowed: any string, no default) - Name of the probed service (visible on the status page)

[[probe.service.node]]

id (type: string, allowed: any unique lowercase string, no default) - Unique identifier of the probed service node (not visible on the status page)
label (type: string, allowed: any string, no default) - Name of the probed service node (visible on the status page)
mode (type: string, allowed: poll, push, script, local, no default) - Probe mode for this node (ie. poll is direct HTTP, TCP or ICMP poll to the URLs set in replicas, while push is for Vigil Reporter nodes, script is used to execute a shell script and local is for Vigil Local nodes)
replicas (type: array[string], allowed: TCP, ICMP or HTTP URLs, default: empty) - Node replica URLs to be probed (only used if mode is poll)
scripts (type: array[string], allowed: shell scripts as source code, default: empty) - Shell scripts to be executed on the system as a Vigil sub-process; they are handy to build custom probes (only used if mode is script)
http_headers (type: map[string, string], allowed: any valid header name and value, default: empty) - HTTP headers to add to HTTP requests (eg. http_headers = { "Authorization" = "Bearer xxxx" })
http_method (type: string, allowed: GET, HEAD, POST, PUT, PATCH, no default) - HTTP method to use when polling the endpoint (omitting this will default to using HEAD or GET depending on the http_body_healthy_match configuration value)
http_body (type: string, allowed: any string, no default) - Body to send in the HTTP request when polling an endpoint (this only works if http_method is set to POST, PUT or PATCH)
http_body_healthy_match (type: string, allowed: regular expressions, no default) - HTTP response body for which to report node replica as healthy (if the body does not match, the replica will be reported as dead, even if the status code check passes; the check uses a GET rather than the usual HEAD if this option is set)
reveal_replica_name (type: boolean, allowed: true, false, default: false) - Whether to reveal replica name on public status page or not (this can be a security risk if a replica URL is to be kept secret)
link_url (type: string, allowed: URL, no default) - Link URL to show next to the node health (this can be used to direct the user to another page to see more details)
link_label (type: string, allowed: any string, no default) - Link label to use for the URL link (if any link is set)
rabbitmq_queue (type: string, allowed: RabbitMQ queue names, no default) - RabbitMQ queue associated to node, which to check against for pending payloads via RabbitMQ API (this helps monitor unacked payloads accumulating in the queue)
rabbitmq_queue_nack_healthy_below (type: integer, allowed: any number, no default) - Maximum number of payloads in RabbitMQ queue associated to node, with status nack to consider node healthy (this overrides the global plugins.rabbitmq.queue_nack_healthy_below)
rabbitmq_queue_nack_dead_above (type: integer, allowed: any number, no default) - Threshold on the number of payloads in RabbitMQ queue associated to node, with status nack above which node should be considered dead (stalled queue, this overrides the global plugins.rabbitmq.queue_nack_dead_above)

Run Vigil

Vigil can be run as such:

./vigil -c /path/to/config.cfg

Usage recommendations

Consider the following recommendations when using Vigil:

Vigil should be hosted on a safe, separate server. This server should run on a different physical machine and network than your monitored infrastructure servers.
Make sure to whitelist the Vigil server public IP (both IPv4 and IPv6) on your monitored HTTP services; this applies if you use a bot protection service that challenges bot IPs, eg. Distil Networks or Cloudflare. Vigil will see the HTTP service as down if a bot challenge is raised.

What do status variants look like?

Vigil has 3 status variants, either healthy (no issue ongoing), sick (services under high load) or dead (outage):

Healthy status variant

Status Healthy

Sick status variant

Status Sick

Dead status variant

Status Dead

What do announcements look like?

Announcements can be published to let your users know about any planned maintenance, as well as your progress on resolving a downtime:

Announcement

What do alerts look like?

When a monitored backend or app goes down in your infrastructure, Vigil can let you know by Slack, Twilio SMS, Email and XMPP:

Vigil alert in Slack

You can also get realtime down and up alerts on, e.g., your iPhone and Apple Watch:

Vigil down alert on iPhone (Slack) Vigil up alert on Apple Watch (Slack) Vigil alerts on iPhone (Twilio SMS)

What do Webhook payloads look like?

If you are using the Webhook notifier in Vigil, you will receive a JSON-formatted payload with alert details on every status change, plus reminders if notify.reminder_interval is configured.

Here is an example of a Webhook payload:

{
  "type": "changed",
  "status": "dead",
  "time": "08:58:28 UTC+0200",

  "replicas": [
    "web:core:tcp://edge-3.pool.net.crisp.chat:80"
  ],

  "page": {
    "title": "Crisp Status",
    "url": "https://status.crisp.chat/"
  }
}

Webhook notifications can be tested with e.g. Webhook.site before you integrate them with your custom endpoint.

You can use those Webhook payloads to create custom notifiers to anywhere. For instance, if you are using Microsoft Teams but not Slack, you may write a tiny PHP script that receives Webhooks from Vigil and forwards a notification to Microsoft Teams. This can be handy; while Vigil only implements convenience notifiers for some selected channels, the Webhook notifier allows you to extend beyond that.
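A custom notifier mostly comes down to reshaping the payload above into a message for the target channel. The following is a minimal sketch in Python (rather than PHP), not an official integration. The function name format_vigil_alert is hypothetical, and only the fields visible in the example payload are read. Forwarding the resulting text to Microsoft Teams or any other service is left out, since it depends on that service's own API.

```python
import json


def format_vigil_alert(raw_body: str) -> str:
    """Turn a Vigil Webhook payload (JSON text) into a one-line message."""
    payload = json.loads(raw_body)

    # Only fields shown in the example payload are used; missing ones
    # fall back to neutral placeholders instead of raising.
    kind = payload.get("type", "unknown")
    status = payload.get("status", "unknown")
    time = payload.get("time", "unknown time")
    replicas = ", ".join(payload.get("replicas", [])) or "none listed"
    page = payload.get("page", {})

    message = (
        f"[{page.get('title', 'Status')}] {kind}: status is now {status} "
        f"at {time} (replicas: {replicas}) {page.get('url', '')}"
    )
    return message.strip()


if __name__ == "__main__":
    example = """
    {
      "type": "changed",
      "status": "dead",
      "time": "08:58:28 UTC+0200",
      "replicas": ["web:core:tcp://edge-3.pool.net.crisp.chat:80"],
      "page": {"title": "Crisp Status", "url": "https://status.crisp.chat/"}
    }
    """
    print(format_vigil_alert(example))
```

Keeping the formatting step separate from the HTTP handling makes it easy to check against sample payloads before wiring it to a real endpoint.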

How can I create script probes?

Vigil lets you create custom probes written as shell scripts, passed in the Vigil configuration as a list of scripts to be executed for a given node.

Those scripts can be used by advanced Vigil users when their monitoring use case requires scripting, i.e. when push and poll probes are not enough.

The replica health should be returned by the shell script as its return code, where:

rc=0: healthy
rc=1: sick
rc=2 and higher: dead

As scripts are usually multi-line, script contents can be passed as a literal string, enclosed between '''.

As an example, the following script configuration always returns as sick:

scripts = [
  '''
  # Do some work...
  exit 1
  '''
]

Note that scripts are executed in a system shell run by a Vigil-owned sub-process. Make sure that Vigil runs as a UNIX user with limited privileges. Running Vigil as root would let any configured script perform root-level actions on the machine, which is not recommended.
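Before adding a script to the configuration, it can help to run it locally and see which status its return code would map to. The sketch below is a standalone helper, not part of Vigil. It assumes the script runs under sh, and the name classify_script and the timeout handling are hypothetical choices made for this illustration.

```python
import subprocess


def classify_script(script: str, timeout: float = 10.0) -> str:
    """Run a script body in `sh` and map its return code to a status name."""
    try:
        result = subprocess.run(["sh", "-c", script], timeout=timeout)
    except subprocess.TimeoutExpired:
        # Assumption for this sketch: a script that hangs is reported as dead.
        return "dead"

    if result.returncode == 0:
        return "healthy"
    if result.returncode == 1:
        return "sick"
    # rc=2 and higher map to dead; a script killed by a signal (negative
    # return code in Python) is also treated as dead here.
    return "dead"


if __name__ == "__main__":
    script = """
    # Do some work...
    exit 1
    """
    print(classify_script(script))
```

Run the helper as the same limited-privilege user that will run Vigil, so that permission problems show up before the probe goes live.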

How can I integrate Vigil Reporter in my code?

Vigil Reporter is used to actively submit health information to Vigil from your apps. Apps are best monitored via application probes, which are able to report detailed system information such as CPU and RAM load. This lets Vigil show if an application host system is under high load.

Vigil Reporter Libraries

NodeJS: node-vigil-reporter
TypeScript: ts-vigil-reporter
Python: py-vigil-reporter
Golang: go-vigil-reporter
Rust: rs-vigil-reporter
Dart: dart-vigil-reporter
C#: cs-vigil-reporter

👉 Cannot find the library for your programming language? Build your own and be referenced here! (contact me)

Vigil Reporter HTTP API

In case you need to manually report node metrics to the Vigil endpoint, use the following HTTP configuration (adjust it to yours).

👉 Read the Vigil Reporter HTTP API protocol specifications.
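As a rough illustration of a manual report, the sketch below builds (but does not send) an HTTP request in Python. Every protocol detail in it is an assumption to verify against the protocol specifications: the /reporter/<probe_id>/<node_id>/ path, Basic authentication with the reporter token as password, and the replica, interval and load fields. The function name build_report_request and the example identifiers are hypothetical.

```python
import base64
import json
import urllib.request


def build_report_request(base_url, reporter_token, probe_id, node_id,
                         replica_id, interval, cpu, ram):
    """Build a POST request carrying one load report (assumed body shape)."""
    # Assumed endpoint layout; check the protocol specifications.
    url = f"{base_url.rstrip('/')}/reporter/{probe_id}/{node_id}/"

    # Assumed fields: replica identifier, report interval in seconds,
    # and CPU/RAM load expressed as ratios.
    body = json.dumps({
        "replica": replica_id,
        "interval": interval,
        "load": {"cpu": cpu, "ram": ram},
    }).encode("utf-8")

    # Assumed auth scheme: Basic, empty username, token as password.
    credentials = base64.b64encode(
        f":{reporter_token}".encode("utf-8")
    ).decode("ascii")

    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )


if __name__ == "__main__":
    request = build_report_request(
        "https://status.example.com", "REPLACE_WITH_TOKEN",
        "web", "core", "192.168.1.10", 30, 0.30, 0.80,
    )
    print(request.get_method(), request.full_url)
    # Sending would be: urllib.request.urlopen(request, timeout=10)
```

For production use, the reporter libraries listed above are likely the better choice, since they handle scheduling and load measurement for you.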

How can I administrate Vigil through Vigil Manager?

Vigil Manager can be used to perform administrative actions on a running Vigil instance. For instance, it can be used to publish public announcements.

Vigil Manager HTTP API

Vigil Manager can be interacted with over its dedicated HTTP API.

👉 Read the Vigil Manager HTTP API protocol specifications.

How can I monitor services on a different LAN using Vigil Local?

Vigil Local is an (optional) slave daemon that you can use to report internal service health to your Vigil-powered status page master server. It is designed to be used behind a firewall, and to monitor hosts bound to a local loop or LAN network, that are not available to your main Vigil status page.

Vigil Local monitors local poll and script replicas, and reports their status to Vigil on a periodic basis.

You can read more on Vigil Local on its repository, and follow the setup instructions.

:children_crossing: Troubleshoot Issues

ICMP replicas always report as `dead`

On Linux systems, non-privileged users cannot create raw sockets, which Vigil's ICMP probing system requires. This means that, by default, all ICMP probe attempts will fail silently, as if the host being probed was always down.

This can easily be fixed by allowing Vigil to create raw sockets:

setcap 'cap_net_raw+ep' /bin/vigil

Note that HTTP and TCP probes do not require those raw socket capabilities.

:fire: Report A Vulnerability

If you find a vulnerability in Vigil, you are more than welcome to report it directly to @valeriansaliou by sending an encrypted email to valerian@valeriansaliou.name. Do not report vulnerabilities in public GitHub issues, as they may be exploited by malicious people to target production servers running an unpatched Vigil server.

:warning: You must encrypt your email using @valeriansaliou GPG public key: :key:valeriansaliou.gpg.pub.asc.
