cortex
A horizontally scalable, highly available, multi-tenant, long term Prometheus.
Top Related Projects
Like Prometheus, but for logs.
Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
The Prometheus monitoring system and time series database.
VictoriaMetrics: fast, cost-effective monitoring solution and time series database
Scalable datastore for metrics, events, and real-time analytics
An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
Quick Overview
Cortex is an open-source, horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus. It provides a complete implementation of the Prometheus API and allows users to query metrics from multiple Prometheus servers in a single place, with long-term storage capabilities.
Pros
- Highly scalable and designed for cloud-native environments
- Multi-tenant architecture for efficient resource sharing
- Compatible with existing Prometheus ecosystems and tools
- Supports long-term storage of metrics data
Cons
- Complex setup and configuration compared to vanilla Prometheus
- Requires additional infrastructure and resources to run effectively
- Steeper learning curve for teams new to distributed systems
- May introduce additional latency in some query scenarios
Getting Started
To get started with Cortex, follow these steps:
- Install Cortex:
go get github.com/cortexproject/cortex/cmd/cortex
- Create a basic configuration file, config.yaml:
auth_enabled: false

server:
  http_listen_port: 9009

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s

storage:
  engine: blocks

blocks_storage:
  backend: filesystem
  filesystem:
    dir: /tmp/cortex/blocks
- Run Cortex:
cortex -config.file=config.yaml
- Configure Prometheus to remote write to Cortex:
remote_write:
  - url: http://localhost:9009/api/v1/push
- Query metrics using the Cortex API or compatible tools like Grafana.
For more detailed setup and advanced configurations, refer to the official Cortex documentation.
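As a rough illustration of the last step, the sketch below builds a Prometheus-style instant query against the Cortex instance configured above. It rests on assumptions that are not stated in the steps: that the query API is served under the /prometheus prefix, and that tenants are identified by an X-Scope-OrgID header. The helper names are hypothetical. With auth_enabled: false the tenant header should not be needed, so it is optional here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumptions: port 9009 comes from config.yaml above; the /prometheus prefix
# and the X-Scope-OrgID tenant header are assumed defaults, not taken from the text.
CORTEX_URL = "http://localhost:9009"
QUERY_PATH = "/prometheus/api/v1/query"


def build_instant_query(promql, tenant=None, base_url=CORTEX_URL):
    """Return (url, headers) for a Prometheus-style instant query."""
    url = f"{base_url}{QUERY_PATH}?{urlencode({'query': promql})}"
    headers = {"X-Scope-OrgID": tenant} if tenant else {}
    return url, headers


def run_instant_query(promql, tenant=None, timeout=10):
    """Send the query and return the decoded JSON body (needs a running server)."""
    url, headers = build_instant_query(promql, tenant)
    with urlopen(Request(url, headers=headers), timeout=timeout) as resp:
        return json.load(resp)
```

Calling run_instant_query("up") against a running instance should return the usual Prometheus JSON envelope; build_instant_query can be used on its own to inspect the request without sending it.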
Competitor Comparisons
Like Prometheus, but for logs.
Pros of Loki
- Simpler architecture and easier to set up
- More efficient storage for log data
- Better integration with Grafana dashboards
Cons of Loki
- Limited query language compared to PromQL
- Less mature ecosystem and community support
- Fewer built-in features for advanced use cases
Code Comparison
Loki query example:
rate({app="myapp"} |= "error" | json [5m])
Cortex query example (using PromQL):
rate(http_requests_total{status="500"}[5m])
Both Loki and Cortex are designed for observability, but they focus on different aspects. Loki is primarily for log aggregation and analysis, while Cortex is for metrics storage and querying. Loki's design prioritizes efficiency in storing and querying log data, making it more suitable for organizations with large volumes of logs. Cortex, on the other hand, offers a more comprehensive metrics solution with full PromQL support and advanced features like horizontal scalability and long-term storage.
The choice between Loki and Cortex depends on specific use cases and requirements. Organizations primarily focused on log management may prefer Loki, while those needing a robust metrics solution might opt for Cortex. Some teams use both in conjunction to cover both logs and metrics effectively.
Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
Pros of Thanos
- Simpler architecture, easier to set up and maintain
- Better support for long-term storage and historical data querying
- More flexible deployment options, including sidecar and receive components
Cons of Thanos
- Less efficient for real-time querying of recent data
- Higher storage requirements due to object storage approach
- Limited multi-tenancy support compared to Cortex
Code Comparison
Thanos query example:
thanos_query:
  image: thanosio/thanos:v0.24.0
  command:
    - query
    - --store=thanos-store-gateway:10901
    - --query.replica-label=replica
Cortex query example:
cortex_query:
  image: cortexproject/cortex:v1.10.0
  command:
    - -config.file=/etc/cortex/config.yaml
    - -target=query-frontend
Both projects aim to provide scalable, long-term storage solutions for Prometheus metrics. Thanos focuses on simplicity and historical data querying, while Cortex offers more advanced features like multi-tenancy and real-time querying. The choice between them depends on specific use cases and infrastructure requirements.
The Prometheus monitoring system and time series database.
Pros of Prometheus
- Simpler architecture and easier to set up for small to medium-scale deployments
- Native support for a wide range of service discovery mechanisms
- More mature and battle-tested in production environments
Cons of Prometheus
- Limited scalability for large-scale, multi-tenant environments
- Lacks built-in long-term storage solutions
- No native support for high availability and horizontal scaling
Code Comparison
Prometheus configuration (prometheus.yml):
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'example'
    static_configs:
      - targets: ['localhost:8080']
Cortex configuration (cortex.yaml):
ingester:
  lifecycler:
    ring:
      kvstore:
        store: consul
storage:
  engine: blocks
blocks_storage:
  backend: s3
The Prometheus configuration focuses on scrape targets and intervals, while Cortex configuration emphasizes distributed components and storage backends. Cortex's configuration is more complex due to its distributed nature and support for multi-tenancy and horizontal scaling.
Prometheus is ideal for single-organization monitoring with moderate scale, while Cortex excels in large-scale, multi-tenant environments requiring high availability and horizontal scalability. Cortex builds upon Prometheus' strengths while addressing its limitations in scalability and long-term storage.
VictoriaMetrics: fast, cost-effective monitoring solution and time series database
Pros of VictoriaMetrics
- Simpler architecture and easier to deploy
- Lower resource consumption and better performance
- Built-in multi-tenancy support
Cons of VictoriaMetrics
- Less mature ecosystem and community support
- Fewer advanced features compared to Cortex
- Limited horizontal scalability for write-intensive workloads
Code Comparison
VictoriaMetrics configuration (command-line flags):
-storageDataPath=/storage
-httpListenAddr=:8428
-retentionPeriod=1
Cortex configuration:
storage:
  engine: blocks
blocks_storage:
  backend: s3
ingester:
  lifecycler:
    ring:
      kvstore:
        store: consul
VictoriaMetrics offers a simpler configuration, while Cortex provides more granular control over its components. VictoriaMetrics is designed for ease of use and performance, making it suitable for smaller to medium-sized deployments. Cortex, on the other hand, offers a more feature-rich and scalable solution, better suited for large-scale, multi-tenant environments with complex requirements.
Both projects aim to provide long-term storage and querying capabilities for time-series data, but they take different approaches. VictoriaMetrics focuses on simplicity and performance, while Cortex emphasizes scalability and advanced features. The choice between the two depends on specific use cases, scale requirements, and desired level of complexity in deployment and management.
Scalable datastore for metrics, events, and real-time analytics
Pros of InfluxDB
- Purpose-built for time-series data, offering optimized storage and querying
- Includes a powerful query language (InfluxQL) specifically designed for time-series analysis
- Provides built-in data retention policies and continuous queries for automated data management
Cons of InfluxDB
- Less flexible for multi-tenant environments compared to Cortex
- May require more manual configuration for high availability and horizontal scaling
- Limited support for long-term storage of high-cardinality data
Code Comparison
InfluxDB query example:
SELECT mean("value") FROM "cpu_usage"
WHERE "host" = 'server01' AND time >= now() - 1h
GROUP BY time(5m)
Cortex (using PromQL) query example:
avg by (host) (avg_over_time(cpu_usage{host="server01"}[5m]))
Both examples show querying CPU usage data, but InfluxDB uses its custom SQL-like syntax, while Cortex uses PromQL, which is more compact and designed for time-series operations.
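To make the time-bucketed averaging in the InfluxQL example concrete, here is a minimal Python sketch of what GROUP BY time(5m) with mean() computes: samples are grouped into fixed five-minute windows and averaged per window. The function name and the sample data are hypothetical, and this illustrates the arithmetic only, not how either database evaluates the query.

```python
from collections import defaultdict


def bucketed_mean(samples, width_seconds=300):
    """Average (unix_timestamp, value) samples per fixed-width time bucket."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each timestamp to the start of its window.
        buckets[ts - ts % width_seconds].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Hypothetical CPU usage samples as (seconds, percent).
samples = [(0, 10.0), (60, 20.0), (299, 30.0), (300, 40.0), (540, 60.0)]
print(bucketed_mean(samples))  # {0: 20.0, 300: 50.0}
```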
An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
Pros of TimescaleDB
- Built on PostgreSQL, offering familiar SQL interface and ecosystem compatibility
- Optimized for time-series data with automatic partitioning and indexing
- Supports both relational and time-series data in a single database
Cons of TimescaleDB
- Requires more storage space due to its relational database structure
- May have higher query latency for certain time-series operations compared to a purpose-built TSDB
Code Comparison
TimescaleDB (SQL-based):
CREATE TABLE metrics (
  time TIMESTAMPTZ NOT NULL,
  device_id TEXT,
  temperature DOUBLE PRECISION,
  humidity DOUBLE PRECISION
);

SELECT time_bucket('1 hour', time) AS hour,
       avg(temperature) AS avg_temp
FROM metrics
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY hour;
Cortex (PromQL-based):
rate(http_requests_total[5m])

sum by (job) (
  rate(http_requests_total[5m])
)
While TimescaleDB uses standard SQL with time-series extensions, Cortex relies on PromQL for querying metrics data. TimescaleDB offers more flexibility for complex queries and joins, while Cortex provides a simpler, metrics-focused query language.
README
Cortex
Cortex is a horizontally scalable, highly available, multi-tenant, long term storage for Prometheus.
Features
- Horizontally scalable: Cortex can run across multiple machines in a cluster, exceeding the throughput and storage of a single machine.
- Highly available: When run in a cluster, Cortex can replicate data between machines.
- Multi-tenant: Cortex can isolate data and queries from multiple different independent Prometheus sources in a single cluster.
- Long term storage: Cortex supports S3, GCS, Swift and Microsoft Azure for long term storage of metric data.
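The scalability and replication features above can be pictured with a simplified hash ring: each series is hashed to a position on a ring, and the next few distinct instances clockwise from that position receive a copy. The sketch below is an illustration under stated assumptions, not Cortex's actual ring implementation; the class, the token scheme, and the instance names are all hypothetical.

```python
import bisect
import hashlib


def _token(key):
    """Map a string to a 32-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class Ring:
    """Toy hash ring: every instance owns several tokens on the ring."""

    def __init__(self, instances, tokens_per_instance=64):
        self._points = sorted(
            (_token(f"{name}-{i}"), name)
            for name in instances
            for i in range(tokens_per_instance)
        )
        self._tokens = [t for t, _ in self._points]
        self._size = len(set(instances))

    def replicas_for(self, tenant, series, replication_factor=3):
        """Walk clockwise from the series' token, collecting distinct instances."""
        wanted = min(replication_factor, self._size)
        if wanted <= 0:
            return []
        # Hashing tenant and series together keeps tenants' placements independent.
        start = bisect.bisect_right(self._tokens, _token(f"{tenant}/{series}"))
        chosen = []
        for offset in range(len(self._points)):
            name = self._points[(start + offset) % len(self._points)][1]
            if name not in chosen:
                chosen.append(name)
                if len(chosen) == wanted:
                    break
        return chosen


ring = Ring(["ingester-1", "ingester-2", "ingester-3", "ingester-4"])
print(ring.replicas_for("team-a", 'up{job="api"}'))
```

Adding an instance to the list moves only the series whose tokens fall next to the new instance's tokens, which is the property that lets a ring-based cluster grow without reshuffling all data.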
Community and Support
If you have any questions about Cortex, you can:
- Ask a question on the Cortex Slack channel. To invite yourself to the CNCF Slack, visit http://slack.cncf.io/.
- File an issue.
- Email cortex-users@lists.cncf.io.
Your feedback is always welcome.
For security issues see https://github.com/cortexproject/cortex/security/policy
Engage with Our Community
We invite you to participate in the bi-weekly Cortex Community Calls, an exciting opportunity to connect with fellow developers and enthusiasts. These meetings are held every alternate Thursday, alternating between 1200 UTC and 1700 UTC, providing a platform for open discussion, collaboration, and knowledge sharing.
Our meeting notes are meticulously documented and can be accessed here, offering a comprehensive overview of the topics discussed and decisions made.
To ensure you never miss a meeting, we've made it easy for you to keep track:
- View the Cortex Community Call schedule in your browser here.
- Alternatively, download the .ics file here for use with any calendar application or service that supports the iCal format.
Join us in shaping the future of Cortex, and let's build something amazing together!
Further reading
Talks
- Mar 2024 KubeCon talk "Cortex Intro: Multi-Tenant Scalable Prometheus" (video, slides)
- Apr 2023 KubeCon talk "How to Run a Rock Solid Multi-Tenant Prometheus" (video, slides)
- Oct 2022 KubeCon talk "Current State and the Future of Cortex" (video, slides)
- Oct 2021 KubeCon talk "Cortex: Intro and Production Tips" (video, slides)
- Sep 2020 KubeCon talk "Scaling Prometheus: How We Got Some Thanos Into Cortex" (video, slides)
- Jul 2020 PromCon talk "Sharing is Caring: Leveraging Open Source to Improve Cortex & Thanos" (video, slides)
- Nov 2019 KubeCon talks "Cortex 101: Horizontally Scalable Long Term Storage for Prometheus" (video, slides), "Configuring Cortex for Max Performance" (video, slides, write up) and "Blazin' Fast PromQL" (slides, video, write up)
- Nov 2019 PromCon talk "Two Households, Both Alike in Dignity: Cortex and Thanos" (video, slides, write up)
- May 2019 KubeCon talks; "Cortex: Intro" (video, slides, blog post) and "Cortex: Deep Dive" (video, slides)
- Nov 2018 CloudNative London meetup talk; "Cortex: Horizontally Scalable, Highly Available Prometheus" (slides)
- Aug 2018 PromCon panel; "Prometheus Long-Term Storage Approaches" (video)
- Dec 2018 KubeCon talk; "Cortex: Infinitely Scalable Prometheus" (video, slides)
- Aug 2017 PromCon talk; "Cortex: Prometheus as a Service, One Year On" (videos, slides, write up part 1, part 2, part 3)
- Jun 2017 Prometheus London meetup talk; "Cortex: open-source, horizontally-scalable, distributed Prometheus" (video)
- Dec 2016 KubeCon talk; "Weave Cortex: Multi-tenant, horizontally scalable Prometheus as a Service" (video, slides)
- Aug 2016 PromCon talk; "Project Frankenstein: Multitenant, Scale-Out Prometheus" (video, slides)
Blog Posts
- Dec 2020 blog post "How AWS and Grafana Labs are scaling Cortex for the cloud"
- Oct 2020 blog post "How to switch Cortex from chunks to blocks storage (and why you won't look back)"
- Oct 2020 blog post "Now GA: Cortex blocks storage for running Prometheus at scale with reduced operational complexity"
- Sep 2020 blog post "A Tale of Tail Latencies"
- Aug 2020 blog post "Scaling Prometheus: How we're pushing Cortex blocks storage to its limit and beyond"
- Jul 2020 blog post "How blocks storage in Cortex reduces operational complexity for running Prometheus at massive scale"
- Mar 2020 blog post "Cortex: Zone Aware Replication"
- Mar 2020 blog post "How we're using gossip to improve Cortex and Loki availability"
- Jan 2020 blog post "The Future of Cortex: Into the Next Decade"
- Feb 2019 blog post & podcast; "Prometheus Scalability with Bryan Boreham" (podcast)
- Feb 2019 blog post; "How Aspen Mesh Runs Cortex in Production"
- Dec 2018 CNCF blog post; "Cortex: a multi-tenant, horizontally scalable Prometheus-as-a-Service"
- Nov 2018 CNCF TOC Presentation; "Horizontally Scalable, Multi-tenant Prometheus" (slides)
- Sept 2018 blog post; "What is Cortex?"
- Jul 2018 design doc; "Cortex Query Optimisations"
- Jun 2016 design document; "Project Frankenstein: A Multi Tenant, Scale Out Prometheus"
Hosted Cortex
There are several commercial services where you can use Cortex on-demand:
Amazon Managed Service for Prometheus (AMP)
Amazon Managed Service for Prometheus (AMP) is a Prometheus-compatible monitoring service that makes it easy to monitor containerized applications at scale. It is a highly available, secure, managed monitoring service for your containers. Get started here. To learn more about AMP, see our documentation and the Getting Started with AMP blog.
Emeritus Maintainers
- Peter Štibraný @pstibrany
- Marco Pracucci @pracucci
- Bryan Boreham @bboreham
- Goutham Veeramachaneni @gouthamve
- Jacob Lisi @jtlisi
- Tom Wilkie @tomwilkie
- Alvin Lin @alvinlin123
History of Cortex
The Cortex project was started by Tom Wilkie and Julius Volz (Prometheus' co-founder) in June 2016.