dastergon/awesome-sre

A curated list of Site Reliability and Production Engineering resources.

Top Related Projects

howtheysre - A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)

awesome-incident-response - A curated list of tools for incident response

awesome-scalability - The Patterns of Scalable, Reliable, and Performant Large-Scale Systems

awesome-sysadmin - A curated list of amazingly awesome open source sysadmin resources inspired by Awesome PHP.

awesome-chaos-engineering - A curated list of Chaos Engineering resources.

Quick Overview

The "awesome-sre" repository is a curated list of Site Reliability Engineering (SRE) resources. It serves as a comprehensive collection of articles, books, videos, tools, and other materials related to SRE practices, principles, and methodologies. This repository aims to be a valuable reference for both beginners and experienced professionals in the field of SRE.

Pros

  • Extensive collection of SRE resources covering various topics and skill levels
  • Regularly updated with new content and contributions from the community
  • Well-organized structure, making it easy to find specific information
  • Includes both theoretical resources and practical tools for SRE implementation

Cons

  • May be overwhelming for beginners due to the large amount of information
  • Some links may become outdated over time if not regularly maintained
  • Lacks detailed explanations or summaries for each resource
  • May lag behind emerging trends and cutting-edge practices

Getting Started

To use the awesome-sre repository:

  1. Visit the GitHub repository: https://github.com/dastergon/awesome-sre
  2. Browse through the table of contents to find topics of interest
  3. Click on the links to access the resources
  4. Consider starring the repository to bookmark it, or watching it to be notified of new additions
  5. If you want to contribute, follow the contribution guidelines in the README

Note: This is not a code library, so there are no code examples or specific installation instructions. The repository serves as a reference and collection of resources for SRE professionals and enthusiasts.

Competitor Comparisons

howtheysre - A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)

Pros of howtheysre

  • Focuses specifically on real-world SRE practices from various companies
  • Provides detailed case studies and implementation examples
  • Regularly updated with new company insights

Cons of howtheysre

  • More limited in scope compared to awesome-sre's comprehensive resource list
  • Less variety in content types (primarily focuses on company-specific practices)
  • May not cover as many general SRE concepts and tools

Code comparison

Not applicable for these repositories, as they primarily consist of curated lists and documentation rather than code samples.

Summary

howtheysre offers in-depth insights into how specific companies implement SRE practices, making it valuable for those seeking real-world examples. It's regularly updated but has a narrower focus compared to awesome-sre.

awesome-sre provides a more comprehensive list of SRE resources, tools, and concepts, making it a better starting point for those looking to explore the field broadly. However, it may not offer as much detail on company-specific implementations.

Both repositories serve different purposes within the SRE ecosystem. howtheysre is ideal for understanding practical applications, while awesome-sre is better for discovering a wide range of SRE-related resources and tools.

awesome-incident-response - A curated list of tools for incident response

Pros of awesome-incident-response

  • More focused on specific incident response tools and resources
  • Includes sections on memory analysis and disk forensics
  • Provides a curated list of books and training courses for incident response

Cons of awesome-incident-response

  • Less comprehensive coverage of general SRE practices and principles
  • Fewer resources for monitoring, observability, and performance optimization
  • Limited information on capacity planning and scalability

Code comparison

Both repositories are primarily curated lists of resources and contain no significant code samples. However, here's a comparison of their README structures:

awesome-incident-response:

# Awesome Incident Response

A curated list of tools and resources for security incident response, aimed to help security analysts and [DFIR](http://www.acronymfinder.com/Digital-Forensics%2c-Incident-Response-(DFIR).html) teams.

- [Awesome Incident Response](#awesome-incident-response)
    - [Incident Response](#incident-response)
    - [Disk Image Creation Tools](#disk-image-creation-tools)
    - [Memory Analysis Tools](#memory-analysis-tools)

awesome-sre:

# Awesome Site Reliability Engineering

A curated list of Site Reliability and Production Engineering resources.

#### What is Site Reliability Engineering?
> "Fundamentally, it's what happens when you ask a software engineer to design an operations function." - Ben Treynor Sloss, VP Google Engineering, founder of Google SRE

## Contents
- [Culture](#culture)
- [Education](#education)
- [Books](#books)
- [Hiring](#hiring)

Both repositories serve as valuable resources for their respective domains, with awesome-incident-response being more specialized in security incident handling, while awesome-sre covers a broader range of SRE topics.

awesome-scalability - The Patterns of Scalable, Reliable, and Performant Large-Scale Systems

Pros of awesome-scalability

  • Broader focus on scalability concepts beyond just SRE practices
  • Includes more visual content like diagrams and infographics
  • Offers a comprehensive list of system design interview resources

Cons of awesome-scalability

  • Less frequently updated compared to awesome-sre
  • Fewer community contributions and engagement
  • More general in scope, potentially less depth in specific SRE topics

Code comparison

While both repositories are primarily curated lists without significant code content, here's a comparison of their README structures:

awesome-scalability:

## Table of Contents
- [Scalability](#scalability)
- [System Design](#system-design)
- [Distributed Systems](#distributed-systems)

awesome-sre:

## Contents
- [Culture](#culture)
- [Education](#education)
- [Books](#books)
- [Hiring](#hiring)

Both repositories use similar Markdown structures, but awesome-sre has a more detailed and SRE-specific table of contents, while awesome-scalability covers broader topics related to system scalability and design.

awesome-sysadmin - A curated list of amazingly awesome open source sysadmin resources inspired by Awesome PHP.

Pros of awesome-sysadmin

  • Broader scope covering general system administration topics
  • More extensive list of tools and resources
  • Includes categories like Backups, CMDB, and IT Asset Management

Cons of awesome-sysadmin

  • Less focused on modern DevOps and SRE practices
  • May include outdated or less relevant tools for current industry trends
  • Lacks specific sections on observability and incident management

Code comparison

While both repositories are primarily curated lists without significant code, they differ in their organization and structure:

awesome-sysadmin:

## Backups

*Backup software.*

- [Amanda](https://www.amanda.org/) - Client-server model backup tool.
- [Bacula](https://www.bacula.org) - Another Client-server model backup tool.

awesome-sre:

## Reliability

- [Availability Table](https://github.com/dastergon/availability-table) - Table of availability percentages and corresponding downtime.
- [Reliable Product](https://github.com/lyst/MakingLyst/tree/master/reliable-product) - Article series on building reliable products.

The awesome-sre repository focuses more on concepts and practices, while awesome-sysadmin emphasizes specific tools and software categories.

awesome-chaos-engineering - A curated list of Chaos Engineering resources.

Pros of awesome-chaos-engineering

  • More focused and specialized content specifically for chaos engineering practices
  • Includes a dedicated section on chaos engineering tools and platforms
  • Provides resources for chaos engineering in specific environments (e.g., Kubernetes, cloud)

Cons of awesome-chaos-engineering

  • Smaller overall collection of resources compared to awesome-sre
  • Less coverage of general SRE practices and principles
  • May not be as relevant for those seeking broader SRE knowledge

Code comparison

While both repositories are curated lists and don't contain actual code, we can compare their structure:

awesome-chaos-engineering:

## Table of Contents
- [Culture](#culture)
- [Books](#books)
- [Education](#education)
- [Notable Tools](#notable-tools)

awesome-sre:

## Contents
- [Culture](#culture)
- [Education](#education)
- [Books](#books)
- [Hiring](#hiring)
- [Reliability](#reliability)

Both repositories use a similar Markdown structure, but awesome-sre's table of contents covers a broader range of topics, reflecting its wider scope in the SRE field.

README

Awesome Site Reliability Engineering

A curated list of awesome Site Reliability and Production Engineering resources.

What is Site Reliability Engineering?

"Fundamentally, it's what happens when you ask a software engineer to design an operations function." - Ben Treynor Sloss, VP Google Engineering, founder of Google SRE

Contributing

Please take a look at the contribution guidelines first. Contributions are always welcome!

Contents

Culture

Education

Books

Hiring

Reliability

Monitoring & Observability & Alerting

On-Call

Post-Mortem

Capacity Planning

Service Level Agreement

Performance

Programming

Misc Articles

Real-time Messaging

Blogs

  • Brendan Gregg's Blog - Highly Technical Blog Posts About Systems Internals, Performance and SRE.
  • Everything Sysadmin - Blog Posts About SysAdmin/DevOps/SRE by Tom Limoncelli.
  • High Scalability - Technical Blog Posts About Systems Architecture.
  • rachelbythebay - Technical Blog Posts.
  • Susan J. Fowler - Various blog posts about SRE, Software Engineering and Microservices.
  • SysAdvent - One article for each day of December, ending on the 25th.
  • Stephen Thorne's Blog - Blog Posts About SRE.
  • Increment - A digital magazine about how teams build and operate software systems at scale.
  • GopherSRE - Blog Posts about Go and SRE.
  • Cindy Sridharan - Blog posts about distributed systems and their management.
  • Blameless Blog - Blog posts about SRE culture and practices.
  • Resilience Roundup - Weekly analysis of Resilience Engineering and Human Factors research designed for software systems.
  • Squadcast Blog - Blog posts about SRE best practices, reliability, on-call and incident management.
  • FireHydrant Blog - Posts about complex systems, incident response, and SRE best practices.
  • Rootly Blog - Incident management best practices and guides.
  • incident.io Blog - Guides, advice and resources on incident management and response.
  • Logit.io Blog - Resources on log management, SRE and DevOps.

Newsletters

  • DevOpsLinks - A weekly newsletter about SRE, SysAdmin and DevOps news, tools, tutorials and opinions.
  • KubeWeekly - The weekly newsletter for all things Kubernetes. KubeWeekly is curated by Bob Killen, Chris Short, Craig Box, Kim McMahon and Michael Hausenblas.
  • SRE Weekly - Weekly Site Reliability Newsletter.
  • O’Reilly Systems Engineering and Operations Newsletter - Weekly systems engineering and operations news and insights from industry insiders.
  • ChaosEngineering.news - Chaos Engineering newsletter. All things Chaos Engineering, directly to your inbox!
  • Monitoring Weekly - What's new in monitoring? Curated monitoring articles to your inbox each week.
  • Observability news - Updates around observability (o11y) with a special focus on open source.

Conferences & Meetups

Twitter

SRE Tools

Podcasts