awesome-sre

A curated list of Site Reliability and Production Engineering resources.

Top Related Projects

howtheysre

9,376

A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)

awesome-incident-response

8,236

A curated list of tools for incident response

awesome-scalability

61,633

The Patterns of Scalable, Reliable, and Performant Large-Scale Systems

awesome-sysadmin

23,991

A curated list of amazingly awesome open source sysadmin resources inspired by Awesome PHP.

awesome-chaos-engineering

6,302

A curated list of Chaos Engineering resources.

Quick Overview

The "awesome-sre" repository is a curated list of Site Reliability Engineering (SRE) resources. It serves as a comprehensive collection of articles, books, videos, tools, and other materials related to SRE practices, principles, and methodologies. This repository aims to be a valuable reference for both beginners and experienced professionals in the field of SRE.

Pros

Extensive collection of SRE resources covering various topics and skill levels
Regularly updated with new content and contributions from the community
Well-organized structure, making it easy to find specific information
Includes both theoretical resources and practical tools for SRE implementation

Cons

May be overwhelming for beginners due to the large amount of information
Some links may become outdated over time if not regularly maintained
Lacks detailed explanations or summaries for each resource
May not keep pace with emerging trends or cutting-edge practices

Getting Started

To use the awesome-sre repository:

Visit the GitHub repository: https://github.com/dastergon/awesome-sre
Browse through the table of contents to find topics of interest
Click on the links to access the resources
Watch the repository to be notified of new additions, or star it to bookmark it
If you want to contribute, follow the contribution guidelines referenced in the README

Note: This is not a code library, so there are no code examples or specific installation instructions. The repository serves as a reference and collection of resources for SRE professionals and enthusiasts.

Competitor Comparisons

howtheysre

Pros of howtheysre

Focuses specifically on real-world SRE practices from various companies
Provides detailed case studies and implementation examples
Regularly updated with new company insights

Cons of howtheysre

Narrower in scope than awesome-sre's comprehensive resource list
Less variety in content types (primarily focuses on company-specific practices)
May not cover as many general SRE concepts and tools

Code comparison

Not applicable to these repositories, as they consist primarily of curated lists and documentation rather than code samples.

Summary

howtheysre offers in-depth insights into how specific companies implement SRE practices, making it valuable for those seeking real-world examples. It's regularly updated but has a narrower focus than awesome-sre.

awesome-sre provides a more comprehensive list of SRE resources, tools, and concepts, making it a better starting point for those looking to explore the field broadly. However, it may not offer as much detail on company-specific implementations.

Both repositories serve different purposes within the SRE ecosystem. howtheysre is ideal for understanding practical applications, while awesome-sre is better for discovering a wide range of SRE-related resources and tools.

awesome-incident-response

Pros of awesome-incident-response

More focused on specific incident response tools and resources
Includes sections on memory analysis and disk forensics
Provides a curated list of books and training courses for incident response

Cons of awesome-incident-response

Less comprehensive coverage of general SRE practices and principles
Fewer resources for monitoring, observability, and performance optimization
Limited information on capacity planning and scalability

Code comparison

Both repositories are curated lists of resources rather than code, so the closest comparison is their README structure:

awesome-incident-response:

# Awesome Incident Response

A curated list of tools and resources for security incident response, aimed to help security analysts and [DFIR](http://www.acronymfinder.com/Digital-Forensics%2c-Incident-Response-(DFIR).html) teams.

- [Awesome Incident Response](#awesome-incident-response)
    - [Incident Response](#incident-response)
    - [Disk Image Creation Tools](#disk-image-creation-tools)
    - [Memory Analysis Tools](#memory-analysis-tools)

awesome-sre:

# Awesome Site Reliability Engineering

A curated list of Site Reliability and Production Engineering resources.

#### What is Site Reliability Engineering?
> "Fundamentally, it's what happens when you ask a software engineer to design an operations function." - Ben Treynor Sloss, VP Google Engineering, founder of Google SRE

## Contents
- [Culture](#culture)
- [Education](#education)
- [Books](#books)
- [Hiring](#hiring)

Both repositories are valuable resources for their respective domains: awesome-incident-response specializes in security incident handling, while awesome-sre covers a broader range of SRE topics.

awesome-scalability

Pros of awesome-scalability

Broader focus on scalability concepts beyond just SRE practices
Includes more visual content like diagrams and infographics
Offers a comprehensive list of system design interview resources

Cons of awesome-scalability

Updated less frequently than awesome-sre
Fewer community contributions and engagement
More general in scope, potentially less depth in specific SRE topics

Code comparison

Both repositories are curated lists without significant code content, so here is a comparison of their README structures:

awesome-scalability:

## Table of Contents
- [Scalability](#scalability)
- [System Design](#system-design)
- [Distributed Systems](#distributed-systems)

awesome-sre:

## Contents
- [Culture](#culture)
- [Education](#education)
- [Books](#books)
- [Hiring](#hiring)

Both repositories use similar Markdown structures, but awesome-sre has a more detailed and SRE-specific table of contents, while awesome-scalability covers broader topics related to system scalability and design.

awesome-sysadmin

Pros of awesome-sysadmin

Broader scope covering general system administration topics
More extensive list of tools and resources
Includes categories like Backups, CMDB, and IT Asset Management

Cons of awesome-sysadmin

Less focused on modern DevOps and SRE practices
May include outdated or less relevant tools for current industry trends
Lacks specific sections on observability and incident management

Code comparison

Both repositories are curated lists without significant code, but they differ in organization and structure:

awesome-sysadmin:

## Backups

*Backup software.*

- [Amanda](https://www.amanda.org/) - Client-server model backup tool.
- [Bacula](https://www.bacula.org) - Another Client-server model backup tool.

awesome-sre:

## Reliability

- [Availability Table](https://github.com/dastergon/availability-table) - Table of availability percentages and corresponding downtime.
- [Reliable Product](https://github.com/lyst/MakingLyst/tree/master/reliable-product) - Article series on building reliable products.

The awesome-sre repository focuses more on concepts and practices, while awesome-sysadmin emphasizes specific tools and software categories.
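The Availability Table entry in the awesome-sre excerpt above maps availability percentages to downtime. The arithmetic behind a table like that can be sketched as follows; this is a minimal illustration, not the linked table's own code. The helper name allowed_downtime_seconds and the 30-day window are assumptions made for this example.

```python
# Hypothetical helper: converts an availability target into the downtime it
# allows over a given window. Not taken from the Availability Table itself.

SECONDS_PER_DAY = 86_400
WINDOW_30_DAYS = 30 * SECONDS_PER_DAY  # assumption: a 30-day "month"


def allowed_downtime_seconds(availability_pct: float, period_seconds: float) -> float:
    """Return the downtime (in seconds) permitted by an availability target.

    availability_pct is a percentage such as 99.9; period_seconds is the
    length of the measurement window in seconds.
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    if period_seconds < 0:
        raise ValueError("period_seconds must be non-negative")
    # The unavailable fraction of the window is the downtime the target tolerates.
    return period_seconds * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_seconds(target, WINDOW_30_DAYS) / 60
        print(f"{target}% over 30 days -> {minutes:.2f} minutes of downtime")
```

Under these assumptions, a 99.9% target over a 30-day window allows 2,592 seconds (about 43.2 minutes) of downtime, and each additional nine shrinks that budget by a factor of ten.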

awesome-chaos-engineering

Pros of awesome-chaos-engineering

Content focused specifically on chaos engineering practices
Includes a dedicated section on chaos engineering tools and platforms
Provides resources for chaos engineering in specific environments (e.g., Kubernetes, cloud)

Cons of awesome-chaos-engineering

Smaller overall collection of resources compared to awesome-sre
Less coverage of general SRE practices and principles
May not be as relevant for those seeking broader SRE knowledge

Code comparison

Both repositories are curated lists and contain no actual code, so the comparison below covers their structure:

awesome-chaos-engineering:

## Table of Contents
- [Culture](#culture)
- [Books](#books)
- [Education](#education)
- [Notable Tools](#notable-tools)

awesome-sre:

## Contents
- [Culture](#culture)
- [Education](#education)
- [Books](#books)
- [Hiring](#hiring)
- [Reliability](#reliability)

Both repositories use a similar structure with markdown formatting, but awesome-sre has a broader range of topics covered in its table of contents, reflecting its wider scope in the SRE field.

README

Awesome Site Reliability Engineering

A curated list of awesome Site Reliability and Production Engineering resources.

What is Site Reliability Engineering?

"Fundamentally, it's what happens when you ask a software engineer to design an operations function." - Ben Treynor Sloss, VP Google Engineering, founder of Google SRE

Contributing

Please take a look at the contribution guidelines first. Contributions are always welcome!

Culture
Education
Books
Hiring
Reliability
Monitoring & Observability & Alerting
On-Call
Post-Mortem
Capacity Planning
Service Level Agreement
Performance
Programming
Misc Articles
Real-time Messaging
Blogs
Newsletters
Conferences & Meetups
Twitter
SRE Tools
SRE Podcasts

Culture

Education

Books

Hiring

Reliability

Monitoring & Observability & Alerting

On-Call

Post-Mortem

Capacity Planning

Service Level Agreement

Performance

Programming

Misc Articles

Real-time Messaging

#sre channel at Hangops Slack - Discussion of Site Reliability Engineering generally.
#incident_response channel at Hangops Slack - Discussion about Incident Response.
USENIX SREcon Slack

Blogs

Brendan Gregg's Blog - Highly Technical Blog Posts About Systems Internals, Performance and SRE.
Everything Sysadmin - Blog Posts About SysAdmin/DevOps/SRE by Tom Limoncelli.
High Scalability - Technical Blog Posts About Systems Architecture.
rachelbythebay - Technical Blog Posts.
Susan J. Fowler - Various blog posts about SRE, Software Engineering and Microservices.
SysAdvent - One article for each day of December, ending on the 25th article.
Stephen Thorne's Blog - Blog Posts About SRE
Increment - A digital magazine about how teams build and operate software systems at scale.
GopherSRE - Blog Posts about Go and SRE.
Cindy Sridharan - Blog posts about distributed systems and their management.
Blameless Blog - Blog posts about SRE culture and practices.
Resilience Roundup - Weekly analysis of Resilience Engineering and Human Factors research designed for software systems
Squadcast Blog - Blog posts about SRE best practices, reliability, on-call and incident management.
FireHydrant Blog - Posts about complex systems, incident response, and SRE best practices.
Rootly Blog - Incident management best practices and guides.
incident.io Blog - Guides, advice and resources on incident management and response.
Logit.io Blog - Resources on log management, SRE and DevOps.

Newsletters

DevOpsLinks - A weekly newsletter about SRE, SysAdmin and DevOps news, tools, tutorials and opinions.
KubeWeekly - The weekly newsletter for all things Kubernetes. KubeWeekly is curated by Bob Killen, Chris Short, Craig Box, Kim McMahon and Michael Hausenblas.
SRE Weekly - Weekly Site Reliability Newsletter.
O'Reilly Systems Engineering and Operations Newsletter - Weekly systems engineering and operations news and insights from industry insiders.
ChaosEngineering.news - Chaos Engineering newsletter. All things Chaos Engineering, directly to your inbox!
Monitoring Weekly - What's new in monitoring? Curated monitoring articles to your inbox each week.
Observability news - Updates around observability (o11y) with a special focus on open source.

Conferences & Meetups

SREcon Conferences - The Official SRE Conference.
LISA Conferences - Prominent Conference About SysAdmin/DevOps/SRE.
SRE Tech Talks - SRE Talks Hosted by Google.
South Bay Site Reliability Engineering (Sunnyvale, CA) Meetup - A Group For Individuals Who Tackle Reliability Challenges For Web-Scale Systems.
San Francisco Reliability Engineering - A Group Of People Who Are Passionate About Reliable, Performant Software Systems.
Site Reliability Engineering Munich, Germany - SRE Meetup in the greater area of Oktoberfest city.
ADDO - All Day DevOps - A 24-hour conference that is completely online and free.
Site Reliability Engineering Paris, France - SRE Meetup in the city of light.
Site Reliability Engineering India - SRE Meetup India

Twitter

Google SRE Twitter Account - Google's SRE Twitter Account.
SREBook - The Official Twitter Account of Site Reliability Engineering Book.
SREcon - SREcon's Official Twitter Account.
SREWorkbook - The Official Twitter Account of Site Reliability Workbook.
The SRE Dev - SRE-related Posts from dev.to.
Twitter SRE - The Official Twitter Account of Twitter's SRE team.
Twitter SRE Weekly - The Official Twitter Account of SRE Weekly Newsletter.
USENIX Association - The Official USENIX Twitter Account.

SRE Tools

Awesome SRE Tools - A curated list of Site Reliability and Production Engineering tools
List of Continuous Integration services
SRE cheat sheet - A cheat sheet for Site Reliability Engineering principles and numbers

Podcasts
