howtheysre

A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)

9,376

820


Top Related Projects

awesome-sre

12,464

A curated list of Site Reliability and Production Engineering resources.

sre-interview-prep-guide

8,056

Site Reliability Engineer Interview Preparation Guide

devops-exercises

76,930

Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, NoSQL, Azure, GCP, DNS, Elastic, Network, Virtualization. DevOps Interview Questions

awesome-scalability

61,633

The Patterns of Scalable, Reliable, and Performant Large-Scale Systems

awesome-sysadmin

23,991

A curated list of amazingly awesome open source sysadmin resources inspired by Awesome PHP.

test-your-sysadmin-skills

11,103

A collection of Linux Sysadmin Test Questions and Answers. Test your knowledge and skills in different fields with these Q/A.

Quick Overview

"How They SRE" is a curated collection of publicly available resources on how technology and tech-driven companies around the world practice Site Reliability Engineering (SRE). It serves as a comprehensive knowledge base for SRE practices, tools, and methodologies used by various organizations.

Pros

Extensive collection of SRE resources from numerous companies
Well-organized and categorized information for easy navigation
Regularly updated with new content and resources
Open-source project allowing community contributions

Cons

May contain outdated information if not frequently maintained
Lacks in-depth analysis or comparison of different SRE practices
Primarily focuses on large tech companies, potentially overlooking smaller organizations' approaches
Relies on publicly available information, which may not always reflect the most current practices

Competitor Comparisons

awesome-sre

12,464

A curated list of Site Reliability and Production Engineering resources.

Pros of awesome-sre

More comprehensive and diverse collection of SRE resources
Better organized with clear categories and subcategories
Regularly updated with new content and contributions

Cons of awesome-sre

Less focused on specific company practices and implementations
May be overwhelming for beginners due to the sheer volume of information
Lacks detailed explanations or summaries for each resource

Code Comparison

Not applicable for these repositories, as they are primarily curated lists of resources rather than code repositories.

Summary

awesome-sre is a more extensive and well-organized collection of SRE resources, covering a wide range of topics and categories. It's regularly updated and maintained, making it a valuable reference for SRE professionals at all levels.

howtheysre, on the other hand, focuses specifically on how different companies implement SRE practices. It provides a more targeted approach, offering insights into real-world applications of SRE principles.

While awesome-sre offers a broader scope and more resources, howtheysre provides a deeper dive into specific company implementations. The choice between the two depends on whether you're looking for a comprehensive resource list or practical examples from industry leaders.

sre-interview-prep-guide

8,056

Site Reliability Engineer Interview Preparation Guide

Pros of sre-interview-prep-guide

More focused on interview preparation with specific questions and answers
Includes a wider range of topics, including networking and databases
Provides links to additional resources for further learning

Cons of sre-interview-prep-guide

Less comprehensive in terms of real-world SRE practices
Not regularly updated, potentially containing outdated information
Lacks detailed explanations of concepts, focusing more on quick answers

Code Comparison

While both repositories primarily contain documentation and resources rather than code, sre-interview-prep-guide includes some sample code snippets for interview questions. For example:

# sre-interview-prep-guide
def is_palindrome(s):
    return s == s[::-1]

howtheysre doesn't contain code snippets, focusing instead on curating information about SRE practices across different companies.

Both repositories serve different purposes:

howtheysre is a comprehensive collection of SRE practices from various companies, providing insights into real-world implementations.
sre-interview-prep-guide is tailored for interview preparation, offering a quick reference for common SRE interview questions and topics.

Ultimately, the choice between these repositories depends on whether you're looking for industry practices (howtheysre) or interview preparation (sre-interview-prep-guide).

devops-exercises

76,930

Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, NoSQL, Azure, GCP, DNS, Elastic, Network, Virtualization. DevOps Interview Questions

Pros of devops-exercises

Broader scope covering various DevOps topics, not limited to SRE
More interactive with exercises and questions for hands-on learning
Regularly updated with new content and contributions

Cons of devops-exercises

Less focused on specific company practices and real-world implementations
May be overwhelming for beginners due to the wide range of topics covered
Lacks detailed explanations of SRE-specific concepts and methodologies

Code Comparison

devops-exercises:

def is_palindrome(s):
    return s == s[::-1]

print(is_palindrome("radar"))  # True
print(is_palindrome("hello"))  # False

howtheysre:

- company: Google
  resources:
    - title: "Google SRE Book"
      url: "https://sre.google/sre-book/table-of-contents/"
    - title: "Google SRE Workbook"
      url: "https://sre.google/workbook/table-of-contents/"

The code comparison highlights the different focus areas of the repositories. devops-exercises provides practical coding examples and exercises, while howtheysre primarily contains structured information about SRE practices in various companies.

awesome-scalability

61,633

The Patterns of Scalable, Reliable, and Performant Large-Scale Systems

Pros of awesome-scalability

Broader focus on scalability concepts and techniques beyond just SRE practices
More comprehensive coverage of distributed systems, databases, and cloud technologies
Includes practical system design examples and case studies

Cons of awesome-scalability

Less specific to SRE roles and responsibilities
May not provide as much depth on incident management and reliability practices
Content organization is less structured compared to howtheysre

Code Comparison

While both repositories primarily consist of curated lists and don't contain significant code samples, here's a comparison of their README structures:

awesome-scalability:

## Scalability
- Principles
- Scalability Articles
- ...

## Distributed Systems
- Distributed Systems Theory
- ...

howtheysre:

## Companies

### Company Name
- Blog
- Videos
- Podcasts
- ...

The awesome-scalability repository focuses on categorizing topics, while howtheysre organizes content by company, making it easier to explore specific organizations' SRE practices.
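As a rough illustration of why a by-company layout is easy to explore, the sketch below indexes a README shaped like the simplified howtheysre outline above. The heading levels (`###` per company, `- ` bullets for resource types) are assumptions taken from that outline rather than from the real file, and `index_by_company` and the sample company names are hypothetical, not part of either repository.

```python
from collections import defaultdict


def index_by_company(markdown_text):
    """Map each '### Company' heading to the bullet items listed under it.

    Assumes the simplified layout from the outline: one level-3 heading per
    company, followed by '- ' bullets naming resource types.
    """
    index = defaultdict(list)
    current = None
    for raw_line in markdown_text.splitlines():
        line = raw_line.strip()
        if line.startswith("### "):
            current = line[4:].strip()
            index[current]  # register the company even if it has no bullets
        elif line.startswith("## "):
            current = None  # a new top-level section ends the company block
        elif line.startswith("- ") and current is not None:
            index[current].append(line[2:].strip())
    return dict(index)


# Hypothetical sample input; the company names are placeholders.
sample = """\
## Companies

### Example Co
- Blog
- Videos

### Another Co
- Podcasts
"""

if __name__ == "__main__":
    for company, kinds in index_by_company(sample).items():
        print(company, "->", ", ".join(kinds))
```

With this shape, looking up one organization's resource types is a single dictionary access, whereas a topic-first layout would require scanning every category for mentions of that organization.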

awesome-sysadmin

23,991

A curated list of amazingly awesome open source sysadmin resources inspired by Awesome PHP.

Pros of awesome-sysadmin

Broader scope covering various aspects of system administration
More comprehensive list of tools and resources
Regularly updated with community contributions

Cons of awesome-sysadmin

Less focused on specific SRE practices and methodologies
May be overwhelming for beginners due to the extensive list of resources
Lacks detailed explanations or case studies

Code Comparison

Not applicable, as both repositories are curated lists without significant code content.

Summary

awesome-sysadmin is a comprehensive resource for system administrators, covering a wide range of topics and tools. It offers a vast collection of resources but may lack the specific focus on SRE practices found in howtheysre.

howtheysre, on the other hand, provides a more targeted approach to SRE practices, offering insights into how different companies implement SRE. It may be more beneficial for those specifically interested in SRE methodologies and real-world applications.

Both repositories serve different purposes and can be valuable depending on the user's needs. awesome-sysadmin is better suited for general system administration knowledge, while howtheysre is more appropriate for those looking to understand and implement SRE practices in their organizations.

test-your-sysadmin-skills

11,103

A collection of Linux Sysadmin Test Questions and Answers. Test your knowledge and skills in different fields with these Q/A.

Pros of test-your-sysadmin-skills

Provides a comprehensive set of questions and exercises for sysadmins to test their skills
Covers a wide range of topics including Linux, networking, security, and DevOps
Includes practical scenarios and real-world examples

Cons of test-your-sysadmin-skills

Focuses primarily on technical skills rather than broader SRE practices and methodologies
May not be as up-to-date with the latest industry trends and technologies
Lacks information on specific company practices and experiences

Code Comparison

test-your-sysadmin-skills:

#!/usr/bin/env bash

# Function to check if a command exists
command_exists() {
    command -v "$1" >/dev/null 2>&1
}

howtheysre:

# How They SRE

A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE).

## Table of Contents

- [Adopting SRE](#adopting-sre)

The code comparison shows that test-your-sysadmin-skills includes practical bash scripts, while howtheysre is primarily a curated collection of resources in Markdown format. This reflects the different focus of each repository, with test-your-sysadmin-skills emphasizing hands-on skills and howtheysre providing a comprehensive overview of SRE practices across various organizations.

README

How they SRE

Introduction

How They SRE is a curated knowledge repository of Site Reliability Engineering (SRE) best practices, tools, techniques, and culture adopted by leading technology or tech-savvy organizations.

Numerous organizations frequently share their insights and expertise, encompassing best practices, tools, and techniques that shape their engineering culture. They do this through various public platforms such as engineering blogs, conferences, and meetups. This repository compiles and presents content gathered from these sources.

Topics

Site Reliability Engineering
Hiring and Building SRE teams
SRE Culture
DevOps
Monitoring & Observability
Alerting
Incident Response & Post-Mortem
On-Call
Testing in Production
Chaos Engineering
Automation
Performance
Platform Engineering

Organizations

Achievers

Blog Posts

Airbnb

Blog Posts

Algolia

Blog Posts

Alibaba Cloud

Blog Posts

Asana

Blog Posts

ASOS

Blog Posts

Atlassian

Blog Posts

BackMarket

Blog Posts

How Back Market SREs prepared for Black Friday

Baidu

Videos

Basecamp

Blog Posts

Books

Shape Up

Bloomberg

Videos

Booking.com

Blog Posts

Videos

Capital One

Blog Posts

Major incidents & analysis reports

Videos

Coinbase

Blog Posts

Open Sourcing Coinbase's Secure Deployment Pipeline

DAZN

Blog Posts

Site Reliability at DAZN

DBS

Blog Posts

Videos

SREcon Conversations Asia/Pacific with Koon Seng Lim, DBS

DeepSource

Blog Posts

Dream11

Blog Posts

Dropbox

Blog Posts

Videos

Service Discovery Challenges at Scale

eBay

Blog Posts

Video

Madaari: Ordering for the Monkeys

Epic Games

Video

AWS re:Invent 2018: Epic Games Uses AWS to Deliver Fortnite to 200 Million Players

Etsy

Blog Posts

Videos

Expedia

Blog Posts

Fastly

Videos

G-Research

Blog Posts

Getaround

Blog Posts

GitHub

Blog Posts

Major incidents & analysis reports

Videos

One on One SRE

GitLab

Blog Posts

GoCardless

Blog Posts

Major incidents & analysis reports

GoDaddy

Blog Posts

Gojek

Blog Posts

Goldman Sachs

Blog Posts

Videos

Granular CPU Capacity Management at Scale with eBPF

Google

Blog Posts

Videos

Grab

Blog Posts

Grammarly

Blog Posts

Gusto

Blog Posts

Halodoc

Blog Posts

Site Reliability Engineering for Native mobile apps

Heroku

Blog Posts

IBM

Blog Posts

Indeed

Blog Posts

Videos

Are We Getting Better Yet? Progress Toward Safer Operations

Indeed

Blog Posts

SRE Playbook - Practical Guide

Khan Academy

Blog Posts

Videos

Tools

On-Call

Loggi

Blog Posts

Loveholidays

Blog Posts

Macquarie

Blog Posts

Mattermost

Blog Posts

Meituan (美团)

Blog Posts

The development and practice of SRE in the cloud (äºç«¯çSREåå±ä¸å®è·µ)

Mercari

Blog Posts

Videos

Microsoft

Videos

MIRO

Blog Posts

Monzo

Blog Posts

Videos

Eventually Consistent Service Discovery

Tools

Response

Netflix

Blog Posts

Major incidents & analysis reports

Post-mortem of October 22, 2012 AWS degradation

Videos

Podcasts

Ryan Kitchens on Learning from Incidents at Netflix, the Role of SRE, and Sociotechnical Systems

Tools

Dispatch

New Relic

Blog Posts

Nubank

Blog Posts

OpenAI

Blog Posts

PayPal

Blog Posts

Videos

Picnic

Blog Posts

Videos

Postman

Blog Posts

Learn how your Kubernetes clusters respond to failure using Gremlin and Grafana

Prezi

Blog Posts

Red Hat

Blog Posts

Videos

Noisy Neighbors, through Networking

Riot Games

Blog Posts

Videos

Riot Games: Evolution of Observability at the Gaming Company

Salesforce

Blog Posts

Schibsted Media

Blog Posts

Reliability engineering for some of top 10 sites in Scandinavia

Scribd

Blog Posts

Shopify

Blog Posts

Videos

Sky Betting and Gaming

Blog Posts

Slack

Blog Posts

Videos

Slalom Build

Blog Posts

Soundcloud

Blog Posts

Spotify

Blog Posts

Videos

Tracing, Fast and Slow: Digging into and Improving Your Web Service's Performance

Squarespace

Blog Posts

Under the Hood: Ensuring Site Reliability

Videos

Stack Overflow

Blog Posts

Videos

Low Context DevOps: Improving SRE Team Culture through Defaults, Documentation, and Discipline

Strava

Blog Posts

Stripe

Blog Posts

Videos

Target

Blog Posts

Teads

Blog Posts

Scaling your on-duty team

Tinder

Blog Posts

Tokopedia

Blog Posts

Trivago

Blog Posts

How To Get Fooled By Metrics

Twilio

Blog Posts

Twilio SRE Gameday Template

Twitter

Blog Posts

Uber

Blog Posts

Videos

Udemy

Blog Posts

Videos

upGrad

Blog Posts

VGW

Blog Posts

The SRE Incident Response game

Videos

Level Up Your Incident Response With Gameplay

Wikimedia Foundation

Videos

Wix

Blog Posts

Yelp

Blog Posts

The process: Implementing Yelp's failover strategy

Videos

Yelp - What I Wish I Knew before Going On-Call

Zalando

Blog Posts

Videos

Zerodha

Blog Posts

Zomato

Blog Posts

Huddle Diaries - DevOps and Data Platform

SRECon Mix Playlist

Videos

Resources

Books

Events

Other Resources

Awesome Lists

SRE Resources from various organizations

Incidents & postmortems

Newsletters

Credits

Inspired by Howtheytest from Abhijeet Vaikar
The list of organizations is drawn from my other repo, awesome-engineering
Banner image Cartoon vector created by vectorjuice - www.freepik.com

Other How They... repos

Contributors

Contribute

Contributions welcome! Read the contribution guidelines first.

Stargazers Over Time

License

To the extent possible under law, Unmesh Gundecha has waived all copyright and related or neighboring rights to this work.

If you decide to use this anywhere, please credit @upgundecha on X. Also, if you like my work, check out my other projects on GitHub.
