Site Reliability Engineering
This section contains notes on site reliability engineering.
📄️ What is a Site Reliability Engineer?
Definition
📄️ Chaos Engineering
Chaos engineering is the practice of intentionally introducing controlled disruptions or failures into a system to test its resilience and reliability. The goal is to identify vulnerabilities, understand system behavior under stress, and build confidence in its ability to withstand unexpected conditions.
📄️ Distributed Tracing
Distributed tracing is a technique for tracking requests as they flow through the services of a microservices architecture or other distributed system. It provides visibility into how requests are processed, how services interact, and where bottlenecks or failures occur.
📄️ Kubernetes (k8s)
Kubernetes (often abbreviated as K8s) is an open-source platform for automating the deployment, scaling, and management of containerized applications. It provides a robust framework for running distributed systems reliably and efficiently.
📄️ SLA, SLO, and SLI Metrics
Understanding SLA, SLO, and SLI