SRE and BDD: The Ultimate Power Pair

Reading Time: Approximately 7 minutes.

The responsibilities of a Reliability Engineer are well understood: maintain a high degree of service availability so that customers can have a consistently enjoyable and predictable experience. How these goals are accomplished — establishing SLOs with customers, enforcing them through monitoring SLIs and exercising the platform against failure through Game Days — is also well understood. Much of the literature that exists on SRE goes into great depths talking about these concepts, and for good reason: failing to establish a contract with the customer on availability expectations for the service that they are paying for is a great way for its engineers to spend their entire careers fire-fighting. … »

SRE Communities vs SRE Centers of Excellence

Reading Time: Approximately 7 minutes.

I read Google’s Site Reliability Engineering Workbook on a flight to New York the other day. I read their original book when it came out two years ago and was curious to see how much of it mirrored my own (brief) experience as a Google SRE. Given that it’s been a while since I did pure SRE work, I wanted to keep my skills caught up, and the Workbook seemed like a more accurate reference to follow. … »