How to SRE-ify your React app with Prometheus

Reading Time: Approximately 3 minutes.

I am not a JavaScript developer. However, I was given a task at work recently that forced me to enter the abyss and get good at keeping my Promises. I was asked to create a webinar on helping developers become better SREs through observability and instrumentation. The objective was to take a broken web app and add enough monitoring and logging to it to make troubleshooting its brokenness easier. (I’ll update this post with a link when we broadcast it on April 22nd! … »

A few tips on successful remote value stream maps.

Reading Time: Approximately 6 minutes.

There is no doubt that the worldwide COVID-19 crisis has been a wet blanket for digital transformation across the enterprise. However, I don’t know about you, but I’m super fortunate that this is happening in 2020’s technological landscape instead of, say, 2010’s. With video conferencing solutions that work with even the slowest and least reliable internet connections and real-time collaboration tools that scale to hundreds of people per session, many of today’s key activities that required an office only five years ago can be done from the comfort of our own homes or apartments. … »

Want to test Ansible playbooks that require systemd in Docker? Try this.

Reading Time: Approximately 2 minutes.

Kubernetes and other cloud-native strategies might be putting configuration management out to pasture, but I found myself writing a playbook recently while learning how to create infrastructure as code for Azure. I needed to start my Flask web server and Postgres database with systemd, which isn’t a pattern that’s easily supported by Docker. I got this working with Docker Compose, however, and this post will show you how! Create a Docker Compose file with the following services: version: '2. … »

SRE and BDD: The Ultimate Power Pair

Reading Time: Approximately 7 minutes.

The responsibilities of a Reliability Engineer are well understood: maintain a high degree of service availability so that customers can have a consistently enjoyable and predictable experience. How these goals are accomplished — establishing SLOs with customers, enforcing them through monitoring SLIs and exercising the platform against failure through Game Days — is also well understood. Much of the literature that exists on SRE goes into great depths talking about these concepts, and for good reason: failing to establish a contract with the customer on availability expectations for the service that they are paying for is a great way for its engineers to spend their entire careers fire-fighting. … »

SRE Communities vs SRE Centers of Excellence

Reading Time: Approximately 7 minutes.

I read Google’s Site Reliability Engineering Workbook on a flight to New York the other day. I read their original book when it came out two years ago and was curious to see how much of it mirrored my own (brief) experience as a Google SRE. Given that it’s been a while since I did pure SRE work, I wanted to keep my skills current, and the Workbook seemed like a more accurate reference to follow. … »

Move Fast And Retain Corporate Governance with Pull Requests

Reading Time: Approximately 7 minutes.

DevOps and change control mix like oil and water. Product and development teams want to experiment with and release ideas as quickly as their customers request them, and do so with tight, but unstructured, collaboration across organizations. On the other hand, corporate governance wants auditability, transparent risk mitigation and justification at every step of the way. Consequently, the two sides often don’t get along well, hindering development speed in the process. … »

Getting Into DevOps.

Reading Time: Approximately 13 minutes.

I’ve observed a sharp uptick in developers and systems administrators interested in “getting into DevOps” within the last year or so. This pattern makes sense, too: in an age where a single developer can spin up a globally-distributed infrastructure for an application with a few dollars and a few API calls, the gap between development and systems administration is narrower than ever. While I’ve seen plenty of blog posts and articles about cool DevOps tools and thoughts to think about, I’ve seen less content offering pointers and suggestions for people looking to get into this work.

My goal with this article is to, hopefully, draw what that path looks like. My thoughts are based upon several interviews, chats, late-night discussions on reddit.com/r/devops and random conversations, likely over beer and delicious food. I’m also interested in hearing feedback from those who have made the jump; if you have, please email me. I’d love to hear your thoughts and stories.

»