How to SRE-ify your React app with Prometheus

Reading Time: Approximately 3 minutes.

I am not a JavaScript developer. However, I was given a task at work recently that forced me to enter the abyss and get good at keeping my Promises. I was asked to create a webinar on helping developers become better SREs through observability and instrumentation. The objective was to take a broken web app and add enough monitoring and logging to it to make troubleshooting its brokenness easier. (I’ll update this post with a link when we broadcast it on April 22nd! … »

A few tips on successful remote value stream maps.

Reading Time: Approximately 6 minutes.

There is no doubt that the worldwide COVID-19 crisis has been a wet blanket for digital transformation across the enterprise. However, I don’t know about you, but I’m super fortunate that this is happening in 2020’s technological landscape instead of, say, 2010’s. With video conferencing solutions that work with even the slowest and least reliable internet connections and real-time collaboration tools that scale to hundreds of people per session, many of today’s key activities that required an office only five years ago can be done from the comfort of our own homes or apartments. … »

Story Points Aren't Units of Time

Reading Time: Approximately 4 minutes.

They just aren’t. WHY Search for “story points agile” on Google. Try it. You don’t even have to type it into Google; click the link! You’ll get, at the time of writing, approximately 12 million results. Accounting for the 8 million results that are bots promoting something that requires your wallet, that leaves four million web pages, many of which will go on to describe story points to the letter and how they aren’t about estimation. … »

SRE and BDD: The Ultimate Power Pair

Reading Time: Approximately 7 minutes.

The responsibilities of a Reliability Engineer are well understood: maintain a high degree of service availability so that customers can have a consistently enjoyable and predictable experience. How these goals are accomplished — establishing SLOs with customers, enforcing them through monitoring SLIs and exercising the platform against failure through Game Days — is also well understood. Much of the existing literature on SRE covers these concepts in great depth, and for good reason: failing to establish a contract with the customer on availability expectations for the service that they are paying for is a great way for its engineers to spend their entire careers fire-fighting. … »

SRE Communities vs SRE Centers of Excellence

Reading Time: Approximately 7 minutes.

I read Google’s Site Reliability Engineering Workbook on a flight to New York the other day. I read their original book when it came out two years ago and was curious to see how much of it mirrored my own (brief) experience as a Google SRE. Given that it’s been a while since I did pure SRE work, I wanted to keep my skills current, and the Workbook seemed like a more accurate reference to follow. … »

Is your Java app ready for Docker? Take this super quick test!

Reading Time: Approximately 1 minute.

Here’s a really quick test to see if your enterprise Java app is ready for Docker.

NOTE: I am not a Java developer; more like a casual observer. Get your pitchforks ready!

If I can’t do this:

    $> docker run --rm --volume "$PWD:/app" --volume "$HOME/.m2:/root/.m2" \
         --tty maven:3.6.0-jdk$WHICHEVER_VERSION-alpine mvn build
    $> docker run --rm --volume "$PWD:/app" --tty openjdk:$WHICHEVER_VERSION-jdk-alpine \
         java -jar /path/to/war.war

Then either:

- Your application is not 12-factor and is probably not ready for Docker,
- Your source code has hidden dependencies that live outside of your pom. … »

Move Fast And Retain Corporate Governance with Pull Requests

Reading Time: Approximately 7 minutes.

DevOps and change control mix like oil and water. Product and development teams want to experiment with and release ideas as quickly as their customers request them, and do so with tight, but unstructured, collaboration across organizations. On the other hand, corporate governance wants auditability, transparent risk mitigation and justification at every step of the way. Consequently, the two sides often don’t get along, hindering development speed in the process. … »

Good Tools Are Important. Ignore At Your Own Peril

Reading Time: Approximately 7 minutes.

I’ve been consulting for some of the world’s largest companies for the last three years and have observed three themes that worry me:

- Agile is a really controversial word, despite the manifesto being quite clear on the matter,
- Somewhere within every company, there are many, many engineers that have been waiting weeks for test environments, and
- Engineers have the heaviest, plasticky-iest, and most unpleasant machines in the entire organization.

This (hopefully) brief post is about that third point. … »

How To Make Enterprise Container Strategies That Last, Part I

Reading Time: Approximately 11 minutes.

Intro I was in high school when I got introduced to this weird app called VMware Workstation. I thought the idea of using your Windows machine to run other machines was really compelling - a perfect fit for my younger and geekier self. You couldn’t pay me enough back then to believe that almost all[^1] of the world’s most important applications would eventually run on virtual machines…on someone else’s computers! I really liked the idea, but Workstation was a bit of a bear to use at the time and the virtual machines it created were quite slow. … »