First, huge shout out to the five of you that actually read stuff from my WordPress blog!
You might have noticed that, with the exception of a few posts for my company, I’ve been mostly silent. I haven’t given up on writing; quite the contrary, in fact! I haven’t run out of neurons to fire either (darn!). I’ve stopped writing on my own site for one reason: I pledged to not write again until I was completely out of WordPress. Now that I am, I can write again!
Why do you hate WordPress so much, Carlos?
I’ve been using WordPress for writing blogs since I decided to start writing back in 2015. (I actually started using it in 2013, but I didn’t really commit to writing on a semi-regular basis until two years later.) It’s always been a love-hate relationship, mostly erring on the side of hate, for a few reasons:
- The editor is clunky and takes forever to load over slow connections. (This was a massive deal in the dark ages when American and Delta were using cellular-based internet in their planes and breaking past dial-up speeds was a huge surprise.)
- The code generated by their editors is all sorts of non-standard.
- I enjoy cross-posting my posts onto LinkedIn Pulse. It was never a copy-paste operation because of the non-standardness mentioned above. It often took about an hour to get my WordPress and LinkedIn blog in sync, and even then small things weren’t perfect.
These were annoying, but not dealbreakers. The dealbreaker that finally pushed me over the edge was this: I had to pay WordPress $13/month to use a custom domain backed by HTTPS.
“That’s really not a lot, Carlos.” You’re right. But think about it:
- I’m paying them $13/month for what is probably a five-second change on their webserver.
- I’m paying them $13/month for the (possible) privilege of using a wildcard SSL certificate, so it’s not like I’m even telling readers that I own that blog.
- I’m paying them $13/month because I was too lazy to roll my own solution, despite being perfectly capable of doing so.
This was what I wanted my blog to be:
- Powered by Git: Every post would be version-controlled in GitHub.
- Markdown everywhere: Every post is pure, high-grade, unadulterated Markdown.
- No ads: The blog would have no ads, anywhere, ever.
- Locally testable: The blog can be tested locally and will look and behave the same on the web.
- Easy to maintain: Adding anything to the blog, including SEO for discoverability, would be easy.
- CHEAP AF: The blog would be stupid cheap to run.
- REPRODUCIBLE AF: The blog would be stupid easy to reproduce anywhere, any time.
- CI/CD for everything: The blog will be deployed via CI whenever I post new content.
Essentially, I wanted a fast, super simple blog driven by CI and Git Flow.
I knew that Git-friendly blogs like Ghost and Jekyll existed, but I didn’t really know the landscape well. So I went looking and stumbled upon Hugo within about 32 seconds.
Hugo is a Golang-powered static website generator with a focus on blogs. While you can create just about any kind of website you put your heart into with it, it is really good at creating blogs. Its basic premise is three-fold:
- Almost everything is comprised of a hierarchy of HTML files.
- Every post has “front matter” that describes it.
- Everything is configurable through a configuration file (config.toml in my case)
Websites powered by Hugo can be hosted by Hugo through a small Go web server or statically generated and copied onto the web server of your choice.
Perfect for a WordPress hater like me.
Serverless all the things!
So my natural inclination was “let’s do serverless!”
My thoughts were:
- I don’t get a ton of traffic, so why not use Lambda or Azure Functions to render the site on demand?
- I get to finally learn serverless and sound cool at talks.
But digging not too deeply into this hole proved that this didn’t make any
sense. While I did find a pretty awesome guide
on how to do this, it was way more complicated than I originally intended.
Additionally, Hugo generates static websites. Just-in-time rendering didn’t
make sense when a simple local render and S3 sync would do. While serverless
will probably be much more useful for automated chores (like purging old
index.html files since they are versioned; more later), using it for core blog
workloads didn’t make sense.
How about S3?
Using AWS S3 and S3 Static Web Hosting made much more sense. Using this would enable me to build the blog locally, test it with Hugo and sync the files up to a designated bucket. The only negatives with this approach are:
- Websites hosted out of S3 do not support custom SSL certificates. The only way to work around this is by using a CloudFront distribution and making the S3 bucket an origin. While this does satisfy the “make it fast everywhere” goal, it does complicate my infrastructure slightly, and it slightly increases cost.
- It isn’t possible to do “clean” integration tests, since the Hugo web server makes different assumptions than S3’s web server. Now that I think about it, though, I’m not sure if clean integration tests were ever possible since we can’t locally bring up an S3 web server.
Automating all of this!
Terraform, Docker and Make, of course!
Terraform all the things!
I could have used AWS CloudFormation to provision all of this, but I wanted to use something that would make it easy for me to host blogs on multiple clouds. While I’m using AWS for this now, I intend to get an Azure certification this year, and hosting this blog on Azure will be an easy way for me to study. I’ve also been using Terraform for many, many years, and absolutely love it. There is no easier way of provisioning infrastructure (though Pulumi is looking very interesting), and it supports just about every cloud provider out there.
Docker all the things, too!
I could have created provisioning scripts that install Hugo and other tools required to provision my blog, but why do that when I can use Docker instead? Containers make it really easy to deploy and run applications in consistent environments anywhere. Docker Compose makes it easy to link multiple containers with each other and run containerized tasks repeatedly. Combining the two made provisioning super easy!
Docker in Docker, though.
The biggest challenge with this was dealing with the Docker in Docker problem. While nested containers aren’t bad per se, accessing networked services, like my local Hugo server, or mounting volumes on them basically required host-mode networking and passing paths on the host downstream. This can introduce security vulnerabilities, as it allows containers (with root access) to bind processes onto real ports on the host machine.
Additionally, starting Docker Compose or Docker from within nested containers requires that the docker-compose binaries be present inside the container.
I develop on a Mac, and OS X doesn’t allow you to simply mount these binaries on
your computer since the Docker for Mac daemon treats the directories they live
in as “special” directories that can’t be volume-mounted. To work around this, I
have to install Docker and Docker Compose on some containers when they start up.
Not ideal, but it works.
Make all the things, too!
Choosing a build runner for this code was slightly more complicated. I generally default to writing Bash scripts first: I know Bash well (despite it being a poor language compared to other scripting languages like Python or Ruby), I can count on it being available on just about anything, and linting and testing it is easy with ShellCheck and BATS, respectively.
However, I like using Make since it is slightly more portable and it is
somewhat friendlier for building things that don’t produce
explicit build artifacts. I’m guilty of writing really, really
hard-to-read Makefiles though, so I wanted to be really careful about making my Make
structure easily readable and approachable this time.
Committing sensitive cloud provider API keys would be pretty bad. However, I didn’t want to set up a Vault cluster and worry about that “chicken-and-egg” problem. So I resorted to a simpler approach: environment dotfiles in AWS S3 and an example dotfile committed to Git.
While dotfiles have the “but you can store them insecurely!” problem, they are easy to reason about and easy to import into Make and other tools.
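To show what “easy to reason about” means here, this is a minimal Python sketch of reading a dotfile into a dictionary. It is an illustration only, not blog-gen’s own code: the parse_dotenv function and the SITE_TITLE key are hypothetical, and the sample values are made up.

```python
def parse_dotenv(text):
    """Parse KEY=VALUE lines into a dict, skipping blanks and # comments."""
    env = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Tolerate shell-style "export KEY=VALUE" lines.
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignore it
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


# Hypothetical example dotfile; only INFRASTRUCTURE_PROVIDER is a real name.
SAMPLE = """\
# Example values only
INFRASTRUCTURE_PROVIDER=aws
SITE_TITLE="My Blog"
"""

settings = parse_dotenv(SAMPLE)
print(settings["INFRASTRUCTURE_PROVIDER"])
```

The whole format fits in about twenty lines, which is most of the appeal: no server to run, no client library to install.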
I decided on using Travis CI for CI/CD. I’ve been using
it for many years and have generally been happy with it. I like being able to
simply drop a
.travis.yml into my repository describing my build, test and
deploy stages. It is way easier to use and way harder to abuse than
Jenkinsfiles that I’ve seen and written in the past for Jenkins. Their CLI
tool is also great, albeit buggy, and allows me to monitor and invoke builds
without having to use the website.
The final product!
After many weeks of iteration and failure, bloggen was born! Here’s how it works! It’s not the friendliest process, but it’s a start.
Create a Git repository containing a Hugo blog with the standard Hugo directory hierarchy and files.
Create a config.toml.tmpl file. This is a gomplate-templated version of Hugo’s config.toml. This allows for more flexibility, such as using environment variables to define site properties.
Clone blog-gen to your blog’s Git repository. Ensure that you add the directory to your .gitignore to avoid committing a reference to it. (Git will not commit sub-repositories by default unless they are added as submodules.)
Copy the example dotfile from the blog-gen repository to your blog’s repository. Fill in the dummy values and save as .env.
Create a Docker Compose file that looks something like this inside of your blog’s repository.
Create a .travis.yml that looks something like this inside of your blog’s repository. Make sure that you activate the project within Travis first and define these environment variables:
DOTENV_S3_BUCKET (the S3 bucket containing your “production” environment dotfile)
Run docker-compose up start-local-blog to start a local instance of your blog. Visit http://localhost:8080 to see it in your browser. (The port can be changed from within your
If you like how it looks, commit and push your changes. Travis should run the build, create the S3 bucket and CloudFront distribution and deploy your blog to the S3 bucket. Provisioning it for the first time takes about 20 minutes because CloudFront is slow.
Underneath the hood
There are a few moving parts within blog-gen. Here’s how they work:
It looks for a .env with environment variables describing the Hugo website to be provisioned, the provider the blog is being deployed to and the location to store sensitive data such as Terraform state and passwords.
If .env isn’t present locally, it tries to fetch it from cloud storage (S3 at the moment) at a location defined by an environment variable. bloggen exits if this fails.
The .env file contains an INFRASTRUCTURE_PROVIDER environment variable. Terraform code within blog-gen is broken up by cloud provider and stored within the infrastructure directory. This environment variable selects the subdirectory to use within it.
make deploy_infrastructure initializes the Terraform code found above, generates a Terraform plan and applies it. Terraform is executed in a Docker container, and the version of the image selected is defined within an environment variable. The intermediate Make steps that do all of this work are defined in Makefile includes.
make deploy_blog runs next. This uses Docker Compose via intermediate Make build steps within include/compose/hugo to build the Hugo blog and deploy the files generated by it to S3 (or to a different cloud provider, if one is configured).
Hugo generates an index.html file with a summary of blog posts.
Because it can take some time for CloudFront and local browsers to pick up
changes to this file due to the nature of caching, we have to invalidate this
file whenever we make changes to it. I could have used CloudFront Cache
Invalidations to do this, but I only get 1000 of those per month and it can
take time for the invalidation to complete.
A much simpler and more instant solution is versioning this file by
appending commit SHAs to the end of the file name. When clients try to request the
file by going to https://blog.carlosnunez.me, the distribution won’t have the
file immediately available and will fetch the file from S3 directly. This
introduces some latency for fresh new posts, but that latency goes away after
the distribution has been updated (about 20 minutes).
- Travis checks for liveness after the blog has been deployed by trying to fetch it with curl for a given amount of time (three minutes by default).
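The liveness check in the last step boils down to “keep fetching until it answers or time runs out.” Here is a rough Python sketch of that loop. The real check uses curl from Travis, so this is a swapped-in equivalent rather than blog-gen’s code; the function names and the five-second retry interval are assumptions, and only the three-minute default comes from the description above.

```python
import time
import urllib.error
import urllib.request


def fetch_ok(url, timeout=10):
    """Return True if the URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # DNS failures, refused connections, timeouts and HTTP errors
        # all count as "not live yet".
        return False


def wait_until_live(url, deadline_seconds=180, interval_seconds=5,
                    fetch=fetch_ok, sleep=time.sleep, clock=time.monotonic):
    """Poll the URL until it responds or the deadline passes."""
    give_up_at = clock() + deadline_seconds
    while True:
        if fetch(url):
            return True
        # Stop if another wait would take us past the deadline.
        if clock() + interval_seconds > give_up_at:
            return False
        sleep(interval_seconds)


# Example (performs real network requests):
#   wait_until_live("https://blog.carlosnunez.me")
```

Passing fetch, sleep and clock in as parameters keeps the loop testable without a network or a three-minute wait.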
Due to the pipeline above, this blog post is:
- Written entirely in Markdown,
- Vetted locally with a local Hugo server,
- Stored in Git,
- Deployed and tested via CI, and
- Globally and securely available via CloudFront with my own HTTPS certificate.
On top of all of that, it takes about two minutes to release new blog posts, and new content is immediately available everywhere after that.
I’m pretty happy with this dogfood!
blog-gen is in really early days. I don’t have a lot of confidence in it
working for anyone but me right now. (I do want people to break it, though!)
There are a lot of things I want to improve on!
I originally wanted to use blue-green deployment so that I could always have a
“beta” version of my blog available. This would have been a good place to test
reactions to content without it getting indexed on Google, for example. I didn’t
know how to go about it when I was developing blog-gen, so I decided to punt
on it and do rolling releases.
Rolling back is manual at the moment, which I don’t like. Ideally, I would run a job in Travis that:
- Gets the SHA for my last commit, and
- Runs make deploy_infrastructure using that commit SHA (since it handles generating the correct error.html files and updating the S3 bucket in kind).
But that’s a manual job right now, and that makes me sad.
As you’d imagine, I have a ton of
index.html files lying around. I don’t need
them all, and regenerating them is really easy (as described above). It would
be nice to purge them every so often.
Serverless wasn’t perfect for my blog, but it’s perfect for this! I would love a few Golang-written Lambda functions that take care of this!
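The interesting part of such a function is deciding which files are safe to delete. Below is a small sketch of that selection step, written in Python for brevity even though I’d want the real thing in Go. It assumes versioned copies are named index.html.<commit SHA>, which is a guess at the naming scheme for illustration; stale_index_keys and the sample listing are hypothetical, and the S3 listing and deletion calls are left out.

```python
import re
from datetime import datetime, timezone

# Assumption: versioned copies are named "index.html.<commit SHA>".
VERSIONED_INDEX = re.compile(r"^index\.html\.[0-9a-f]{7,40}$")


def stale_index_keys(objects, keep=3):
    """Return versioned index keys to purge, keeping the `keep` newest.

    `objects` is an iterable of (key, last_modified) pairs, such as a
    bucket listing would provide.
    """
    versioned = [(key, modified) for key, modified in objects
                 if VERSIONED_INDEX.match(key)]
    # Newest first, so everything past the first `keep` entries is stale.
    versioned.sort(key=lambda pair: pair[1], reverse=True)
    return [key for key, _ in versioned[keep:]]


def day(year, month, date):
    return datetime(year, month, date, tzinfo=timezone.utc)


# Made-up listing for illustration.
listing = [
    ("index.html", day(2019, 3, 1)),
    ("index.html.1a2b3c4", day(2019, 1, 5)),
    ("index.html.5d6e7f8", day(2019, 2, 10)),
    ("index.html.9a0b1c2", day(2019, 3, 1)),
    ("posts/hello/index.html", day(2019, 1, 5)),
]

print(stale_index_keys(listing, keep=2))  # ['index.html.1a2b3c4']
```

Unversioned files and per-post pages never match the pattern, so they are never candidates for deletion.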
While I don’t plan on doing this right now, I would eventually like to add restricted content. Reviewing client-sensitive content for Contino without releasing client-confidential information would be a good example of this.
Lambda@Edge is a great tool for this purpose. It allows CloudFront maintainers to run Lambda functions against the distribution instead of at the S3 bucket level. This can help when dynamically re-writing addresses or doing more complex routing than what CloudFront offers out of the box.
I would like to play with this.
I currently only support AWS. I want to support Azure and GCP, largely so that I can finally smack talk them with authority. I’ve designed my blog to be multi-cloud; I just need to implement it!
Guy gets tired of WordPress and builds his own thing with Hugo. It does the DevOps. Guy is happy.