Mastering Heroku – Issue #2

I want to talk about a problem we ran into last week at work.

We use pipelines for deployment: our master branch is automatically deployed to staging via CI, and then we promote staging to production. This works amazingly well and is one of the features I love most about Heroku. But last week, one of our promotions from staging to production failed. Specifically, the release phase command failed.

Don’t worry if you’re not a Rubyist. Here’s what’s happening:

  • The release phase is trying to migrate the database via `rails db:migrate` (as specified in our Procfile).

  • It’s encountering an error while booting the app environment, before it even touches the database.

  • The error makes no sense. It’s failing to parse JSON that’s internal to Uglifier, a Ruby wrapper for UglifyJS.
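For readers who haven’t used release phase: it is declared with a `release` process type in the Procfile, and Heroku runs that command before a new release (for example, a build or a pipeline promotion) goes live. If the command fails, the release is not deployed, which is exactly what blocked our promotion. A minimal sketch of such a Procfile follows; the `web` line is an illustrative assumption, not taken from our app:

```
web: bundle exec puma -C config/puma.rb
release: rails db:migrate
```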

This kind of thing isn’t supposed to happen with pipelines. The beauty of pipelines is that the compiled slug is reused, so the same application code and dependencies are guaranteed between environments.

But it was happening, and we needed to fix it.

Debugging 101: Reproduce the problem

Before we could fix the problem, we needed a reliable way to reproduce it. Fortunately, repeated promotions did reproduce the exact same error. Unfortunately, we couldn’t step into that process to introspect what was going on.

We were hopeful that we could manually reproduce the problem by running `rails db:migrate` in a Heroku Bash session, but that ran without issue on both staging and production. Of course, this wasn’t a true reproduction anyway: our Bash session on production was running the old code, and we had no way to get the new code onto production and reproduce the error manually.

So we were stumped again.

What are the differences?

We knew that staging worked with the exact same code, so the next question was: what’s different between the two environments? Pipelines ensure parity in the code, which leaves two potential differences: the database and the config vars.

The two environments use different databases, but the app hadn’t even attempted to connect to one yet. We were pretty sure this error was encountered before any external dependency came into play, so we focused on config var differences. One stood out to me in particular: the config var that enables Jemalloc.

Jemalloc is an alternative memory allocator for Ruby, but that’s not really important. What is important is that I’d enabled it in production, but not staging. An oversight on my part.

The stack trace showed a JSON parse error, though—nothing remotely to do with memory. This couldn’t be a relevant difference at all. Right?

You know how this story ends. We disabled Jemalloc in production, promoted staging again, and the release went through without a problem.

Takeaways

This isn’t exactly a happy ending to the story. We still have absolutely NO IDEA why we encountered this error. It makes no sense at all. We’ve left Jemalloc disabled for now and are moving on.

There are still some good lessons to take away from the experience:

  • Parity between environments is important.

  • Know your tooling. The Heroku CLI was especially useful here for running Bash and exploring config vars.
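Comparing config vars across the two apps was the step that found the culprit, and it is easy to make that comparison mechanical. Here is a small Ruby sketch, assuming each app’s config vars were saved as simple `KEY=VALUE` lines (for example with `heroku config --shell --app <app-name>`); it does not handle quoted or multi-line values, and the sample values, including the `JEMALLOC_ENABLED` name, are illustrative placeholders rather than our actual config:

```ruby
# Sketch: diff two config-var dumps and report every key whose value differs.
# Assumes simple KEY=VALUE lines; quoting and multi-line values are not handled.

# Parse "KEY=VALUE" lines into a Hash, skipping blank lines and # comments.
def parse_config(text)
  text.each_line.each_with_object({}) do |line, vars|
    line = line.strip
    next if line.empty? || line.start_with?("#")

    # Split on the first "=" only, so values may themselves contain "=".
    key, value = line.split("=", 2)
    vars[key] = value.to_s
  end
end

# Return { key => [value_in_a, value_in_b] } for keys that differ
# or that exist in only one of the two dumps (the missing side is nil).
def config_diff(a, b)
  (a.keys | b.keys).sort.each_with_object({}) do |key, diff|
    diff[key] = [a[key], b[key]] unless a[key] == b[key]
  end
end

# Illustrative placeholder values, not real config.
staging = parse_config(<<~VARS)
  RAILS_ENV=production
  DATABASE_URL=postgres://staging-db
VARS

production = parse_config(<<~VARS)
  RAILS_ENV=production
  DATABASE_URL=postgres://production-db
  JEMALLOC_ENABLED=true
VARS

config_diff(staging, production).each do |key, (staging_value, production_value)|
  puts "#{key}: staging=#{staging_value.inspect} production=#{production_value.inspect}"
end
```

With these sample dumps the script reports two keys: the expected database difference, and a Jemalloc setting that exists only on production.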

Elsewhere…

Speaking of environment parity, this mini-debate on Twitter caught my attention:

Michał Matyas (@nerdblogpl), August 21, 2018:

  “I know creating custom environment in Rails (like staging) is a bad practice, but why exactly? Asking for the next time I need to explain it to someone with arguing style of "I disagree strongly not because I'm right but because you don't have good arguments"”

So there are two approaches for staging/review environments on Heroku:

  1. Create a separate custom environment in your application. In Rails, this means a new file in config/environments and changing RAILS_ENV for each app instance.

  2. Treat all Heroku app instances as “production”, and use config vars to differentiate behavior and credentials by environment. This is Heroku’s recommendation.

I’m with Heroku on this one because inevitably someone (not you or me, of course) will add an explicit “production” environment check to the code, such as `if Rails.env.production?`.

Will that code execute in your staging environment? Who knows! Don’t do this. Use config vars instead. 👍
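Here is a minimal sketch of the config-var alternative in plain Ruby. `ANALYTICS_ENABLED` and `ANALYTICS_KEY` are hypothetical config var names, and the settings object takes a hash so it can be exercised without Rails; in an app you would pass `ENV`. Every Heroku app keeps `RAILS_ENV=production`, and the behavior that should differ is switched on or off explicitly per app:

```ruby
# Sketch: derive behavior from config vars instead of the environment name.
# ANALYTICS_ENABLED and ANALYTICS_KEY are hypothetical config var names.
class AppSettings
  # Defaults to the real process environment; tests can pass a plain Hash.
  def initialize(env = ENV)
    @env = env
  end

  # Explicit opt-in: only the string "true" enables the feature,
  # so an app without the var falls back to "off".
  def analytics_enabled?
    @env.fetch("ANALYTICS_ENABLED", "false") == "true"
  end

  # Credentials come from each app's own config vars. fetch raises KeyError
  # when the var is missing, so misconfiguration fails loudly.
  def analytics_key
    @env.fetch("ANALYTICS_KEY")
  end
end

# Both apps run as "production"; only their config vars differ.
staging = AppSettings.new({ "RAILS_ENV" => "production", "ANALYTICS_ENABLED" => "false" })
production = AppSettings.new({ "RAILS_ENV" => "production", "ANALYTICS_ENABLED" => "true",
                               "ANALYTICS_KEY" => "prod-key" })

puts staging.analytics_enabled?    # prints "false"
puts production.analytics_enabled? # prints "true"
```

The design choice is that nobody has to guess whether a branch runs on staging: each app’s config vars state it directly.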

I also don’t like custom environments because it should be painful when your environments diverge. Custom environments make it far too easy for staging to become way different from production.

Anyway, that’s enough ranting from me. 😀 What do you think about all this?

Have a great week!
—Adam