Mastering Heroku — Issue #5

Oct 09, 2018

A reader asked me for some help this week:

Hi Adam, saw your post and decided to reach out. I used AutoScale in the past but am now on performance dyno. We keep running into Rack::Timeout::RequestTimeoutException errors. Wondering if you may have any suggestion.

I feel this pain. Request timeouts are the worst. If you're not a Rubyist or if you're just unfamiliar with the error above, Rack::Timeout is a library for timing out requests on Rails.

Why would you want to timeout a request? Because if you don't, Heroku will:

Occasionally a web request may hang or take an excessive amount of time to process by your application. When this happens the router will terminate the request if it takes longer than 30 seconds to complete.

We’ve all seen these infamous H12 errors appear in our logs and in our Heroku metrics panel.

It's unfortunate when a user sees an error page, but what's worse is that your app has no idea when Heroku times out a request. Your app will continue processing to completion, whether it takes an additional five seconds or an additional five minutes.

While the router has returned a response to the client, your application will not know that the request it is processing has reached a time-out, and your application will continue to work on the request.

Libraries like Rack::Timeout allow you to halt processing of a long-running request before Heroku times out. This gives you more control over the error the user sees and prevents a hung request from bringing down your app server.

It's a Band-Aid, though, and this particular Band-Aid often introduces more problems than it solves.

Raising mid-flight in stateful applications is inherently unsafe. A request can be aborted at any moment in the code flow, and the application can be left in an inconsistent state.

This is straight from the Rack::Timeout docs, which do an excellent job of warning you about the risks and tradeoffs. I’ve personally seen all kinds of strange and frustrating behavior with Rack::Timeout. As far as I’m concerned, it’s just not worth it.

So what’s a safe alternative that prevents hung requests from bringing down your app? As usual, Nate Berkopec sums it up well:

Nate Berkopec @nateberkopec

Most apps could eliminate usage of rack-timeout if they just set aggressive timeouts on network and db. e.g. setting statement timeouts: https://t.co/Ji9delcuB8 I understand some apps can't for legitimate reasons, but rack-timeout is a big, big hammer.

Nate references the Ultimate Guide to Ruby Timeouts, which if you’re a Rubyist, you should bookmark right now. By setting library-specific timeouts on database queries and network requests, you can gracefully handle unpreventable slowdowns—roll back transactions, show a meaningful error message, whatever you need to do.

This isn’t a perfect solution, of course. You could still have a single request with 1,000 database queries, none of which individually time out, but collectively are way over Heroku’s 30-second limit.

In these cases, I still don’t think it’s worth reaching for a “big, big hammer”. Instead, set up alerting for Heroku’s H12 errors. You can use Heroku’s threshold alerting for this, or set up alerting in your log management tool (I use both). Heroku add-ons like Logentries will alert you on H12 errors out of the box.

With these alerts in place, you can investigate your timeouts to fix the root cause instead of relying on a Band-Aid. The H12 error will tell you the exact URL that timed out, so use that along with an APM tool like Scout to determine what went wrong. Chances are, you either have an N+1 query or you’ve omitted a timeout on some I/O.

To recap:

Set library-specific timeouts for all I/O (database, network, etc.)
Avoid solutions that arbitrarily halt application processing in the middle of a request. It’ll lead to unpredictable and hard-to-debug behavior.
Monitor your H12 errors.
Use an APM tool to fix those slow endpoints.

And of course, use autoscaling to ensure a few slow requests don’t slow down your entire app. 😁

Happy scaling!
— Adam

Mastering Heroku

Mastering Heroku — Issue #5

Discussion about this post