Fixing a critical issue: a journey into Ruby web server startup sequences, part two

So where are we jumping into this story?

This post is part two of our story on how we dove into Ruby web server startup sequences to fix a strange but critical issue that a few of our customers were having. Part one is here. Read that first!

In part one, we set the context for the issue, explored what was happening, and dove into Rack’s design (and its role in the issue). At the end of the post, we shared our fix, which was essentially to read the final configuration of the running Ruby web server directly and to use the dedicated web server hooks to delegate the actual boot to workers. This worked out great in our testing environment, but we hit an unexpected stumbling block as soon as we ran our full integration test suite on CI. That’s where we rejoin the story here in part two. Without further ado…

Wherein the plot thickens

Everything was working on our integration test app, across all possible versions of Puma, Unicorn, Passenger, Ruby, and Rails. So what was the problem then? Why did we smack into a wall? As it turns out, our integration test app’s Gemfile orders its gems differently from the Gemfiles we usually work with: when we do hallway testing across varying scenarios, we follow our own setup instructions for the agent. Those instructions give a simple shell command that appends the gem to the Gemfile, so it usually comes last…

When gem 'sqreen' comes before puma in the Gemfile and the app is started with rails server, the gem is indeed required, but nothing happens, as there’s no web server to detect yet.

Out in the wild, we really can’t be sure of the final order in the Gemfile: we may have customers with that order already, or people may add web servers later on, and anyway we really want to be as easy to use as possible. So we won’t settle for imposing the order.

Thinking it through, several of those web servers may be required in the Gemfile while only one is actually running, so we can’t rely on either the presence of a server’s constants or Gem.loaded_specs.
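To make that limitation concrete, here is a minimal sketch of the two naive detection strategies. The CANDIDATE_SERVERS table and both helper names are hypothetical, made up for illustration; only Object.const_defined? and Gem.loaded_specs are standard Ruby and RubyGems APIs.

```ruby
# Hypothetical table: gem name => top-level constant that gem defines.
CANDIDATE_SERVERS = {
  'puma'      => :Puma,
  'unicorn'   => :Unicorn,
  'passenger' => :PhusionPassenger,
  'thin'      => :Thin
}.freeze

# Servers whose top-level constant is currently defined in this process.
def servers_by_constant
  CANDIDATE_SERVERS.select { |_gem, const| Object.const_defined?(const) }.keys
end

# Servers whose gem has been activated by RubyGems or Bundler.
def servers_by_loaded_specs
  CANDIDATE_SERVERS.keys.select { |gem_name| Gem.loaded_specs.key?(gem_name) }
end

# Both helpers answer "what is loaded?", not "what is about to serve requests?".
# With two servers in the Gemfile, both may show up here at the same time.
puts "by constant:     #{servers_by_constant.inspect}"
puts "by loaded specs: #{servers_by_loaded_specs.inspect}"
```

Either list can contain zero, one, or several entries depending on the Gemfile and on when the check runs, which is exactly why neither is enough to pick the server that will actually run.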

Also, there are multiple ways to start web servers: rails server, rackup -s, or the web server command itself: puma, unicorn, passenger… In essence, this fix was not enough.

Gloves on. Investigation intensifies.

Investigating the Rails server startup sequence

The issue started to become very apparent on Rails and using Puma, so let’s work from that.

By setting a binding.pry in the object that accesses the web server configuration to evaluate forking? and preload? we could immediately see the issue:

The ::Puma::Configuration constant is not even there!

While the Rails guides describe the typical startup sequence, they are short on details and focus on upper-level concepts geared towards application development. What we needed was a detailed timeline of what happens behind the scenes, including how the various gem parts are required, how the web server to run is selected, and when the web server configuration is finally available.

Let’s build a map of what happens (I removed the steps that don’t matter). The first part is mostly about finding our bearings:

I drew a line here, as we’re leaving the early command part. Starting with this, we’ll add some information about whether the code resides in Rails or Rack because there’s a lot of back and forth, and indent based on call depth.

At this point, the web server will start and this call will be blocking until the end of the web server life. We picked Puma, so we’ll add some additional quick notes here:

  • The Puma gem gets required during Bundler.require, but only minimally as we saw before. This situation lasts all the way to require config/environment.
  • Midway through require config/environment, some more of Puma’s parts are loaded, but still not everything, and still no configuration, which is why we cannot make a decision at #to_app.

So let’s proceed where we left off:

Puma requires its own files very lazily, so we can’t hook Puma::Launcher#run directly because it’s only available very late.

I was expecting options to be filled with a default value, but instead it’s the Rack::Handler.default method that makes a fallback selection. To know which web server will be used, I can hook the Rack::Server#server accessor and read its return value.
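Hooking an accessor like this can be done with Module#prepend. The sketch below is not our agent’s code: DemoServer is a stand-in for Rack::Server, reduced to the one accessor that matters so the example runs without Rack, and ServerDetection and its :webrick fallback are hypothetical names and values chosen for illustration.

```ruby
# Stand-in for Rack::Server, reduced to the accessor we care about.
# In a real process this class comes from Rack; it is defined here only
# so that the sketch runs on its own.
class DemoServer
  def initialize(options = {})
    @options = options
  end

  # Mirrors the idea of an explicit choice with a fallback selection.
  def server
    @server ||= @options[:server] || :webrick
  end
end

# Hypothetical hook: records whatever the accessor returns, unchanged.
module ServerDetection
  class << self
    attr_accessor :detected
  end

  def server
    super.tap { |handler| ServerDetection.detected = handler }
  end
end

# Prepending puts the hook in front of the original method in the lookup chain.
DemoServer.prepend(ServerDetection)

DemoServer.new(server: :puma).server
puts ServerDetection.detected.inspect # prints :puma
```

Because the hook only observes the return value, it works the same whether the server was named explicitly or picked by the fallback.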

Investigating the rackup command startup sequence

Compared to Rails, Rack’s startup is delightfully simple. A notable difference though is that Rack does not concern itself with Bundler so we’ll include that.

Again, let’s draw a line here as we’re leaving the early command part. Here’s the rackup part:

Those mere three lines are actually Rack, and the last line behaves like the Rails::Server case above, only without the (Rails) parts. This has subtle but rippling consequences on the order of some things though: see how wrapped_app is touched at a slightly different time, and how Bundler.require is called explicitly in the Rails sequence, whereas in the Rack sequence it happens as part of the require config/environment step.

Investigating the web server commands startup sequences

As it turns out, each Ruby web server also has a command to start the server by itself. All of them leverage Rack in some way and have similarly simple startup sequences, but the timeline is again important. These commands should be started with bundle exec so I’ll generally skip that part as we get into it below.

Puma

As mentioned before, Puma has both a cluster mode – which forks workers – and a single mode – which relies on threads for concurrency.

Puma doesn’t use Rack::Builder’s build_app_and_options_from_config, instead reimplementing part of it to choose between Rack::Builder and Puma::Rack::Builder. The boot notably differs because Puma::Launcher#run is called very early, so it cannot be hooked: the app, and with it Bundler.require, is only loaded at the new_from_string stage. Also, since Puma::Launcher requires the configuration, we have everything we need very early.

Puma also supports graceful restarts for updates, for which it will fork and exec to a new, clean Ruby process, and for that it will need to set up Bundler again. This is done in Puma::Launcher#run by manipulating the RUBYOPT environment variable.

Unicorn and Rainbows

Rainbows extends Unicorn, so their starting process is basically the same. It also embeds no Rack::Handler by itself, so it’s usually started by its unicorn command. There is also a unicorn_rails command but it’s mostly inconsequential regarding startup with modern Rails versions.

Unicorn can only fork, by design. Its startup is superbly small.

So basically Unicorn reimplements Rack::Builder#parse_file, but otherwise nothing to see here, please move along.
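For readers wondering what such a reimplementation boils down to, here is a rough sketch of the general idea: take the rackup file’s source text, evaluate it inside a builder object, then fold the middleware around the app. MiniBuilder, Tag, and CONFIG_RU are hypothetical stand-ins written for this example; this is neither Rack’s nor Unicorn’s actual code, and it skips everything a real builder does beyond use and run.

```ruby
# Hypothetical, heavily reduced stand-in for a Rack-style builder.
class MiniBuilder
  def initialize
    @middleware = []
    @app = nil
  end

  # Evaluate rackup-style source text inside a fresh builder.
  def self.from_string(source, label = '(rackup)')
    builder = new
    builder.instance_eval(source, label)
    builder.to_app
  end

  # Remember how to wrap an app with the given middleware class.
  def use(klass, *args)
    @middleware << ->(app) { klass.new(app, *args) }
  end

  def run(app)
    @app = app
  end

  # Wrap the app in its middleware; the first `use` ends up outermost.
  def to_app
    raise 'missing run statement' unless @app

    @middleware.reverse.inject(@app) { |app, wrap| wrap.call(app) }
  end
end

# Example middleware that adds a response header.
class Tag
  def initialize(app, value)
    @app = app
    @value = value
  end

  def call(env)
    status, headers, body = @app.call(env)
    [status, headers.merge('x-tag' => @value), body]
  end
end

# Stand-in for the contents of a config.ru file.
CONFIG_RU = <<~RUBY
  use Tag, 'demo'
  run ->(env) { [200, { 'content-type' => 'text/plain' }, ['hello']] }
RUBY

app = MiniBuilder.from_string(CONFIG_RU)
p app.call({})
```

The point for our purposes is that the application code only gets evaluated at this step, so anything that must observe or wrap the app has to be in place before it.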

Passenger

Passenger has a particular behavior: the passenger start command is merely a remote control for a background process started completely independently, and runs tail -f on the log files to fake the output. A consequence of that is that you can’t use binding.pry since the Ruby part is actually not a TTY, making interactive exploration a little bit more challenging.

This background process will start the application in all cases. There are two operation modes, called spawning methods: direct (i.e., forking without preloading) and smart (i.e., forking with preloading). This is how Passenger enables scaling, dynamically ramping processes up and down.

To this end, the Passenger background process will exec a detached Nginx as well as a Passenger Agent and a Passenger Watchdog. This design allows Passenger to present a single frontend port for all applications, independently of the language since Passenger supports more than just Ruby. The Passenger Agent then forks into a Passenger Core process, itself forking into the various applications.

At this stage, the spawning method matters and will produce either an AppLoader process in direct mode — which will load the app upon spawn, then run the app — or an AppPreloader one in smart mode — which will load the app, then fork and run the app.
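The difference between the two spawning methods comes down to where the app gets loaded relative to the fork. The sketch below illustrates only that ordering, using plain Process.fork (so it needs a platform that supports fork). It collapses Passenger’s process tree into one parent and its children, and load_app, in_child, spawn_direct, and spawn_smart are hypothetical helpers, not Passenger code.

```ruby
# Hypothetical stand-in for an expensive application boot.
# It records the PID of the process that performed the load.
def load_app
  { booted_in: Process.pid }
end

# Run a block in a forked child and return what it reports over a pipe.
def in_child
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write(yield.to_s)
    writer.close
    exit!(0) # leave immediately, skipping at_exit handlers
  end
  writer.close
  result = reader.read
  reader.close
  Process.wait(pid)
  result
end

# "direct": fork first, then each child loads the app by itself.
def spawn_direct(count)
  Array.new(count) { in_child { load_app[:booted_in] } }
end

# "smart": load once before forking, then children reuse the loaded app.
def spawn_smart(count)
  app = load_app
  Array.new(count) { in_child { app[:booted_in] } }
end

puts "parent: #{Process.pid}"
puts "direct: #{spawn_direct(2).inspect}"
puts "smart:  #{spawn_smart(2).inspect}"
```

In the direct case every child reports its own PID, since each one performed the load; in the smart case every child reports the PID of the process that loaded the app before forking. Any boot-time hook therefore runs once per worker in one mode and once in total in the other.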

AppLoader is implemented in rack-loader.rb:

AppPreloader is implemented in rack-preloader.rb:

In both cases, Passenger basically inlines the implementation of Rack::Builder.parse_file. Also, since the background process is completely independent of the remote control one, Passenger has to call Bundler’s setup, which it does through run_load_path_setup_code.

Another peculiar point is that Passenger definitely monkey-patches Rack::Handler.default:

This comes into play when starting with rackup or rails server. This Rack::Handler has only one role: calling system('passenger start'). The consequence is that there is no ultimate difference between the three commands to start Passenger.

A cheerful conclusion

Armed with this detailed behavior, we were able to solve the issue at its root for all mentioned web servers, and more, including Thin and WEBrick. By hooking the Rack::Server#server accessor, we were able to reliably detect the web server about to run in a simple, generic way, and by hooking that web server’s Rack::Handler::<server>#run method, we were able to make special-case decisions according to each one’s specific implementation and configuration. The detailed timeline of events guaranteed that those hooks and the Rack::Builder#to_app one would operate properly in every startup situation. We also improved our integration tests to cover the whole test matrix of web servers, configurations, and startup commands. We therefore properly patched our customers’ issues this time around, and hardened our Ruby agent against further startup issues down the line. And everyone lived happily ever after.
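As an illustration of the second kind of hook, here is a minimal sketch of wrapping a handler’s class-level run method. DemoHandler is a stand-in for a handler such as Rack::Handler::Puma, defined locally so the example runs without Rack; BootDecision, the :workers option, and the two mode names are hypothetical and not taken from any real server’s configuration.

```ruby
# Stand-in for a Rack handler; real handlers expose a class-level
# `run(app, ...)` that blocks for as long as the server is alive.
module DemoHandler
  def self.run(app, **options)
    "serving #{app} with #{options}"
  end
end

# Hypothetical per-server hook: inspect the options, decide, then delegate.
module BootDecision
  class << self
    attr_accessor :mode
  end

  def run(app, **options)
    # Assumption for this sketch: a positive :workers count means forking.
    forking = options.fetch(:workers, 0) > 0
    BootDecision.mode = forking ? :defer_to_workers : :boot_now
    super
  end
end

# `run` lives on the handler itself, so the hook goes on its singleton class.
DemoHandler.singleton_class.prepend(BootDecision)

DemoHandler.run(:app, workers: 2)
puts BootDecision.mode # prints defer_to_workers
```

The generic accessor hook tells us which handler to wrap, and a small wrapper like this one can then hold the knowledge that is specific to that server.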

The takeaway of this wondrous adventure into Ruby web servers is that even though Rack has a very simple design, the ecosystem around it can be surprisingly involved! The changes across versions of Rack, Rails, and each web server are quite limited, but the variety of implementations is a testament to the power of Rack’s design and the ingenuity of the community.

We hope you enjoyed this look at the journey we took into Ruby web server startup sequences in order to solve a critical customer issue. If this sounds like the kind of thing you want to work on and be a part of, we’re hiring!
