Optimizing the Web App Scaling Process with Ruby on Rails


This is the opening review of the article series dedicated to web app scaling on Ruby on Rails.
JetRuby Agency was named the top Ruby on Rails developer by Clutch, and we are glad to share our insights on applying this famous framework.
As a popular web development framework, Ruby on Rails is appreciated for its practicality. It is often the choice of startups that have not yet given much thought to the future structure of their web app. In this article, we address the scalability challenges of Rails apps.

Can I scale up my web app using Ruby on Rails?

The short answer is yes. You will be able to expand the functionality of your web application by leveraging the right approach through the Ruby on Rails framework. We’ll make sure to cover this approach in our article.
Over the past decade, Ruby on Rails has been regarded as the dominant all-in-one web framework, reigning supreme in the industry. However, in recent times, it has faced stiff competition from newer, faster, and more lightweight alternatives. This has sparked debates about whether Rails is becoming outdated and losing its ability to compete effectively.
Some would argue that Ruby on Rails is not an appropriate framework for modern web projects. We dissected this view in the article “Ruby on Rails is Dead” and explained why Ruby isn’t going away anytime soon. Ruby on Rails perfectly fits MVPs as it is a proven technology that boasts community support, allowing you to work with a transparent development budget and scalability.
The choice between different web frameworks and languages should consider factors beyond mere speed. Other aspects, such as ease of development, maintainability, scalability, and a vibrant community, also play crucial roles in the success of a web application. While performance is important, it is just one piece of the larger puzzle that web developers must consider when selecting the most suitable tools for their projects.
This article focuses on the scaling aspects of web applications. Most web applications are essentially CRUD (Create, Read, Update, Delete) applications: more than 80% of them fit this category of core data interactions. While faster languages and frameworks improve request processing times and optimize rendering, they don’t dramatically change what such an application can do.
As you work toward handling an increased volume of customers and user requests, Ruby on Rails apps may run into app server constraints, memory management issues, and limitations of the system architecture.
When your startup project grows and begins to handle a massive number of user requests per minute, it pays to have thought about performance optimization in advance. We have studied the topic and tried multiple approaches to optimizing Rails-based apps. Whether you run a startup or a mature business, you can ground your plans in our guidelines. Ruby on Rails serves both startups and mature companies well, but a few critical prerequisites must be considered.
We believe (and this is proven by many years of experience in this industry) that a scaling challenge, in this case, should be addressed with a solid understanding of Ruby on Rails processes.

Why do you need scaling, exactly?

The answer to this question is more critical and less self-evident than it may seem. We’ve observed numerous Rails applications, some unnecessarily over-scaled, leading to wasteful expenses.
The plethora of services offered by AWS simplifies the scaling process, but it also makes it tempting to scale up even when it’s not truly necessary. A common misconception among Rails developers is that increasing instance sizes automatically makes their applications faster. The reality is quite different.
Scaling will not inherently improve the application’s speed unless the app frequently experiences queued requests and significant waiting times. Even Heroku’s PX dynos, while they can make performance more consistent, won’t necessarily make the application run faster. On the other hand, changing instance types on AWS, such as moving from T2 to M4, may change the performance characteristics of your app instances.
When faced with performance issues, some Rails developers reflexively upgrade instance sizes, hoping that spending more money will resolve the problem. In most cases, this approach does not yield the desired results, and their website continues to suffer from sluggishness.
In summary, developers must be mindful of their application’s actual needs before scaling, as indiscriminate scaling can lead to unnecessary expenses without addressing the root cause of performance issues. A thorough analysis of the application’s performance bottlenecks and a focused optimization strategy will be far more effective in achieving the desired speed improvements.

The role of routing requests in Ruby scaling

Now that we agree on the proper purpose of scaling, let’s discuss the prerequisites for this process.
Scaling primarily increases throughput, not the speed of individual requests. The benefits of scaling become evident when requests are waiting to be served by your application. If there is no backlog of requests, scaling may lead to unnecessary expenses.
A comprehensive understanding of your application server and HTTP routing is essential to scale Ruby apps from handling 1 to 1000 requests per minute.
In this example, we’ll use Heroku as a reference, but various custom DevOps setups operate similarly. Have you ever wondered about the “routing mesh” or the queuing process for requests before they reach your server? Well, you’re about to delve into those details.
Heroku runs an undisclosed number of routers, likely hundreds or more. The principal function of these routers is to locate your application’s dynos and forward incoming requests to one of them. After spending roughly 1-5 milliseconds locating your dynos, the router connects to a random dyno within your app. Yes, entirely at random. This caused problems for RapGenius at a time when Heroku’s explanation of the router’s selection process was ambiguous and confusing.
Once a random dyno is selected, Heroku allows it up to five seconds to acknowledge the request and establish a connection. The request sits in the router’s queue during this waiting period. It’s important to note that each router operates its own request queue, and since Heroku does not disclose the total number of routers, your application could be facing numerous router queues at any given time. Heroku prunes a request queue when it becomes too long, and it attempts to isolate unresponsive dynos by quarantining them. However, each router performs this quarantine independently, meaning every router maintains its own set of quarantined dynos.
This mechanism resembles how most custom setups use nginx, as the DigitalOcean tutorial exemplifies. In such setups, nginx often functions as both a load balancer and a reverse proxy. The same behavior can be replicated using custom nginx configurations, although more aggressive settings might be preferred. Nginx can actively send health-check requests to upstream application servers to verify their responsiveness. It’s important to mention that custom nginx setups typically lack their own request queues.
For Heroku users, two critical details are worth noting: first, the router waits for up to 5 seconds to establish a successful connection to your dyno; second, while it waits, other requests queue up in that router’s request queue.
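To see why random dyno selection matters, here is a small Ruby sketch. It is a deliberately simplified model and not Heroku’s actual router: it assumes all requests arrive at the same moment, every dyno serves one request at a time, and every request takes the same time to serve. The function names and numbers are hypothetical.

```ruby
# Total time requests spend waiting, given how many requests landed on each dyno.
# counts:     number of requests assigned to each dyno
# service_ms: time a dyno needs to serve one request
def total_wait_ms(counts, service_ms)
  # On a dyno holding n requests, the k-th one (0-based) waits k * service_ms,
  # so the dyno accumulates service_ms * n * (n - 1) / 2 of waiting in total.
  counts.sum { |n| service_ms * n * (n - 1) / 2 }
end

# Assign each request to a dyno chosen uniformly at random.
def random_assignment(requests, dynos, rng)
  counts = Array.new(dynos, 0)
  requests.times { counts[rng.rand(dynos)] += 1 }
  counts
end

# Spread requests as evenly as possible (an idealized "smart" router).
def balanced_assignment(requests, dynos)
  base, extra = requests.divmod(dynos)
  Array.new(dynos) { |i| i < extra ? base + 1 : base }
end

rng = Random.new(42) # fixed seed so repeated runs give the same assignment
random_counts   = random_assignment(20, 4, rng)
balanced_counts = balanced_assignment(20, 4)

puts "random:   #{random_counts.inspect} total wait #{total_wait_ms(random_counts, 100)} ms"
puts "balanced: #{balanced_counts.inspect} total wait #{total_wait_ms(balanced_counts, 100)} ms"
```

Under these assumptions, a perfectly balanced assignment of 20 requests across 4 dynos accumulates 4000 ms of total waiting. A random assignment can never do better and usually does worse, because some dynos end up with longer lines while others sit idle.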


Choosing the web server

The router setup, whether it is Heroku’s or a custom one built on nginx, is closely tied to your choice of web server. Understanding how the router connects to the server is crucial, because the process varies significantly from one web server to another.

Unicorn:
Unicorn spawns multiple “worker processes” (app instances), each listening on the same Unix socket, coordinated by a “master process.” A worker process accepts a request from the socket, waits for the download to complete, processes the request, and then listens on the socket again.
Because a worker cannot accept new connections while it is downloading a request, Unicorn is vulnerable to slow clients: each slow request ties up a worker, which limits the number of slow requests that can be served simultaneously. However, it can handle slow application responses, because other available workers can still accept connections while one worker is busy.
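For reference, a minimal Unicorn configuration might look like the sketch below. The directives are standard Unicorn configuration settings, but the socket path, worker count, and timeout are placeholder values assumed for illustration.

```ruby
# config/unicorn.rb (illustrative values; tune them for your own host)

# Number of worker processes forked by the master process.
worker_processes 4

# All workers accept connections from the same Unix socket.
# The path is a placeholder; a front-end proxy would forward requests to it.
listen "/tmp/unicorn.app.sock", backlog: 64

# Kill a worker that stays busy with a single request longer than this (seconds).
timeout 30

# Load the application in the master before forking the workers.
preload_app true
```

Since each worker handles one request at a time, a setup like this is usually placed behind a buffering reverse proxy such as nginx, so that slow clients do not tie up workers.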

Puma (threaded):
In its default mode, Puma is a multi-threaded, single-process server. It employs a Reactor thread, similar to Thin, to handle downloading requests and can asynchronously wait for slow clients to send their entire request. Once the request is downloaded, a new thread communicates with the application code and processes the request. Puma lets you cap the number of application threads running simultaneously.
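The effect of threads on slow application responses can be illustrated with plain Ruby. The sketch below is not Puma’s implementation; it only assumes that each request spends its time blocked on I/O (simulated with `sleep`), which is the case where threads help on MRI, since blocking I/O releases the global VM lock. All method names are hypothetical.

```ruby
# Simulated application code: each "request" blocks on I/O for a short time.
def handle(request_id, io_wait: 0.02)
  sleep(io_wait) # stands in for a slow database or API call
  "response #{request_id}"
end

# One process, one thread: requests are served strictly one after another.
def serve_serially(request_ids)
  request_ids.map { |id| handle(id) }
end

# A fixed pool of threads pulls requests from a shared queue.
def serve_with_threads(request_ids, max_threads: 4)
  pending = Queue.new
  request_ids.each { |id| pending << id }
  pending.close # after this, pop returns nil once the queue is empty

  responses = Queue.new # Queue is thread-safe, so no extra locking is needed
  threads = Array.new(max_threads) do
    Thread.new do
      while (id = pending.pop)
        responses << handle(id)
      end
    end
  end
  threads.each(&:join)
  Array.new(responses.size) { responses.pop }
end

# Measure wall-clock time of a block with a monotonic clock.
def elapsed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

ids = (1..8).to_a
serial   = elapsed { serve_serially(ids) }
threaded = elapsed { serve_with_threads(ids) }
puts format("serial: %.2fs, threaded: %.2fs", serial, threaded)
```

With four threads, the eight simulated requests finish in roughly a quarter of the serial time, because the threads wait on I/O concurrently.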

Puma (clustered):
Puma’s clustered mode combines its multi-threaded model with Unicorn’s multi-process model. In this setup, Heroku’s routers connect to Puma’s “master process,” which handles downloading and buffering incoming requests before passing them on to available Puma workers. This combination allows Puma to deal effectively with both slow requests and slow application responses.
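A clustered Puma setup is typically described in `config/puma.rb`. The sketch below uses standard Puma configuration directives; the default values are assumptions for illustration, and the environment variable names `WEB_CONCURRENCY` and `RAILS_MAX_THREADS` are common conventions rather than requirements.

```ruby
# config/puma.rb (illustrative values; tune them for your own app)

# Clustered mode: fork this many worker processes.
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))

# Each worker runs between min and max application threads.
max_threads = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
threads max_threads, max_threads

# Load the application before forking the workers.
preload_app!

# Port to listen on; Heroku provides it through the PORT variable.
port ENV.fetch("PORT", 3000)
```

The total number of requests the server can process at once is roughly workers times threads, so both values should be chosen with the available memory and database connections in mind.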

Thin:
Thin is a single-process, event-driven web server. It can accept other connections while waiting for a slow request to complete: it starts receiving parts of the request as they arrive, so it handles slow clients more efficiently.
However, it becomes unavailable while processing your application code. Unless you use EventMachine creatively, it cannot accept new requests while waiting for input/output operations in the application code to finish. Dealing with slow application responses requires significant custom coding.

Webrick (the Rails default before Rails 5):
Webrick operates as a single-process, single-thread web server. It keeps the connection open with the router until the entire request is downloaded. The router then moves on to the subsequent request, and Webrick runs your application code to generate a response. During this process, your host remains busy and cannot accept connections from other routers. If another router tries to connect while the request is being processed, it must wait (usually up to 5 seconds on Heroku) until the host is ready.
Webrick struggles with slow requests and uploads. It is unsuited to handling either slow clients or slow application responses.

Phusion Passenger 5:
Passenger 5 uses a multi-process architecture with a built-in buffering system. Its reverse proxy, which functions like nginx, receives requests directly from the router and downloads the entire request before forwarding it to the worker processes. This way, workers are protected from slow uploads. The HelperAgent process routes requests to unused worker processes, allowing Passenger 5 to handle both slow application responses and slow clients effectively.
Every web server has advantages and disadvantages, making it crucial to select the one that best aligns with your specific requirements and workload. Interestingly, when it comes to Ruby application servers, their primary differentiation lies not in their speed but in their distinctive I/O models and characteristics.
In essence, for a Ruby web application to be scalable, it needs protection against slow clients, which can be achieved through request buffering, and protection against slow responses, which can be achieved through some form of concurrency: multithreading, multiprocess/forking, or ideally both. By this criterion, the two most suitable scalable solutions for Ruby applications on Heroku running MRI/C Ruby are Puma in clustered mode and Phusion Passenger 5.
However, if you manage your own setup, Unicorn with nginx is another viable option. By considering these factors and assessing your specific circumstances, you can decide which web server best fits your Ruby web application.

Deciphering the web server options for scaling Ruby applications

To recap, a scalable Ruby web application needs slow client protection, which is provided by request buffering, and a way to tackle slow responses, which is provided by multithreading, multiprocess/forking, or ideally both. Among the options discussed, Puma in clustered mode is a scalable solution for Ruby applications running on Heroku. For those managing their own setup, Unicorn with nginx is a viable alternative.
While these web servers boast varying claims about their “speed,” getting too caught up in minute differences may not be productive. They can handle thousands of requests per minute, generally taking less than 1ms to process each request. Thus, a mere 0.001ms difference between Puma and Unicorn might not significantly impact overall performance if the Rails application takes an average of 100ms to respond.
The critical distinction among Ruby application servers lies in their input/output characteristics rather than speed. As they differ in many other ways too, do your own feature research and determine the right fit for your web project.

Scaling app instances

It is essential not to base the scaling of your application solely on response times. Although a slowdown in your application may be attributed to increased time in the request queue, it might not always be the case. Scaling your hosts without considering the status of the request queue could result in unnecessary expenses. Therefore, before scaling, ensure that your request queue is not empty.
The same principle applies to worker hosts as well. Scaling them should be determined by the depth of your job queue. Scaling the worker hosts becomes meaningless if no pending jobs are waiting to be processed. Both worker dynos and web dynos have incoming jobs (requests) that require processing, and their scaling should be based on the number of pending jobs in the queue.
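As a minimal sketch of scaling workers by queue depth, the hypothetical helper below picks a worker count from the number of pending jobs. The parameter names, the default five-minute drain target, and the limits are assumptions for illustration; a real setup would read the queue depth from its job system.

```ruby
# Suggest a worker count from the current job queue depth.
# queue_depth:                 jobs currently waiting
# jobs_per_worker_per_minute:  measured throughput of a single worker
# target_drain_minutes:        how quickly the backlog should be cleared
def desired_workers(queue_depth:, jobs_per_worker_per_minute:,
                    target_drain_minutes: 5, min: 1, max: 10)
  # An empty queue never justifies more than the minimum.
  return min if queue_depth.zero?

  capacity_per_worker = jobs_per_worker_per_minute * target_drain_minutes
  # Integer ceiling division: workers needed to drain the backlog in time.
  needed = (queue_depth + capacity_per_worker - 1) / capacity_per_worker
  needed.clamp(min, max)
end

puts desired_workers(queue_depth: 0,   jobs_per_worker_per_minute: 60)
puts desired_workers(queue_depth: 900, jobs_per_worker_per_minute: 60)
```

With an empty queue the helper stays at the minimum of one worker; with 900 pending jobs and workers that each clear 60 jobs per minute, it suggests three.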
New Relic can track the time spent in the request queue, and there are gems that let you measure it independently. If the time spent in the request queue is low (less than 5-10 ms of your average server response time), scaling brings only marginal benefits. Weighing these factors carefully gives you a foundation for more efficient scaling decisions.
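Request queue time can also be estimated by hand. The sketch below is a Rack-style middleware written without any gems. It assumes the router adds an `X-Request-Start` header containing a Unix timestamp in milliseconds, as Heroku’s router does; other proxies may use a different format. The class name and the `app.queue_time_ms` key are hypothetical.

```ruby
# Rack-style middleware that estimates how long a request waited
# between the router and the application server.
class QueueTimeMiddleware
  ENV_KEY = "app.queue_time_ms" # hypothetical key for downstream code

  # clock returns the current time in milliseconds; injectable for testing.
  def initialize(app, clock: -> { Time.now.to_f * 1000 })
    @app = app
    @clock = clock
  end

  def call(env)
    env[ENV_KEY] = queue_time_ms(env["HTTP_X_REQUEST_START"])
    @app.call(env)
  end

  private

  # Returns the queue time in milliseconds, or nil if the header is
  # missing or not a number.
  def queue_time_ms(header)
    started_ms = Float(header, exception: false) if header
    return nil unless started_ms

    # Clock skew between hosts can make the difference negative; clamp to zero.
    [@clock.call - started_ms, 0.0].max
  end
end

# Usage with a trivial Rack-compatible app:
app = ->(env) { [200, {}, ["queued for #{env[QueueTimeMiddleware::ENV_KEY].inspect} ms"]] }
stack = QueueTimeMiddleware.new(app)
request_start = ((Time.now.to_f * 1000) - 25).round.to_s # pretend the router saw it 25 ms ago
_status, _headers, body = stack.call({ "HTTP_X_REQUEST_START" => request_start })
puts body.first
```

Because the timestamps come from two different machines, the value is only an estimate, but a queue time that grows under load is the signal that adding instances may help.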

The brief for web app scaling on Ruby to 1000 RPM

Every web application poses its own scaling challenges, so this list cannot be exhaustive. Our primary checklist includes the following recommendations:

  • Prioritize queue times over scaling hosts: scaling your hosts without looking at queue times is counterproductive.
  • Understand scaling dynos: scaling dynos primarily increases throughput, not the actual speed of your application. If your app is slow, scaling should not be your immediate solution. Instead, focus on optimizing the application’s performance before scaling.
  • Choose the correct web server: opt for a multi-process web server with features like intelligent routing and slow client protection. Choose between Puma in clustered mode and Unicorn with an nginx front end; Phusion Passenger 5 is also worth considering.
  • Leverage the three layers: scaling means working toward three goals, namely minimizing response time variability, cutting response time, and increasing application instances. As a result, you will obtain a scalable application that requires fewer instances and has lower response time variability and faster response times.
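
The link between response time and instance count in the checklist above can be made concrete with Little’s law: the average number of requests in flight equals the arrival rate multiplied by the average response time. The helper below is a hypothetical sketch that gives only an average figure; it ignores response time variability and traffic spikes, so real deployments need headroom on top of it.

```ruby
# Estimate how many app instances are busy on average.
# rpm:                      requests per minute
# avg_response_ms:          average server response time in milliseconds
# concurrency_per_instance: requests one instance can process at once
def required_instances(rpm:, avg_response_ms:, concurrency_per_instance: 1)
  # Rational arithmetic keeps the result exact before rounding up.
  requests_per_second = Rational(rpm, 60)
  in_flight = requests_per_second * Rational(avg_response_ms, 1000)
  (in_flight / concurrency_per_instance).ceil
end

puts required_instances(rpm: 1000, avg_response_ms: 100) # prints 2
puts required_instances(rpm: 1000, avg_response_ms: 300) # prints 5
```

At 1000 requests per minute, cutting the average response time from 300 ms to 100 ms lowers the average demand from five single-request instances to two, which is why optimizing the application comes before adding hosts.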


Optimization of the scaling process on a Ruby on Rails web application is not straightforward, but a correct and professional approach brings the right results. You can achieve maintainability and good performance using the best practices of this optimization.

In the following reviews, we will continue to cover our Ruby on Rails expertise and hope that this set of articles becomes part of your RoR guide in many instances.
