Summary: This article reviews server configuration approaches that will help you maximize Ruby app performance.
Our previous article dealt with the Ruby web app optimization scaling challenge. This time, we will address the server configuration as a crucial way to optimize the Ruby app’s performance.
Key server configuration settings
Misconfigurations in Ruby web application servers can bring everything to a standstill. Interestingly, application servers themselves don’t dramatically boost app speed: they function similarly, and switching between them won’t significantly change throughput. However, avoiding misconfiguration is crucial for Ruby web app optimization, because wrong settings commonly hinder applications.
This article will focus on optimizing resource usage (CPU and memory) and boosting throughput (requests-per-second) for the three primary Ruby application server configurations: Unicorn, Puma, and Passenger. For the sake of simplicity, “server” and “container” are interchangeable terms here since the principles apply universally.
The three widely used application servers share a similar design: each spawns worker processes (and, in some configurations, threads) that handle incoming requests. While the servers’ fundamental structure is the same, some nuances can impact performance significantly.
For Ruby web app optimization, we want to handle the most requests per second using the least server resources.
Resource consumption and performance largely depend on specific settings. The principal server settings affecting performance are:
- Number of threads
- Number of child processes
- Container size
Let’s delve into each of these settings.
The number of threads
Puma and Passenger Enterprise support a multi-threaded model. The following description of threads applies to those two server configurations.
Threads provide a resource-efficient way to boost throughput and improve your application’s concurrency. Fortunately, Rails is already threadsafe, so most applications need no unusual measures, such as spawning custom threads.
Thus, thread safety is characteristic of most Ruby applications, and the best way to verify it is simply to try it as part of Ruby web app optimization. Ruby applications tend to expose threading bugs loudly, through exceptions, letting you experiment and assess the results quickly.
Now, let’s explore the optimal number of threads to use. Extra threads boost performance only to the extent that your program can actually execute work concurrently. In MRI (C Ruby), the Global VM Lock restricts parallelism to tasks waiting on IO, such as a database result. For most web applications, this IO-bound portion is under roughly 25% of total processing time, which is why thread settings above five have been found to add little for typical applications.
In contrast to the process count, which requires constant monitoring and tuning based on metrics, the initial thread setting rarely needs revisiting: around five threads per process is a good default.
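To see why IO waits are where threads pay off in MRI, here is a minimal sketch. The `fake_io` method is a hypothetical stand-in for a database call; the timings are illustrative, not a benchmark of any real server.

```ruby
require "benchmark"

# Simulate an IO wait, such as a database query, during which MRI
# releases the Global VM Lock and lets other threads run.
def fake_io
  sleep 0.1
end

# Five IO waits, one after another: roughly 0.5 seconds.
serial = Benchmark.realtime { 5.times { fake_io } }

# The same five waits overlapped across five threads: roughly 0.1 seconds.
threaded = Benchmark.realtime do
  5.times.map { Thread.new { fake_io } }.each(&:join)
end

puts format("serial: %.2fs, threaded: %.2fs", serial, threaded)
```

The threaded version finishes in about a fifth of the time, because the threads spend their waits concurrently. CPU-bound work would show no such speedup under MRI.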
It’s essential to be aware that in MRI/C Ruby, threads significantly impact memory usage. Check memory consumption before adding threads, and again after enabling them. Don’t expect each thread to consume only an extra 8 megabytes; in practice, threads increase memory usage by far more than that.
Now, let’s describe how to set threads up:
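In Puma, for example, the per-process thread count lives in `config/puma.rb`; the five-thread figure below simply follows the guideline above, so adjust it to your own measurements:

```ruby
# config/puma.rb
# Minimum and maximum threads per worker process.
# Five is a sensible default for MRI, per the guideline above.
threads 5, 5
```

Passenger Enterprise users set the equivalent via the `passenger_thread_count` directive in the nginx configuration.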
As a side note, on Ruby implementations without a Global VM Lock, such as JRuby, threads do run truly in parallel, so there you can keep increasing thread counts until CPU capacity is exhausted.
The number of child processes
Web servers like Unicorn create one application process and then fork it, producing multiple child processes that handle incoming requests. The challenge is to run as many processes as the server can sustain without exhausting its capacity.
It’s among the best practices to run at least three processes per container for any Ruby web application. This approach enhances routing capacity: Puma and Unicorn let the OS distribute load among process copies listening on one socket, balancing the work.
On the other hand, Passenger utilizes a reverse proxy (for example, nginx) to route requests to child processes, sending each request to the least busy worker first. This ensures that requests are quickly routed to idle workers. Both approaches are viable. Routing at higher layers, such as a load balancer in front of several servers, is even more challenging, as those layers often lack information about each server’s workload.
Let’s take, for instance, a setup with three servers, each running one process (three processes in total). The load balancer must route requests to those servers optimally, yet if a process is busy handling a request, a new request may be assigned to an already busy server even while another server has an idle process.
We can mitigate this risk by running more processes per server (three is the recommended minimum), so that requests queue at the socket or reverse proxy until a worker is free. If resource constraints prevent running at least three processes per server, consider upgrading to a larger server.
The available memory and CPU resources determine the maximum number of child processes a server can run. Each child process consumes a certain amount of memory, so we shouldn’t add more child processes than the server’s RAM can support.
Notably, the memory usage of a Ruby process grows roughly logarithmically: it approaches a plateau over time but never quite levels off, partly due to memory fragmentation. Measuring the exact memory usage of a single Ruby application process can therefore be tricky, as it increases over time. To obtain an accurate measurement, disable all process restarts (worker killers) and wait up to 24 hours before using ‘ps’ to measure memory usage. Most Ruby applications use 200-400 megabytes per process; some might take up to 1 gigabyte.
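A minimal way to read a worker’s resident memory with ‘ps’ is shown below. Purely for illustration it inspects the current shell’s own PID (`$$`); in practice, substitute the PID of a Puma or Unicorn worker, found for example via `ps aux | grep puma`.

```shell
# Print the resident set size (RSS, in kilobytes) for a given PID.
# $$ is this shell's own PID; replace it with a worker's PID in practice.
ps -o rss= -p $$
```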
Ensure you leave some headroom for memory usage. As a guideline, set your child process count so that total memory consumption stays at around 80% of the server’s available RAM.
Going beyond the memory capacity of a server can lead to significant slowdowns due to memory overcommitment, which causes swapping. Ensuring predictable and consistent memory usage is crucial to maintain optimal application performance. It helps to avoid sudden spikes in memory usage.
Equally important is not surpassing the available CPU capacity of the server. Ideally, CPU usage should stay well below full utilization; sustained periods at or near 100% indicate a CPU bottleneck that will degrade overall performance. While most Ruby and Rails applications are typically memory-bottlenecked on various cloud providers, the CPU can also become the limiting resource. Server monitoring tools, such as AWS’s built-in metrics, can show whether CPU usage frequently reaches its maximum.
Previously, there was a misconception that OS context switching incurred significant expense, but actual production use has shown otherwise. It is often advised not to run more child processes per server than there are CPUs; while that is a reasonable starting point, it is not a hard limit, and CPU usage is the crucial metric to monitor and improve against. In practice, most applications do well with a process count of up to 1.5 times the number of hyperthreads.
For Heroku users, enabling the “log-runtime-metrics” feature adds a CPU load metric to the logs. Keep an eye on the 5- and 15-minute load averages: when they consistently approach or exceed 1, the CPU is nearing its maximum capacity and the child process count should be reduced.
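On Heroku, the feature is enabled per app with the labs command; replace `your-app` below with your application’s name:

```shell
# Enable per-dyno load and memory metrics in the application logs.
heroku labs:enable log-runtime-metrics --app your-app
# Restart the app so the change takes effect.
heroku restart --app your-app
```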
Fortunately, setting child process counts is relatively straightforward across various application servers:
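For reference, here is where the child process count lives for Puma and Unicorn; the count of 3 mirrors the baseline above and should be tuned from your own metrics:

```ruby
# config/puma.rb
workers 3           # number of forked child processes

# config/unicorn.rb
worker_processes 3  # same idea, Unicorn's directive
```

Passenger users set the pool size with the `passenger_max_pool_size` directive in the nginx configuration.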
To wrap it up, most applications can use three to eight processes per server, limited only by the resources available. Applications whose response times are dominated by IO waits (visible in slow 95th-percentile times) can go higher, up to 4x the number of hyperthreads. Otherwise, a process count exceeding 1.5x the number of hyperthreads is not recommended.
Copy-on-write (COW) technique
The COW technique works in all Unix-based operating systems. When a child process is created by a fork, the child’s memory is initially shared entirely with the parent process.
Thus, any memory reads from the child process directly accesses the parent’s memory. However, if the child process modifies a memory location, a copy of that specific memory portion is created only for the child’s private use. This mechanism proves highly advantageous in reducing the memory usage of forking web servers. It enables child processes to share “read-only” memory, such as shared libraries, with the parent, rather than duplicating it.
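A minimal fork sketch of the sharing described above (illustrative only; the page copying itself happens inside the OS and isn’t observable from this code):

```ruby
# Build some data in the parent before forking.
big_array = Array.new(1_000) { |i| i }

pid = fork do
  # The child reads the parent's pages without copying them...
  puts big_array.sum # => 499500, read from shared memory
  # ...only a write (e.g. big_array[0] = 42) would make the OS
  # copy the touched page privately for the child.
end
Process.wait(pid)
```

Note that `fork` is available on MRI under Unix-like systems, which is exactly where the forking web servers discussed here run.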
COW is an inherent OS mechanism that cannot be explicitly ‘supported’ or “turned off,” but you can make it more effective at conserving memory. The idea is to load the entire application before forking, often called “preloading” in many Ruby web app servers. This setting changes when fork is called: before or after the application’s initialization.
Additionally, it becomes necessary to reconnect after forking to any databases in use, ActiveRecord’s connection being the typical example.
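With Puma, for instance, preloading plus the post-fork reconnect follows a commonly documented pattern; this is a sketch to adapt to your own setup:

```ruby
# config/puma.rb
workers 3

# Load the full application in the parent before forking,
# so child processes share memory via copy-on-write.
preload_app!

on_worker_boot do
  # Forked children must not reuse the parent's database connection.
  ActiveRecord::Base.establish_connection
end
```

Unicorn achieves the same with `preload_app true` and an `after_fork` block.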
Theoretically, this reconnection should be done for each database used by the application. However, in practice, Sidekiq generally defers connecting to Redis until it performs an actual operation. As a result, reconnecting after fork may not be mandatory unless Sidekiq jobs are executed during application boot.
Despite the benefits of copy-on-write, there are certain limitations to consider.
Transparent Huge Pages can result in an entire 2 MB page being copied even for a 1-bit memory modification, and fragmentation may also impede potential savings.
Nevertheless, enabling preloading remains a recommended practice, as it has no adverse effects and can prove beneficial in various scenarios.
It is crucial to optimize server configuration, aiming to use around eighty percent of available memory. Nevertheless, different applications have different requirements, which affects the ideal CPU-to-memory ratio. No one-size-fits-all configuration exists, so it’s vital to choose the appropriate ratio based on your production metrics.
Memory is usually the most important resource and often needs careful tuning. Some providers offer limited memory options; Heroku’s smallest tier, for example, has only 512 megabytes. Complex Ruby applications in particular demand large amounts of memory.
Around 300 megabytes of RAM per process is typical for Rails applications, and you want at least three processes per server. Therefore, a server with a minimum of 1 GB of memory is generally necessary for most Rails applications.
Furthermore, the CPU needs attention, too. How many CPU cores do we have? Is Hyper-Threading supported? The answers to these questions determine how many threads can be executed simultaneously.
Each container should run at least three child processes. Ideally, request routing improves considerably with eight or more processes per container, reducing latency and resulting in better overall performance.
Instead of conclusion: 4 steps of a successful Ruby web app optimization
1. Measure memory usage:
- Determine memory usage for one worker with five threads under production load.
- Start a few workers on a server and run them for at least 12 hours without restarting.
- Use ‘ps’ to retrieve each worker’s memory usage.
2. Container size selection:
- Choose a container with a memory capacity of at least three times the per-worker usage determined in the previous step.
- Most Rails apps use approximately 300-400 megabytes of RAM per worker, so a container with at least 1 GB is recommended.
- This gives you enough memory to accommodate a minimum of three processes per server.
3. CPU core consideration:
- Check the number of hyperthreads and CPU cores in your container. If the container has fewer hyperthreads than its memory could support in processes, you have two options:
- Pick a container size with more CPU cores or less memory; or
- Set the child process count to 1.25-1.5 times the number of hyperthreads.
4. Deploy and monitor:
- Deploy your application and carefully observe memory consumption.
- Adjust the child process count and container size to optimize resource utilization and maximize throughput.
Following these steps, you can efficiently scale and fine-tune your Ruby web application servers for optimal performance and responsiveness.