Sidekiq Batches is a feature of the Sidekiq job processing framework (available in the Pro and Enterprise versions) that allows you to process jobs in batches. With Sidekiq Batches, you can group a set of jobs together and execute them as a single batch, which makes it possible to manage dependencies between jobs and perform batch processing more efficiently. In this article, we’ll share our experience of using this tool in a real project and give you some insights.
The Challenge
In one of our projects, we have a synchronization process that ensures seamless communication between multiple systems. This is a common scenario where you need to synchronize a list of resources across different services. Naturally, we utilize Sidekiq to handle this background process.
The algorithm is quite straightforward:
- Retrieve a list of resources;
- Initiate synchronization for each resource.
The second step can also be executed individually if only a single resource needs to be synchronized.
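The two steps above can be sketched like this (a schematic, framework-free version; in the real project these classes are Sidekiq workers, the names here are illustrative, and the list worker would enqueue jobs with `perform_async` rather than run them inline):

```ruby
# Schematic version of the two-step algorithm. All class names are
# illustrative; in the real project these are Sidekiq workers.

# Step 2: synchronizes a single resource; it can also be run on its
# own when only one resource needs to be synchronized.
class StudentSync
  def perform(student_id)
    # ...fetch the remote record and upsert it locally...
    student_id
  end
end

# Step 1: retrieves the list of resources and fans out one
# synchronization per resource.
class StudentsSync
  def perform
    remote_student_ids.map do |id|
      # in production: StudentSync.perform_async(id)
      StudentSync.new.perform(id)
    end
  end

  private

  # Stubbed out here; the real method would query the external system.
  def remote_student_ids
    [101, 102, 103]
  end
end
```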
Our resources include:
- University: UniversitySync
- Faculties: FacultySync
- Departments: DepartmentsSync → DepartmentSync
- Courses: CoursesSync → CourseSync
- Professors: ProfessorsSync → ProfessorSync
- Students: StudentsSync → StudentSync
- Enrollments: EnrollmentsSync → EnrollmentSync
These granular classes make it easy to scale the synchronization process across various resource types. However, from a business standpoint, we can’t perform synchronization for all resources in parallel due to dependencies among them. For instance, Students must be synchronized before Enrollments, and both University and Faculties need to be synchronized before any other resources.
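For illustration, the ordering rules mentioned above can be written down as data, and the parallelizable “waves” derived from them. This is a hypothetical encoding of only the constraints named in the text; the real project’s dependency graph is business logic and may be richer:

```ruby
# Hypothetical encoding of the ordering rules from the text:
# University and Faculties come before everything else, and
# Students must come before Enrollments.
DEPENDENCIES = {
  'university'  => [],
  'faculties'   => [],
  'departments' => %w[university faculties],
  'courses'     => %w[university faculties],
  'professors'  => %w[university faculties],
  'students'    => %w[university faculties],
  'enrollments' => %w[university faculties students]
}.freeze

# Groups resources into "waves" that may run in parallel:
# a resource is ready once all of its dependencies have run.
def sync_waves(deps)
  waves = []
  done  = []
  until done.size == deps.size
    wave = deps.keys
               .reject { |resource| done.include?(resource) }
               .select { |resource| (deps[resource] - done).empty? }
    raise 'cyclic dependencies' if wave.empty?
    waves << wave
    done.concat(wave)
  end
  waves
end
```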
So, how do we maintain this synchronization order while using the existing architecture with vanilla Sidekiq?
One simple approach is to introduce delays between the synchronization steps (and this is how it was originally implemented). However, this method comes with its own set of challenges:
- Unpredictability. Delays are difficult to manage and maintain because the required time for each synchronization process might vary. This can lead to situations where a dependent resource starts synchronizing before its parent resource, causing inconsistencies.
- Inefficiency. Implementing fixed delays can cause resources to wait idly, even if their dependent resources have already been synchronized. This leads to wasted processing time and resources.
- Unclear completion. There is no way to tell when the whole synchronization process has finished.
Fortunately, Sidekiq (in its Pro and Enterprise versions) offers a Batches mechanism that is much better suited to this case: we can group the work into a batch with sub-batches and close the gaps described above.
Learn more about how Sidekiq suggests building complex workflows with batches and sub-batches in the official Sidekiq documentation on batches.
It’s pretty neat, but there are several downsides to the suggested code organization:
- Callbacks are all in one class, which theoretically simplifies things a bit, but in reality a massive list of batches and sub-batches creates a mess. From a testing perspective, it would be better to have each step in a separate class.
- Developers need to deal with the Batches API directly, so in order to create a sub-batch you need to:
– reopen the parent batch with `Sidekiq::Batch.new(status.parent_bid)`;
– create the sub-batch inside the parent batch’s `jobs` block, and enqueue the workers inside the sub-batch’s own `jobs` block.
Here’s an example:
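Since `Sidekiq::Batch` ships only with Sidekiq Pro, the sketch below includes a tiny in-memory stand-in for the three calls it uses (`new(bid)`, `on`, `jobs`), so the pattern is readable and runnable standalone; with Pro installed, the stub goes away and the pattern itself stays the same. The worker and callback names are illustrative:

```ruby
# --- tiny in-memory stand-in for Sidekiq Pro's Batch (illustration only) ---
module Sidekiq
  class Batch
    attr_reader :bid

    def initialize(bid = nil)
      # Passing a bid "reopens" an existing batch; the real class
      # loads its state from Redis.
      @bid = bid || "BID-#{object_id}"
      @callbacks = []
    end

    def on(event, klass, opts = {})
      @callbacks << [event, klass, opts]
    end

    def jobs
      # The real Batch attaches every job enqueued inside this block
      # (including nested sub-batches) to the batch.
      yield
    end
  end
end

ENQUEUED = [] # stand-in for Redis-backed enqueueing

class EnrollmentSync
  def self.perform_async(id)
    ENQUEUED << id
  end
end

class EnrollmentsCallbacks
  def on_success(status, options); end
end

# The pattern itself: reopen the parent batch by its bid, create the
# sub-batch inside the parent's `jobs` block, and enqueue the dependent
# workers inside the sub-batch's own `jobs` block.
parent_bid = Sidekiq::Batch.new.bid # normally taken from status.parent_bid

parent = Sidekiq::Batch.new(parent_bid)
parent.jobs do
  sub = Sidekiq::Batch.new
  sub.on(:success, EnrollmentsCallbacks, 'step' => 'enrollments')
  sub.jobs do
    EnrollmentSync.perform_async(1)
    EnrollmentSync.perform_async(2)
  end
end
```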
Implementing a simple wrapper
Possible requirements for the wrapper:
- The ability to define batches and sub-batches in one class
- Callbacks for a batch defined next to the batch class itself
With this wrapper, we can define batches with sub-batches this way:
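Since the original listings aren’t reproduced here, below is a rough sketch of both the wrapper and its usage. All names (`BaseSyncBatch`, `workers`, `then_run`) are ours, not the project’s, and an in-memory stand-in replaces the Pro-only `Sidekiq::Batch` so the example runs standalone. The real base class would additionally nest each next step under the parent batch via `Sidekiq::Batch.new(status.parent_bid)`:

```ruby
# --- tiny in-memory stand-in for Sidekiq Pro's Batch (illustration only) ---
module Sidekiq
  class Batch
    Status = Struct.new(:parent_bid)

    def initialize(bid = nil)
      @bid = bid
      @callbacks = []
    end

    def on(event, klass, opts = {})
      @callbacks << [event, klass, opts]
    end

    # The real Batch enqueues jobs via Redis and fires callbacks later;
    # this stand-in runs inline and fires :success immediately.
    def jobs
      yield
      @callbacks.each do |event, klass, opts|
        klass.new.on_success(Status.new(nil), opts) if event == :success
      end
    end
  end
end

RAN = [] # records execution order instead of real enqueueing

# The Base class hides the Batch API: a subclass only declares its
# workers and, optionally, the step to start after it succeeds.
class BaseSyncBatch
  class << self
    attr_reader :worker_classes, :next_step

    def workers(*klasses)
      @worker_classes = klasses
    end

    def then_run(klass)
      @next_step = klass
    end
  end

  def self.call
    batch = Sidekiq::Batch.new
    batch.on(:success, self) # the callback lives next to the step class
    batch.jobs do
      # in production: worker.perform_async(...)
      worker_classes.each { |worker| worker.new.perform }
    end
  end

  # Fired when this step's batch succeeds; the real version would start
  # the next step as a sub-batch via Sidekiq::Batch.new(status.parent_bid).
  def on_success(_status, _options)
    self.class.next_step&.call
  end
end

# Plain workers standing in for the project's *Sync classes.
class UniversitySync
  def perform; RAN << :university; end
end

class StudentSync
  def perform; RAN << :students; end
end

class EnrollmentSync
  def perform; RAN << :enrollments; end
end

# Step classes: each one declares its workers and its successor.
class EnrollmentsSyncBatch < BaseSyncBatch
  workers EnrollmentSync
end

class StudentsSyncBatch < BaseSyncBatch
  workers StudentSync
  then_run EnrollmentsSyncBatch
end

class UniversitySyncBatch < BaseSyncBatch
  workers UniversitySync
  then_run StudentsSyncBatch
end

UniversitySyncBatch.call
```

Kicking off `UniversitySyncBatch.call` runs the steps strictly in dependency order, with no fixed delays and a well-defined completion point for each step.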
As you can see, the batch and sub-batch classes look almost identical, because all the messy details of the Sidekiq Batch API are hidden away in the Base class.
Hopefully, you found this article helpful. Follow our blog and don’t miss new insightful posts!