Cronycle cuts through the noise of mass media and gives users only the content that matters most to them. With its powerful Boolean search filtering, users can create highly customizable feeds that deliver exactly what they’re looking for. Once users find articles they want, they can share them with their team on different boards. Teams can add articles, annotate them, and highlight relevant information on these boards without leaving the platform. Cronycle removes the step of copying and pasting links between different apps or platforms when working together.
By creating digital collaborative spaces in which professionals can select and annotate the most relevant articles or documents, Cronycle brings back the days of mind mapping and brainstorming and enables a rich, interactive experience within a team.
Challenges
Before we started working on the project, the Cronycle platform comprised several Heroku-based applications that parsed different sorts of content from websites that have no API. The system data was stored on Amazon S3, while Amazon RDS hosted the database. From a financial standpoint, using these third-party cloud platforms was rather expensive: Amazon charged $10k per month. Speaking of costs, Embedly, another third-party platform used by Cronycle to aggregate content, also charged a few cents for each URL it processed. And as more and more people started using Cronycle, the number of URLs often ran into the hundreds of thousands!
So the first point on our challenges-to-solve list was designing a reliable architecture and migrating to services that would suit Cronycle both technologically and financially. In other words, we needed to make the platform fully independent. As the amount of information processed was also constantly growing, the platform had to scale as well in order to satisfy the needs of its users.
Solution
For dedicated server hosting, the Cronycle team suggested using Hetzner, as they were already familiar with this provider. The first thing we moved was the database. Next, we reindexed Elasticsearch using the resources of the application. As soon as reindexing was finished, the brand new Elasticsearch was good to go. Moving Redis was easy: a simple “dump+restore”.
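To make the reindexing step more concrete, here is a minimal sketch of how records pulled from an application's primary data store could be batched into request bodies for Elasticsearch's bulk endpoint. It is not the code used on the project: the `bulk_payloads` helper, the `articles` index name, and the record fields are all hypothetical, and older Elasticsearch versions also expect a `_type` in the action line.

```python
import json
from typing import Dict, Iterable, Iterator, List


def bulk_payloads(records: Iterable[Dict], index: str,
                  batch_size: int = 500) -> Iterator[str]:
    """Yield newline-delimited JSON bodies for the _bulk endpoint, one per batch.

    Hypothetical helper: each record is assumed to be a dict with an "id" key.
    """
    lines: List[str] = []
    count = 0
    for record in records:
        # Each document takes two lines: the action metadata, then the source.
        lines.append(json.dumps({"index": {"_index": index,
                                           "_id": str(record["id"])}}))
        lines.append(json.dumps(record))
        count += 1
        if count == batch_size:
            # The bulk format requires a trailing newline after the last line.
            yield "\n".join(lines) + "\n"
            lines, count = [], 0
    if lines:
        yield "\n".join(lines) + "\n"
```

Each yielded string could then be sent to the cluster's `_bulk` endpoint with the HTTP client of your choice. Keeping the batching separate from the network call makes it easy to tune the batch size while reindexing.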
When it came to moving the Postgres database, though, we found that Amazon RDS doesn’t allow common migration approaches such as master-slave replication or replication with Londiste. Using “pg_dump/pg_restore” was not an option because it would mean downtime for the application, and today, when information is shared at warp speed, even a small delay in delivering it can be critical. Looking for other ways to deal with this problem, we settled on the Bucardo replication system. Bucardo is hardly the fastest of its kind, but it doesn’t require any changes to the original Postgres cluster. The whole data migration took us quite a while, but we didn’t have to stop the application, which was definitely a win.
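With a replication-based migration like this, it can help to run a sanity check before switching the application over to the new cluster. The sketch below is not part of Bucardo or of the original project. It assumes you have already collected per-table row counts from both databases (for example with `SELECT count(*)`), and the table names are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple


def tables_out_of_sync(
    source_counts: Dict[str, int],
    target_counts: Dict[str, int],
) -> List[Tuple[str, int, Optional[int]]]:
    """Return (table, source_rows, target_rows) for every mismatched table.

    A table missing on the target is reported with target_rows set to None.
    Hypothetical helper: the counts are expected to be gathered beforehand.
    """
    mismatches = []
    for table in sorted(source_counts):
        source_rows = source_counts[table]
        target_rows = target_counts.get(table)
        if target_rows != source_rows:
            mismatches.append((table, source_rows, target_rows))
    return mismatches
```

An empty result suggests that the replica has caught up. Matching row counts are only a rough signal, though, so a real cutover would likely also compare checksums or sample rows.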
Deploying a Heroku-like PaaS let us keep the same git-push workflow available in the original Heroku. After all, why reinvent the wheel? At that point, both the apps and the services were running in the same place: three dedicated staging servers combined into a cluster. If either an app or a service went down, the other would become unavailable as well. To prevent that, we decided to separate the apps from the services by moving them to new servers.
The parsing part was the most elaborate of all because it required fairly deep research on the topic. Ultimately, we not only developed new applications but also implemented new system features and tweaks. For example, we recently implemented a feature that extracts the author’s name from a web page.
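As an illustration of what author extraction can look like in its simplest form, the sketch below reads the author from a page's `<meta>` tags using only the Python standard library. This is an assumption about one possible approach, not a description of Cronycle's parser. The `AuthorMetaParser` class and `extract_author` function are hypothetical names, and real pages often need extra heuristics such as bylines or structured data.

```python
from html.parser import HTMLParser
from typing import Optional


class AuthorMetaParser(HTMLParser):
    """Collect the first non-empty author value found in <meta> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.author: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.author is not None:
            return
        attributes = dict(attrs)
        # Attribute values may be None, hence the "or" fallbacks.
        name = (attributes.get("name") or attributes.get("property") or "").lower()
        content = (attributes.get("content") or "").strip()
        # "article:author" sometimes holds a profile URL rather than a name.
        if name in ("author", "article:author") and content:
            self.author = content


def extract_author(html: str) -> Optional[str]:
    """Return the author declared in the page's meta tags, or None."""
    parser = AuthorMetaParser()
    parser.feed(html)
    return parser.author
```

For a page containing `<meta name="author" content="Jane Doe">`, `extract_author` returns `"Jane Doe"`. Pages without such a tag yield `None`, so a caller can fall back to other strategies.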
Result
We built a large, complex, and independent content-aggregation platform with a microservice architecture that helps all kinds of users find and organize relevant content with little effort. The database of the application is powered by Elasticsearch and can store terabytes of data. At the same time, the platform’s new infrastructure is packed with a full set of services to provide a great in-app experience and shield users from mass media noise. Currently, Cronycle is used by 2308 companies across the globe.
Technologies used
Microservices/Applications:
- Rails monolithic applications;
- Ruby on Rails JSON API;
- Python Flask applications;
- Node.js applications;
- Heavy usage of asynchronous background jobs in different parts of the system;
- Communication between microservices via HTTP protocol.
Platform:
- Bare Metal servers running CentOS and CoreOS;
- KVM-based Virtual Machines running CentOS and CoreOS;
- Docker containerization;
- Container orchestration with CoreOS built-in fleetd;
- Container orchestration with Kubernetes;
- Deis PaaS;
- OpenStack Swift and OpenStack Keystone;
- Postgres;
- Redis;
- Elasticsearch;
- Memcached;
- Ansible provisioning;
- Monitoring and alerting with Datadog and PagerDuty;
- Centralized logging with Graylog 2.
You can keep up to date with what Cronycle is doing by following its social media accounts below:
Facebook — https://www.facebook.com/cronycle
Twitter — https://twitter.com/cronycle
LinkedIn — https://www.linkedin.com/company/cronycle/