Table of Contents
Have you ever heard of the broken windows theory? If not, the broken windows theory implies that visible signs of disorder and neglect in a community, such as broken windows or graffiti, can lead to an increase in crime and antisocial behavior. And what if I told you that you can apply this theory to coding?
In coding, the broken windows theory suggests that ignoring or neglecting small issues (e.g., code quality, naming conventions, or documentation) can lead to a deterioration of overall codebase quality and an increase in software bugs and maintenance challenges over time. However, refactoring is here to help you lessen the impact of this phenomenon. If you want to make your code better and your life easier, here’s a list of tips on continuous refactoring you should consider taking into account.
Understanding refactoring
In simple terms, continuous refactoring means regularly improving and tidying up your code as you work on it, just like keeping a room clean by picking up after yourself, so that it remains easy to understand and maintain, rather than letting issues pile up and become harder to fix later.
Note that refactoring only works if your code is thoroughly tested. The easiest way to find out if your code can be refactored is to break it and check how many tests have failed. For example, you can change a condition or delete a private method call. If a few tests, or, ideally, a couple of unit tests and a dozen integration tests have failed, you can easily refactor your code. That’s why you should keep in mind that tests are an important part of the project. No refactoring will work without them. So the basic improvement of any project is writing tests. If you want to do better, find out from testers, business analysts or other colleagues the expected behavior and write a test. Having mentioned that, we can get to the most interesting part – the tips themselves.
Literals to constants
Literals are fixed values that are explicitly written in your source code to represent specific data types, such as numbers, text, or boolean values, without any calculations or variables involved. They often engage with business logic, so any code where conditional operations are performed on literals must be rewritten. Note that we’re specifically talking about the literals that are used to define application rules (not configuration!) or separate parts of code.
Large code bases are usually full of literals. Take MVC, for example. Literals leak into View, causing bugs and preventing extensibility. But here’s a solution. Just give those literals a name and put them next to the code that depends on them. This simple yet effective tip can make your life so much easier.
Naming literals links them to entities in the code, making any subsequent changes, whether large refactoring or extension of application functionality, much simpler and more predictable. In special situations, following this rule allows you to extend the functionality of individual classes by literally adding one line of code.
For example, it is better to bring a very common condition like the one below to the form in which there will be a named variable in the right part of the comparison operation. This constant may not be in the project code yet, then you need to create it, but it’s better to look for it anyways – maybe someone has already declared it and accidentally missed a place.
if object.type == “MyObject”
In this case, the condition may look somewhat like the ones below, meaning that it needs refactoring.
if object.type == ALLOWED_OBJECT_TYPE
# or
if object.type == MyObject.name
Which means that the condition can be refactored even further, like this:
if object.is_a?(MyObject)
It is worth noting the use of literals as keys in the lists. They not only hide the purpose of these keys, but can also lead to unpleasant bugs, when the same key wanders between the branches of business logic and at some point turns into a symbol. This is less the case for Hashes as literals( { ‘keyname’ => SOME_VALUE } ), but more for operations.
my_object_hash[‘keyname’] == SOME_VALUE
may be changed to
my_object_hash[MY_VALUE_KEY] == SOME_VALUE
Overall,
- If you see a literal (a constant or a method of an existing class) in the condition, name it
- If you see a literal used as a key, name themIf you see “magical numbers”, name them
Name conditions
There’s nothing worse than debugging multiline conditions with every possible operation from Boolean algebra in them. But once you start naming those conditions, you’ll see how patterns begin to appear. But don’t be fooled by the optimality of fulfilling conditions such as the following:
If condintion_one_assertion || condition_two_assertion || condition_three_assertion
This example shows us that conditions 2 and 3 won’t be tested if condition 1 returns “true”. Saved CPU seconds will be more than compensated for the time spent on debugging, and loss of reputation from dissatisfied customers. After all, such constructions often look much more confusing:
if one_objesct.status == STATUS_ONE ||
DateTime.current.in?((1.day.ago)..(1.day.since)) ||
curret_user.role == ADMIN_ROLE
Feel free to name these conditions. If there is a real problem with performance (after all, the logic behind the calculation of one boolean value can be quite complicated) – hide the logic into the method.
Named conditions bring clarity to code and eliminate cognitive errors in the interpretation of these conditions:
cbject_in_good_state = one_objesct.status == STATUS_ONE
it_is_already_time = DateTime.current.in?((1.day.ago)..(1.day.since))
Current_user_has_permission = curret_user.role == ADMIN_ROLE
if object_in_good_state || it_is_already_time || current_user_has_permission
Name and group scopes
When you see a series of calls of ActiveRecord::QueryMethods you’re probably looking at code that needs to be refactored. Declare scope for most common QueryMethods. This will save time on refactoring and help you eliminate bugs when the same condition is changed in only one part of the logic.
It’s not necessary to name big scopes at first. On the contrary, try to keep the conditions inside the scope as atomic as possible. However, if you notice constructions of 3 or more scopes start to form, consider naming them too.
Visibility of methods/interfaces
In general, this applies to projects developed following the object-oriented paradigm where violating the Open-Closed principle leads to problems on the architectural level.
If you see a class that has more than one method (not including the constructor method) and all the methods are declared as public, this very well might be a poorly written class and you can benefit from hiding some of its methods behind the “private” keyword. This will protect the project from potential misuse of the class in the future, as well as its refactoring would do. It’s much easier to refactor private methods than methods that you don’t know where and how can be used. That’s why if the method is not used outside the class body (e.g. in other classes or somewhere else in the code) this method should be declared as private.
Variable and class namings
If you see variables, methods, classes or modules in the code whose names confuse you, it probably makes sense to make some adjustments – give them more explicit names or a name that better describes its usage. If the name of your class is a verb then it’s safe to say that you have found a procedure. Rewrite the class and write the procedure.
Generally try to follow a simple principle that if your code describes some action, you’re most likely working with a procedural or functional approach and it should be named with a verb. For example, AuthenticateUserWithPassword, RetrieveDataFromEndpoint, CloneAttributes. Also such objects often don’t require additional methods as they are only supposed to carry out one operation, which the name suggests.
If a class describes some abstract object – an API client, a data validator or a message sender – then you’re most likely working with an example of OOP and its name should be a noun (not to be confused with verbal nouns. For example, SpreeApiClient, RequestDataValidator, EventsNotifier.
Examples such as UserAuthenticatorWithPassword or DataFromEnpointRetriever are terrible and should be converted into procedures with an explicit verb in the name.
Interface shrinking
Following the interface segregation principle our object should have reasonably simple and specific interfaces. In this case, we’re not always talking about redoing the entire dependency chain or refactoring services and controllers. It is often enough to fix places in your code that look like this:
def call(**params)
def my_method(arg1, arg2 = 1, arg3 = 2, arg4 = 3, kwarg1:, kwarg2: 4)
Generally you should always choose methods with named arguments rather than methods with an unnamed and infinite set of input parameters. Such methods are often difficult to call and even more often they are called with too much data. The earlier a method accepting an infinite set of data is passed to a specifically defined interface, the less problem you’ll have to deal with. Because it’s much easier to extend an interface in the future, adding 1 or 2 arguments, than to remove the support of 50 arguments that are used in various combinations in different parts of the application.
There are edge cases where you can pass a structure to a method, the structure being a typed (as much as possible) object with an explicitly written interface.
The same should be done with “swiss-knife-services”. The single responsibility principle suggest that we create objects that are able to do as few things as possible. If you see a class that does 5 different things, break it up into 5 smaller classes. It won’t do any harm but the number of problems with understanding these classes will be much smaller.
Get rid of state where it’s not necessary
One of the most frequently used approaches that needs to be redone is using states where they are not really necessary. In OOP, states are usually used to manage an object’s behavior. This exact use case is described in the Dependency Inversion principle. However in web applications that we develop, the DI principle is not always applicable. Especially when the state of the object stores data on which it performs some kind of operations. Almost always, this logic can be redone in a procedural style and you get rid of a state in class objects.
Let’s take this class as an example:
class MyLogicDoer
def initialize(my_object, options)
@my_object = my_object
@options = options
end
def do_logic(argument)
do_piece_of_logic
...
do_another_piece_of_logic
end
private
def do_piece_of_logic
# calls @my_object
end
def do_another_piece_of_logic
# calls @options
end
end
Such a class can be turned into a procedure that will have no state, an easily testable logic, and may even use less memory:
class DoMyLogic
def self.call(my_object, options, argument)
do_piece_of_logic(my_object)
...
do_another_piece_of_logic(argument)
end
private
def do_piece_of_logic(my_object)
# receives my_object as an argument to call
end
def do_another_piece_of_logic(options)
# receives options as an argument to call
end
end
In this example, we simplified the code entity and now only logic is described in it, so there is no “complexity” that the class implies. Best case scenario – a function.
Such code is easier to use. You don’t need to build a class object and it’s easier to maintain – finding the module name in the code base is enough to understand where it is used. It can be combined with other modules and even passed as an argument somewhere on the road of functional programming with its higher order function.
Remove logic from the constructor
If you ever come across a code that describes a convenient abstraction with some of the logic described in the constructor of this abstraction, you can easily get rid of it, especially if this logic may crash or needs too many resources. It doesn’t matter if it’s a part of preparation logic or it’s the logic of configuration and access verification. There is no place for such things in the constructor. Barely any developer thinks that MyClass.new can issue an error or cause memory leak. Thus, any logic in the constructor will sooner or later lead to unexpected results. Here’s what you can do.
- Move to a separate method
From this:
class MyClass
def def initialize(attribute)
@attribute = attribute
process_attributes(attributes)
end
def my_logic
...
end
End
To this:
class MyClass
def def initialize(attribute)
@attribute = attribute
end
def my_logic
process_attributes(@attributes)
...
end
End
- Move state to memoization
class MyClass
def def initialize(attribute)
@attribute = attribute
end
def my_logic
get_processed_attributes
...
end
def other_logic
get_processed_attributes
end
private
def get_processed_attributes
if defined?(@processed_attributes)
return @processed_attributes
else
...
end
end
end
Optimize data structures
The choice of optimal data structures is equally important, but more complex. Usually, in Ruby or JavaScript, developers don’t think much about how much it costs them to use certain embedded objects, because memory management lies with the interpreter, and the implementation of a long list of interfaces for certain “specialized” structures is already present in standard classes. Therefore, they take Array for collections, and Hash for lists or structures.
This advice can be divided into two parts. The first one precisely concerns the misuse of embedded data structures, when several improvements can be made to the code at once with small changes. The second one is more about changing the data flow or reformatting them to achieve certain results.
- Use Set, SortedSet
In Ruby, in addition to the usual list (Array), there are several other implementations for data collections. Using them, you can achieve significant performance improvements and improve code readability.
Any place where the #uniq method is called on an Array class object, can be easily replaced by using Set, which will initially store only unique elements. Another feature of Set is that, compared to Array, it checks the occurrence of an element in the collection faster, therefore, if somewhere in the code within the same logic branch the Array class object is often called the #include? method (or its analogues) it is worth considering using Set.
SortedSet has all the same features, but it also returns objects in sorted order. Therefore, if the #uniq and #sort methods are called on an Array class object, SortedSet is definitely needed here.
- Reassign data to make it easy use
It is difficult to give any universal advice here, because the problems of data flow management, their processing and use cannot be considered inseparably from the field of application. Therefore, we won’t touch on the areas of Big Data, or the problems of parallel access.
We will talk about common examples when unwillingness or fear to change the data structure before performing any operations on them starts to bring problems, whether it’s the speed of code writing, semantic errors or irrational use of resources. We’ll focus only on a couple of small tips that will help you figure out how to organize the data flow yourself.
- Hash#key? and Hash#[] are quicker than Array#include? or Array#find
- Array[Object], Array[Object, Object] easily turn into Hash with the help of the Hash::[] method
- Methods that change the state of an object (methods with an exclamation mark) are not so terrible if the data doesn’t leave the scope of the method, because they allow you to save some memory and time.
- Hash#new takes as arguments a block that will execute every time Hash#[] does not find the requested key
- Enumerable#inject with an argument and a block (…inject({}) { |hash, value| }) makes it very convenient to build a new Hash
OpenStruct и HashieMash as arguments for methods
Perhaps some of the most difficult refactorings were associated with these classes. We can understand developers who add them to their logic: it’s fast, convenient, and doesn’t require a description of structures and interfaces. You just take an object, put data in it, and send it deep into the dependency chain, hoping that everything will work.
But as a development team that had to maintain code and business logic that depended on objects of these classes, we do not recall that any of these advantages made our lives easier.
Therefore, in code we prefer to replace them with Struct or some other alternative “wrapper” that allows you to restrict the interface.
#Map VS. #Each
If you find a place in the code where #map is used (or another iterator that returns a new data structure), and the result of its execution is not used anywhere, it may be worth replacing it with an iterator #each.
There are also other cases when the data structure is created using #each. For example:
hash = {}
some_values.each do |val|
hash[...] = val
End
In such cases, it is advised to explicitly replace such a construction with something else, for example #reduce:
hash = some_values.reduce({}} { … }
Get rid of magic
In our practice we always follow a simple rule – explicit is better than implicit. So when we come across a logic that a computer can’t interpret on the first try, we either try to rewrite the code or ignore it. But when ignoring is not an option, here are some examples of such a code that can be improved by removing all the “magic” around it.
- #constantize and #safe_constantize methods
In our practice, using these methods (as well as using Kernel#eval or ObjectSpace) to build logic has never done any good. There are several more convenient techniques that do not require deep refactoring, allowing you to achieve similar behavior without harm to the code.
# bad code
“SomeModule::#{my_string.upcase}”.constantize.new
# good code
MY_MODULES_MAP = {
‘first_class’ => SomeModule::FirstClass,
‘third_class’ => SomeModule::ThirdClass,
‘i_havent_thought_that_this_would_be_the_case’ => OtherModule::OtherClass
}
MY_MODULES_MAP.fetch(my_string).new
# or you can use a good old case/when statement
case my_string
when ‘first_class’:
SomeModule::FirstClass.new
….
# and so on
end
The beauty of such constructions compared to the “elegant” magic of #constantize is that for a person who looks into this code, it immediately becomes clear which “my_string” are valid, and which dependencies the file in which this code is written has.
- #send and #public_send
No matter how much simpler and “more beautiful” they look, these two have no place in your code. These methods hide the real “scope” of the area of responsibility, and do not allow you to write normal logic tests. And sometimes they cause very strange and hard-to-diagnose bugs.
Get rid of them as quickly as possible from legacy code, and do not use them in the new one. Again, the good old case/when can help you out.
What works together lives together
Rails naming conventions have only one goal – a quick unambiguous interpretation of the project. However, this goal does not correlate at all with the real project problem – complexity.
Your project contains 300k lines, most of which are stored in the services folder? Most likely, this is a very complex project. The default file structure of a Rails project is not always suitable for complex projects. Therefore, you can keep a more transparent structure of files and folders describing the real purpose of the code in them, as well as grouping this code so that a person opening a project can immediately assume what this project is all about.
The file structure in the project should be classified according to its functional connectivity. For example, if you have a lot of files that are responsible for exporting PDFs, shove them into the app/modules/export/pdf folder. In the same folder, you can put the activerecord model, if there is one. For example, do that to store an export record with a link to the exported document. Need Sidekiq for export? Then feel free to create a worker in the same folder app/modules/export/pdf/some_action_worker.rb
Rail controllers, in turn, are pretty much tied to routing, and if you don’t want to bother with a zeitwerk loader, you’ll have to leave them as they are. You don’t write business logic in controllers, do you? If you can’t sleep knowing that the controllers live separately, you can move them to the module folder, just don’t forget to explicitly upload your controllers to application.rb and put together a definition.
However, transformations of this kind require much more effort and clear communication with other team members. Therefore, coordinate such changes in advance, or find out if it will not contradict the standards and documentation of the project.
Missed our previous article? Follow our blog and don’t miss our next post!