After Greg Sterndale’s presentation on a boston-rb hackfest earlier this month I noticed that not everyone knew the operators available for equality and comparisons in Ruby. Why not take the dust away from the blog and write about it, then?
Ruby has many equality operators, some of them we use and see everywhere in our applications (like the usual double equal - “==”) and some are also used everywhere but we don’t really get to see them (like the triple equal or case equal operator – “===”). So let’s dig into how Ruby implements comparisons between our objects.
You can see the full source code for this tutorial on Github.
Maybe this is something from my Java past, but I find it to be really useful to first define what being “equal” really means. Objects have identities for the Ruby interpreter, you can easy check this by calling the object_id method:
As you keep on creating new objects, Ruby itself will give you an object_id for each of them and you could possibly use this to identify all your objects. But there’s a little gotcha:
The two objects represent exactly the same value, but each of them has it’s own object_id, as the Ruby interpreter has no idea these two objects are the same. So, while object_id could be a shortcut to define the identity of our objects, each object has it’s own way to define what being equal to someone else really means.
Two Strings are equal when they represent exactly the same sequence of characters, two people would be the same if they had the same social security number (or the same CPF if you were in Brazil). To be able to implement this kind of identity the language must offer you with hooks for this and in Ruby these hooks are the equality methods.
The double equals method should implement the general identity algorithm to an object, which usually means that you should compare the object attributes and not if they are the same object in memory. And given Ruby is dynamically typed language, you should try not to depend on types, but on methods, instead of checking if the object is from an specific class, check if it responds to an specific method:
Instead of checking on the parameter type I check if the object has the method I expect to use, I it has then I do the comparison, if it doesn’t a “false” is sent up, which means that the objects are not equivalent.
If you’re already implementing “==”, you should also implement “eql?” and “hash” methods, even if you have never seen these methods being called anywhere. The reason is these are the methods the Hash object is going to use to compare your object if you’re using in as a Hash key. The thing is, Hashes have to be fast to figure out if a key is already in there and to be able to do this they just avoid comparing every single object, they just go by “clustering” objects in groups by using the value returned by your object’s “hash” method and then, once in a cluster, they compare the objects themselves using “eql?”.
Then searching for a key in a Hash, they first call “hash” in the key to figure out in which group it would be, then they compare the key with all the other keys in the group using the “eql?” method. In a worst-case scenario, we would compare with 3 objects (and the hash has nine of them), which is quite nice. If you were using an array, your worst case would be 9 comparisons.
For a quick example of how this could affect your code, let’s check this class:
The “==” method has been implemented correctly, but we haven’t implemented “hash” and “eql?”, so the two objects will end up in different clusters they we won’t be able to figure out they’re the same object inside the hash.
Even with @first
and @second
representing exactly the same text (and being equal) they still generate two keys in the Hash instead of one because we did not implement the “hash” and “eql?” methods. The general rule is that if you override “==” you should also override “hash” and “eql?”. When implementing “hash”, a basic rule to follow is if two objects are “==” they must generate exactly the same hash value (so they can be found in a Hash object), but two objects can have the same hash value but still be different (they belong to the same group but are not the same object).
Here’s a subclass of our StringWithoutHash, StringWithHash, that correctly implements both methods (and the “eql?” method itself is just using the “==”, we don’t even have to bother writting code for it):
And here are some specs showing how it works correctly now:
So, now that we implemented “eql?” and “hash” correctly even trying to add the object 50 times we stil have a single one, because our code can now be sure that the object is there (or not) as the methods are available.
Unless you know exactly what you’re doing (and you know how Ruby implement it’s hashes) you should not implement your own hashing function, just use a hashing function from a basic object like numbers and strings and you’re done. Here’s the example:
Here, instead of adding my own “hash” function I just reuse the hash method on Float, which already does it the right way. And unless you have some specific needs you should be doing the same.
Triple equals is an interesting operator, it’s everywhere in Ruby code but most people have never seen it in real code out there. But, how come it’s everywhere and no one has ever seen it? It’s hidden inside a common control structure, the “case/when”. Whenever you’re using a “case/when” you’re using, in fact, the “===” operator and this is what makes the case statement on Ruby much more powerful than it’s counterpart in languages like C or Java.
Let’s look at a statement:
This one will print ‘wild years’ and it’s using the ‘===’ operator. Can you see how it’s working? Here’s this same case/when done with if’s:
In the end, the case/when statement is just a glorified if using the ‘===’ operator to simplify your job (and also make you type less). In the language itself, the triple equals is used mostly to as a “grouping” operator, by getting a value and figuring out at which group it belongs to.
In our example it’s a group of ages represented by Range objects, but you’ll see this being used to figure out if an object is from an specific class, if a string matches a regular expression and the like. And all of this is possible because the method (‘===’) is not called at the object in the “case” definition but on each “when”.
Imagine you have Rectangles and you want to figure out if a specific Point is inside the Rectangle, this is a perfect fit for the triple equals. Let’s look at a sample implementation:
And the Rectangle class:
And now some usage:
The ‘===’ method will be called on Rectangle giving it a Point object and it’s going to figure out if that point is inside the Rectangle or not.
If you looked closely at the Point object you will notice that it includes the Comparable module and implements a method defined as <=>
, the flying saucer operator. The flying saucer operator is to be used as means to sort your objects in a collection, but the Comparable module brings some interesting functionality for classes that implement it.
The idea behind the <=>
operator is that when you call it on a object, this object must define it’s position compared to the other object given as a parameter. If the receiver of the call is greater than the argument, it should return 1, if they re the same, it should return 0, if the receiver is less than the argument it should return -1 and if they’re not compatible it should return nil.
Once you implement the flying saucer, you can just include Comparable in your class and the following methods are now implemented for you:
>
>=
<
<=
==
between?
This is why we could use the between?
method in Point when implementing the “===” operator on Rectangle. By implementing <=>
and including Comparable we get a lot of functionality for our objects for free and we also have a great example of how you should plan to build your own modules on your projects.
Going back to our first example, the Meter class, we could add new classes for Inch and Foot and have them all share the same equality implementation. First, we define that all our classes will have a to_meters method that will return their value in meters. Then we create our module:
We implemented the <=>
operator and (also the “hash” method, don’t forget it!) for this module and included Comparable, which will make all classes including it to be comparable too. Let’s look at how our new Meter, Inch and Foot will look like now:
And Inch:
And finally Foot:
All classes share the same comparison methods so we can use them all interchangeably in our code, we can even safely compare them with each other and they’ll yield the correct results:
You can mix and mingle different measures and they’ll all play and compare nicely to each other, they just have to implement the “to_meters” method and include the MeterComparator class, simple and right to the point implementation.
Also, once you include the Comparable module your objects become “sortable” in an array, you can use the “sort” and “sort!” methods in Array. The order is ascending as defined by your “<=>” method implementation.
Technically, “eql?” is should behave just like “==” and it’s also the method selected by the Hash class to figure out of your object is already in a “hash cluster” (as we have discussed above). You compare two objects to see if they represent the same values. Usually you can just override “eql?” and delegate it’s call to “==”, as we did in our examples. This is not already done for you, the default “eql?” implementation at the Object class uses comparison between “object_id” values and this is usually NOT what you want, so, make sure that if you implement “==” you also override “eql?” and implement “hash”.
There’s one exception, though, Numeric objects will convert different types when compared using “==” but will not do this when using “eql?”, so:
But:
And “equal?” is a little bit exoteric as it will compare if two objects are the same object in memory. You should never ever override this method. In fact, you’re better of ignoring the fact that “equal?” exist at all for your own safety. And don’t say you have not been warned.
While there’s a lot to be said about comparing objects in Ruby, the final implementation is quite simple and modules like Comparable make it even simpler as long as you know they exist. Now there’s no reason to correctly implement comparison for your Ruby objects and never forget the “hash” method again! ;)