From time to time I have to query our MongoDB collections for documents created within specific time ranges, but not all of those collections have an index on the created_at field. As a result, some of those queries took too long to run, even on our secondaries or the hidden replica.
This changed when I learned that the ObjectId values generated by MongoDB already include a timestamp: the first four bytes hold the creation time in seconds! I could just use the collection’s _id field for date range queries (as long as _id is an auto-generated ObjectId), and since that field is indexed by default I didn’t even have to care about indexing it myself.
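To make the embedded timestamp concrete, here is a minimal pure-Ruby sketch of how a boundary ObjectId can be derived from a Time and read back. The helper names are hypothetical and not part of any driver. It assumes the standard 12-byte ObjectId layout, and that zero-filling the remaining 8 bytes is an acceptable lower bound for range queries.

```ruby
# Hypothetical helpers for illustration; they are not part of Mongoid,
# Moped, or the bson gem.

# Builds the hex form of the smallest ObjectId for a given time:
# a 4-byte big-endian Unix timestamp followed by 8 zero bytes.
def object_id_hex_from_time(time)
  [time.to_i].pack("Nx8").unpack1("H*")
end

# Reads the creation time back out of an ObjectId's hex string.
def time_from_object_id_hex(hex)
  Time.at(hex[0, 8].to_i(16)).utc
end

boundary = object_id_hex_from_time(Time.utc(2014, 1, 1))
puts boundary # => 52c35a800000000000000000
puts time_from_object_id_hex(boundary)
```

Because the timestamp sits in the leading bytes, ObjectIds sort by creation time (to the second), which is what makes range filters on _id behave like range filters on a creation date.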
If you’re using Mongoid, a simple way of creating these values is the Moped::BSON::ObjectId.from_time method. Let’s look at an example that finds all items created today:
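A minimal sketch of such a query, assuming a hypothetical Item model. The created_since_selector helper is made up for illustration and takes the ObjectId class as an argument, so the selector-building part runs without a database:

```ruby
# Hypothetical helper: builds a raw selector matching documents whose _id
# was generated at or after `time`. `object_id_class` is whichever ObjectId
# class your setup exposes, e.g. Moped::BSON::ObjectId or BSON::ObjectId.
def created_since_selector(time, object_id_class)
  { "_id" => { "$gte" => object_id_class.from_time(time) } }
end

# Usage with Mongoid and ActiveSupport loaded (Item is a hypothetical model):
#
#   start_of_day = Time.now.at_beginning_of_day
#   Item.where(created_since_selector(start_of_day, Moped::BSON::ObjectId))
```

The selector is a plain MongoDB `$gte` filter on _id, so the same hash should also work when passed directly to a driver-level find.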
This finds all items created at or after the time you provided (at_beginning_of_day is a method that ActiveSupport adds to Time objects), and it should be really quick thanks to the index on _id, which is always there.
If you have to go through a lot of data and don’t want to keep cursors alive on the server (long-running cursors are best avoided on MongoDB), you can use the same method to build a manual cursor that loads the items for each day over a 30-day period and processes them. Here’s what it would look like:
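A minimal sketch of that day-by-day loop, again with hypothetical names (each_day_id_range, Item, process). It assumes fixed 24-hour days, which is only exact in UTC or in zones without DST changes:

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical helper: yields a [from_id, to_id] pair for each of `days`
# consecutive days starting at `first_day`. Each range is meant to be
# queried as from_id <= _id < to_id.
# Assumes fixed 24-hour days; with ActiveSupport, `day_start + 1.day`
# would respect DST changes in local time zones.
def each_day_id_range(first_day, days, object_id_class)
  return enum_for(:each_day_id_range, first_day, days, object_id_class) unless block_given?

  days.times do |offset|
    day_start = first_day + offset * SECONDS_PER_DAY
    day_end   = day_start + SECONDS_PER_DAY
    yield object_id_class.from_time(day_start), object_id_class.from_time(day_end)
  end
end

# Usage with Mongoid and ActiveSupport loaded (Item and process are hypothetical):
#
#   each_day_id_range(30.days.ago.at_beginning_of_day, 30, Moped::BSON::ObjectId) do |from_id, to_id|
#     Item.where("_id" => { "$gte" => from_id, "$lt" => to_id }).each { |item| process(item) }
#   end
```

Each iteration issues its own short query, so no single cursor has to stay open for the whole 30-day scan.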
Here we find all items created from the start of a given day up to the beginning of the next one. You can build any kind of date range query against ObjectId fields with the from_time method: if you can create a Time object, you can produce an ObjectId value to use as a filter.
This also works if you’re using the main MongoDB driver: just use the BSON::ObjectId.from_time method, which works exactly the same way. Give it a Time object and it will return the ObjectId value that represents that timestamp.
And if you’d like to get an ObjectId without writing any Ruby code at all, use Steve Ridout’s ObjectId generator and you should be good to go.
Now stop indexing your created_at field :)