Last Updated: February 25, 2016
· comerford

MongoDB: How can I load recent data into memory using the default ObjectID field/index?

The goal here is to load a subset of data (in this case, recent data) into memory using the default index on the _id field and the default content of that field (an ObjectID).

Per the ObjectID spec, each ObjectID begins with a 4-byte value representing the seconds since the Unix epoch. Therefore, we should be able to leverage that time value to select an appropriate range of values.
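To make the layout concrete, here is a minimal sketch that decodes the timestamp from the 24-character hex form of an ObjectID. The helper name objectIdTimestamp is hypothetical and exists only for illustration; in the mongo shell itself, the built-in getTimestamp() method on an ObjectId gives you the same information.

```javascript
// Decode the creation time embedded in an ObjectID's 24-character hex string.
// objectIdTimestamp is a hypothetical helper name, not part of the MongoDB shell.
function objectIdTimestamp(hexString) {
  // The first 8 hex characters are the 4-byte seconds-since-epoch value.
  var seconds = parseInt(hexString.substring(0, 8), 16);
  // JavaScript dates work in milliseconds, so scale up.
  return new Date(seconds * 1000);
}

console.log(objectIdTimestamp("535a3c370000000000000000").toISOString());
// → 2014-04-25T10:43:03.000Z
```

Because the timestamp occupies the leading bytes, ObjectIDs sort roughly by creation time, which is what makes a range query on _id a reasonable stand-in for a range query on time.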

Once we have our criteria, it is simply a matter of making sure all that data is loaded into memory. If you want to load all data in an index, or in a collection, into memory you can use the touch command, but for a subset of that data, a different approach is needed.

To ensure that all data is loaded, we will use the hint() method to specify the index used (removes any ambiguity) and the explain() method (to ensure the query is run to completion).

The use of the hint method is straightforward, but the use of explain is less obvious. The reason for this is to ensure that the query is run in its entirety. If you left it out, then you would get the first batch of results back (and a cursor) and then have to iterate on that cursor until the results were exhausted.

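Because explain needs to see the entire operation in order to provide timing information, the whole query is executed, and so all the data is loaded into memory. Note that from MongoDB 3.0 onward, explain() defaults to the "queryPlanner" mode, which does not execute the query, so on those versions pass "executionStats" (that is, explain("executionStats")) to get the same effect.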

Finally, we can manipulate the query itself to load the index only or the index and the data (similar to the touch command) should we wish to be selective. All we need now is our criteria, and that is relatively easy to construct:

First, this little snippet gets you an epoch time in seconds (getTime returns in millis) and stores it in the decTime variable:

decTime = Math.round(((new Date().getTime())/1000));

This is very easy to turn into a time 3 days ago, for example:

last3Days = Math.round(((new Date().getTime() - (3 * 24 * 60 * 60 * 1000))/1000));

However, an ObjectID is written as a hex string (run new ObjectId() in the shell to see an example), so we need to convert. Again, this is relatively easy:

hexTime = decTime.toString(16);

Now, all we need to do is pad the 8 hex characters of the timestamp with 16 zeros (the remaining 8 bytes of the 12-byte ObjectID) and pass the constructed string in to create our ObjectID:

testId = ObjectId(hexTime+"0000000000000000");

Here's a sample run in the shell:

> decTime = Math.round(((new Date().getTime())/1000));
1398422583
> last3Days = Math.round(((new Date().getTime() - (3 * 24 * 60 * 60 * 1000))/1000));
1398163749
> hexTime = decTime.toString(16);
535a3c37
> hex3Days = last3Days.toString(16);
53564925
> testId = ObjectId(hexTime+"0000000000000000");
ObjectId("535a3c370000000000000000")
> historicId = ObjectId(hex3Days+"0000000000000000");
ObjectId("535649250000000000000000")
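The individual steps above can be folded into one helper. The following is a minimal sketch rather than a definitive implementation: the name boundaryHex and the optional nowMillis parameter are hypothetical, added so the result can be checked against a fixed clock. It also left-pads the hex time to 8 characters, which is an extra precaution that does not matter for present-day timestamps.

```javascript
// Build the 24-character hex string for the smallest possible ObjectID
// whose timestamp is the given number of days before "now".
// boundaryHex and nowMillis are hypothetical names used only for illustration.
function boundaryHex(daysAgo, nowMillis) {
  var now = (nowMillis === undefined) ? new Date().getTime() : nowMillis;
  // Subtract in milliseconds first, then convert to whole seconds.
  var seconds = Math.floor((now - daysAgo * 24 * 60 * 60 * 1000) / 1000);
  // Left-pad to 8 hex characters so the timestamp always fills 4 bytes.
  var hexTime = ("00000000" + seconds.toString(16)).slice(-8);
  // Zero out the remaining 8 bytes (16 hex characters).
  return hexTime + "0000000000000000";
}

console.log(boundaryHex(3));
```

In the shell, the result can be passed straight to the constructor, for example historicId = ObjectId(boundaryHex(3)).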

Finally, we need to put this all together and use the ObjectId we have constructed as our query criteria. In a general form, and assuming _id is the indexed field, that would look like this:

db.collName.find({"_id" : {"$gt" : historicId}}).hint({"_id" : 1}).explain();

Putting this all into a function with some parameters allows this to be easily reused, and you can find a very basic example in this gist: prejheat.js.
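For readers who just want the shape of such a function, here is a minimal sketch. It is not the contents of the gist: the name preheatRecent and all of its parameters are hypothetical. It assumes that projecting only _id lets the _id index cover the query (so only the index is read), while an empty projection fetches the documents as well. It passes "executionStats" to explain so that the query is actually run on newer server versions.

```javascript
// A sketch of a reusable preheat helper for the mongo shell.
// preheatRecent and its parameters are hypothetical names, not the gist's code.
function preheatRecent(collection, daysAgo, indexOnly, makeId) {
  // makeId defaults to the shell's ObjectId; it is a parameter only so the
  // function can be exercised outside the shell with a stand-in.
  var toId = makeId || ObjectId;

  // Work out the boundary timestamp in whole seconds, then build the hex ID.
  var millisAgo = daysAgo * 24 * 60 * 60 * 1000;
  var seconds = Math.floor((new Date().getTime() - millisAgo) / 1000);
  var hexTime = ("00000000" + seconds.toString(16)).slice(-8);
  var boundary = toId(hexTime + "0000000000000000");

  // Assumption: returning only _id keeps the query inside the _id index,
  // whereas an empty projection returns whole documents and touches the data too.
  var projection = indexOnly ? { _id: 1 } : {};

  return collection
    .find({ _id: { $gt: boundary } }, projection)
    .hint({ _id: 1 })
    .explain("executionStats");
}
```

In the shell, a call such as preheatRecent(db.collName, 3, false) would then load the last three days of index entries and documents, while passing true as the third argument would restrict it to the index.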

Note: I originally wrote this up as a Q&A over on SO, so if you find it useful, feel free to send some rep my way over there too.