Last Updated: February 25, 2016
·
514
· jtai

ActiveRecord: Use efficient queries for counting

Counting is deceptively simple. Even a five year old can count — how hard could it be? The difficulty lies in counting efficiently. In a new application, you can often get away with counting inefficiently because your data set is small and database load is minimal. In a more mature application, there is more data, the database is busier, and there are outliers. Perhaps users on your blogging platform have five posts on average, but a few of your most dedicated users may have thousands of posts. This makes performance problems difficult to spot in development or even production. Ironically, your most dedicated users are the ones affected the most!

Here are a few patterns for writing more efficient counting code. The examples below are for a Rails 3 app using ActiveRecord on MySQL, but the same SQL principles should apply to other languages and frameworks.

Use count or size instead of length

If you only need to know how many of something there are, use count or size, not length. For example, if you want to know how many posts a user has, use user.posts.count. This issues a relatively efficient COUNT(*) query:

SELECT COUNT(*) FROM `posts` WHERE (`posts`.user_id = 8)

Calling user.posts.length gives you the same result, but under the hood ActiveRecord issues a SELECT * query and creates an object for each post, then counts the objects. Not only is this more expensive for the database, it’s more expensive in ruby as well.

SELECT `posts`.* FROM `posts` WHERE (`posts`.user_id = 8)

Use exists? instead of empty? or comparing count to 0

Sometimes you may not even need to know the exact count — you may only care whether there are some or none of something. In these situations, use exists? instead of empty? or count > 0. To continue the previous example, if you want to know whether or not a user has any posts, use user.posts.exists?. This issues the following query:

SELECT 1 FROM `posts` WHERE (`posts`.user_id = 8) LIMIT 1

Calling user.posts.empty? essentially does the same thing as user.posts.count > 0, which issues a COUNT(*)query. This has similar performance when the collection is small, i.e., the user has only a few posts. However, if the user has many posts, the database will be forced to count them all. With the SELECT 1 ... LIMIT 1 query, the database can answer the query as soon as the first post is found.

Use offset(x).exists? instead of comparing count to x

What if you need to know if there are a few of something? For example, you want to know if a user has 3 or more posts; if so, they’re no longer considered a new user. You can still use and benefit from exists?! Use user.posts.offset(3).exists?, which issues the following query:

SELECT 1 FROM `posts` WHERE (`posts`.user_id = 8) LIMIT 1 OFFSET 3

As before, the database can answer the query as soon as it finds 3 posts. Calling user.posts.count > 3 is functionally equivalent, but if the user has many posts, the database will count them all.

This tip was reposted from my blog, jontai.me