Forget counter cache for many-to-many associations in Rails
I like counter cache in Active Record.
Counting associations represents a performance improvement by avoiding eager loading and N+1 queries for listings -- e.g. the index page for Posts that shows how many Comments each Post has.
However I've had a tough time trying to use counter cache in many-to-many associations, both with has_many through
and has_and_belongs_to_many
. After trying for a while, I've come to the conclusion that it is best not to use counter cache for these types of associations. It is terrible for performance because it generates several SQL queries that could be cut down to just 4 or less queries.
I'll explain with examples.
The scenario
Let's say we have posts with tags through taggings. Posts keep the count of their tags:
class Tagging
# FIELDS: post_id, tag_id
belongs_to :tag
belongs_to :post, counter_cache: :tags_count # updates tags_count in Post
end
class Tag
# FIELDS: title
has_many :taggings
has_many :posts, through: :taggings, dependent: :destroy
end
class Post
# FIELDS: content, tags_count
has_many :taggings
has_many :tags, through: :taggings, dependent: :destroy
end
Note that dependent: :destroy
will destroy taggings, not tags or posts. Now the tags_count
field will be automatically updated any time a post is created, updated or deleted.
Bad news: multiple SQL updates
Counter cache works by performing an SQL update each time a new association is created or deleted.
When a new post is created with a few tags (actually 99 tags for the sake of example), Active Record will perform multiple updates:
INSERT INTO posts (content) VALUES ("Lorem ipsum")
INSERT INTO taggings (post_id, tag_id) VALUES (1, 1);
UPDATE posts SET tags_count = tags_count + 1 WHERE posts.id = 1;
INSERT INTO taggings (post_id, tag_id) VALUES (1, 2);
UPDATE posts SET tags_count = tags_count + 1 WHERE posts.id = 1;
...
INSERT INTO taggings (post_id, tag_id) VALUES (1, 99);
UPDATE posts SET tags_count = tags_count + 1 WHERE posts.id = 1;
-- ======================
-- TOTAL QUERIES: 2×N + 1
-- ======================
When destroying a single tag that is associated with many (99) posts:
SELECT * FROM posts INNER JOIN posts_tags ON posts.id = posts_tags.post_id WHERE posts_tags.tag_id = 1
DELETE FROM posts_tags WHERE posts_tags.tag_id = 1 AND posts_tags.post_id = 1
UPDATE posts SET tags_count = tags_count - 1 WHERE posts.id = 1;
DELETE FROM posts_tags WHERE posts_tags.tag_id = 1 AND posts_tags.post_id = 2
UPDATE posts SET tags_count = tags_count - 1 WHERE posts.id = 2;
...
DELETE FROM posts_tags WHERE posts_tags.tag_id = 1 AND posts_tags.post_id = 99
UPDATE posts SET tags_count = tags_count - 1 WHERE posts.id = 99;
DELETE FROM tags WHERE tags.id = 1
-- ======================
-- TOTAL QUERIES: 2×N + 2
-- ======================
This is clearly detrimental for objects with dozens or hundreds of associations.
Replace counter cache with callbacks
I've decided to quit the automatic counter cache for many-to-many associations. Let's use callbacks instead:
class Post
# FIELDS: content, total_tags
has_and_belongs_to_many :tags
before_save :update_total_tags
def update_total_tags
self.total_tags = tag_ids.count
end
end
class Tag
# FIELDS: title
has_and_belongs_to_many :posts
before_destroy :update_posts
def update_posts
Post.where(id: post_ids).update_all('total_tags = total_tags - 1')
end
end
Note that I replaced the taggings
table with a posts_tags
join table. I also replaced tags_count
with total_tags
to avoid automatic counter caching due to naming conventions. This automatic behavior is thankfully deprecated and will be gone in Rails 5.
By using before_save
we make sure to set the total count before the INSERT statement (thus avoiding a late UPDATE). By counting tag_ids
we avoid an additional COUNT query on tags. Let's see how the same examples presented before perform now.
When the user creates a post with many tags:
INSERT INTO posts (content, total_tags) VALUES ("Lorem ipsum", 99)
INSERT INTO posts_tags (post_id, tag_id) VALUES (1, 1);
INSERT INTO posts_tags (post_id, tag_id) VALUES (1, 2);
...
INSERT INTO posts_tags (post_id, tag_id) VALUES (1, 99);
-- ======================
-- TOTAL QUERIES: N + 1
-- ======================
IMPROVE ALERT: this could be further reduced to just 2 queries by bulk-inserting into posts_tags
. If anyone out there knows how to do that in Active Record, please comment.
When destroying a single tag associated with many posts:
SELECT posts.id FROM posts INNER JOIN posts_tags ON posts.id = posts_tags.post_id WHERE posts_tags.tag_id = 1
UPDATE posts SET total_tags = total_tags - 1 WHERE posts.id IN (1, 2, ..., 99)
DELETE FROM posts_tags WHERE posts_tags.tag_id = 1
DELETE FROM tags WHERE tags.id = 1
-- ======================
-- TOTAL QUERIES: 4
-- ======================
Conclusion
Please DO use counter cache for one-to-many associations only. It's perfect for these situations!
But after all this it seems pretty obvious that the built-in counter cache feature is not suitable for many-to-many associations (maybe this was already obvious to many developers out there). I decided to write this post anyway to advice others who are considering going down that road. I hope I convinced you guys otherwise :)