ActiveRecord empty? vs count
During a discussion with a coworker about optimization of queries over rails, I need to prove my theory that count is faster than empty? because on the first method active record its making directly over sql and in the second is verifying the length of the given array by the result.
My Enviroment
ruby 1.9.3-p448
rails 3.2.6
So in order to prove it I've made the next benchmark
Code
Benchmark.bm do |bm|
bm.report("count") do
10.times{ User.where('id>0').count<0 }
end
bm.report("empty?") do
10.times{ User.where('id>0').empty? }
end
end
here the gist
Console
And my surprise was that both queries use COUNT(*)
count
[DEBUG] (0.6ms) SELECT COUNT(*) FROM "users" WHERE "users"."deleted_at" IS NULL AND (id > 0)
[DEBUG] (0.6ms) SELECT COUNT(*) FROM "users" WHERE "users"."deleted_at" IS NULL AND (id > 0)
[DEBUG] (0.4ms) SELECT COUNT(*) FROM "users" WHERE "users"."deleted_at" IS NULL AND (id > 0)
[DEBUG] (0.5ms) SELECT COUNT(*) FROM "users" WHERE "users"."deleted_at" IS NULL AND (id > 0)
[DEBUG] (0.5ms) SELECT COUNT(*) FROM "users" WHERE "users"."deleted_at" IS NULL AND (id > 0)
[DEBUG] (0.4ms) SELECT COUNT(*) FROM "users" WHERE "users"."deleted_at" IS NULL AND (id > 0)
[DEBUG] (0.4ms) SELECT COUNT(*) FROM "users" WHERE "users"."deleted_at" IS NULL AND (id > 0)
[DEBUG] (0.4ms) SELECT COUNT(*) FROM "users" WHERE "users"."deleted_at" IS NULL AND (id > 0)
[DEBUG] (0.4ms) SELECT COUNT(*) FROM "users" WHERE "users"."deleted_at" IS NULL AND (id > 0)
[DEBUG] (0.4ms) SELECT COUNT(*) FROM "users" WHERE "users"."deleted_at" IS NULL AND (id > 0)
empty?
[DEBUG] (0.4ms) SELECT COUNT(*) FROM "users" WHERE "users"."deleted_at" IS NULL AND (id>0)
[DEBUG] (0.5ms) SELECT COUNT(*) FROM "users" WHERE "users"."deleted_at" IS NULL AND (id>0)
[DEBUG] (0.4ms) SELECT COUNT(*) FROM "users" WHERE "users"."deleted_at" IS NULL AND (id>0)
[DEBUG] (0.3ms) SELECT COUNT(*) FROM "users" WHERE "users"."deleted_at" IS NULL AND (id>0)
[DEBUG] (0.4ms) SELECT COUNT(*) FROM "users" WHERE "users"."deleted_at" IS NULL AND (id>0)
[DEBUG] (0.3ms) SELECT COUNT(*) FROM "users" WHERE "users"."deleted_at" IS NULL AND (id>0)
[DEBUG] (0.4ms) SELECT COUNT(*) FROM "users" WHERE "users"."deleted_at" IS NULL AND (id>0)
[DEBUG] (0.3ms) SELECT COUNT(*) FROM "users" WHERE "users"."deleted_at" IS NULL AND (id>0)
[DEBUG] (0.3ms) SELECT COUNT(*) FROM "users" WHERE "users"."deleted_at" IS NULL AND (id>0)
[DEBUG] (0.4ms) SELECT COUNT(*) FROM "users" WHERE "users"."deleted_at" IS NULL AND (id>0)
Benchmark Result
user system total real
count 0.020000 0.000000 0.020000 ( 0.023109)
empty? 0.020000 0.000000 0.020000 ( 0.018685)
Conclusion
Empty? method is a little bit faster than count, so don't be afraid to use it, specially if you call it several times. My concern and I hope that someone can answer why is faster and why is calling COUNT(*) because it seems to me that ActiveRecord is doing all the magic.
Update 29/07/2013
In order to confirm my doubts I run the same tests over the stage database and with a large sample for 1000 times. And besides the slight differences empty? still beating count on real time.
user system total real
count 2.370000 0.100000 2.470000 ( 24.989100)
empty? 2.470000 0.090000 2.560000 ( 24.867949)
Written by Luis Ezcurdia
Related protips
6 Responses
It'd be nice to know why .empty? is faster. Maybe try increasing your sample size and averaging the result for a more convincing result.
It's always nice to know more about optimisation of AR queries!
I ran same benchmark few more times (1000) the SQL query was the same, both issue a COUNT(*) on the users table. My setup was Ruby 2.0.0-p195 with Rails 4.0.0 on a Postgresql DB, I was running a Rails console with production env.
Benchmark.bm do |bm|
bm.report("count") do
1000.times{ Identity.where('id>0').count }
end
bm.report("empty?") do
1000.times{ Identity.where('id>0').empty? }
end
end
The difference was really minimal like:
user system total real
count 0.280000 0.030000 0.310000 ( 0.437411)
empty? 0.280000 0.030000 0.310000 ( 0.428984)
But after a few runs count was faster then empty? was:
user system total real
count 0.290000 0.030000 0.320000 ( 0.427957)
empty? 0.310000 0.020000 0.330000 ( 0.454032)
So I don't think that the difference between them are related to the way they are coded on Rails, but to the database or the system at all.
count is a method on ActiveRecord::Calculations that issue the COUNT query to the database.
https://github.com/rails/rails/blob/master/activerecord/lib/active_record/relation/calculations.rb#L22
empty? is a method on ActiveRecord::Relation, it first check if the relation has been loaded from the database, if it was then #empty? is evaluated against it, there is no query sent to the database; if it wasn't loaded then, guess what? it sends #count message to ActiveRecord::Calculation.
https://github.com/rails/rails/blob/master/activerecord/lib/active_record/relation.rb#L244
So at the end if #empty? needs to go to the database it goes thought a #count.
Mario, answering one of your questions, And my surprise was that both queries use COUNT(*)
, empty? will only do that if you haven't loaded the results.
This is the implementation:
def empty?
return @records.empty? if loaded?
c = count
c.respond_to?(:zero?) ? c.zero? : c.empty?
end
This is good to know, because you could preload them if you know you would use them, then empty? would be faster because it won't result in a DB query. Something else that's important to consider is caching this if you plan to use it more than once per request to minimize the DB calls.
Ooops, I just read your last comment and I noticed you found that answer.
Yes, I did it :)
What about User.where('id>0').exists?