Make PG arrays actually use the GIN index (Don't use ANY)
Since Rails 4 came out, PG array fields have become very popular.
Every blog post tells you that you can easily add a GIST or GIN index and use ANY
to query the array field for a membership test. Typically:
SELECT * FROM users WHERE 'foo@bar.com' = ANY (emails);
But this will not use your GIN index
On a table with more than 500,000 records and 700,000 entries in the email
array field:
daemonsy=# EXPLAIN ANALYZE SELECT emails FROM people WHERE 'foobar@gmail.com' = ANY(emails);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Seq Scan on people (cost=0.00..186652.80 rows=1894 width=13) (actual time=577.370..577.370 rows=0 loops=1)
Filter: ('foobar@gmail.com'::text = ANY (emails))
Rows Removed by Filter: 714859
Total runtime: 577.413 ms
(4 rows)
Using contains, @>
,
daemonsy=# EXPLAIN ANALYZE SELECT emails FROM people WHERE emails @> ARRAY['foobar@gmail.com']::text[];
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on people (cost=2130.68..9109.09 rows=1894 width=13) (actual time=0.009..0.009 rows=0 loops=1)
Recheck Cond: (emails @> '{foobar@gmail.Com}'::text[])
-> Bitmap Index Scan on index_people_on_emails (cost=0.00..2130.20 rows=1894 width=0) (actual time=0.007..0.007 rows=0 loops=1)
Index Cond: (emails @> '{foobar@gmail.com}'::text[])
Total runtime: 0.049 ms
The differences on a large dataset is evident. I am not sure why PG there is this gotcha, maybe ANY
and @>
are not equivalent.
If you're looking for a simple membership test, e.g. items tagged "good", using contains(@>
) is definitely better than ANY
.
This was adapted from work (some censorship) and we are using an custom gin indexed citext[]
field. Apologize in advance if there's any inconsistencies.
Found the problem initially when we were migrating a large join table to an array field and got the answer here.
http://stackoverflow.com/questions/22106169/how-does-one-query-a-postgres-array-using-the-gin-index