Last Updated: February 25, 2016
· Damon Aw

Make PG arrays actually use the GIN index (Don't use ANY)

Since Rails 4 came out, PG array fields have become very popular.

Every blog post tells you that you can easily add a GIST or GIN index and use ANY to query the array field for a membership test. Typically:

SELECT * FROM users WHERE '' = ANY (emails); 

But this will not use your GIN index

On a table with more than 500,000 records and 700,000 entries in the email array field:

daemonsy=# EXPLAIN ANALYZE SELECT emails FROM people WHERE '' = ANY(emails);
                                                  QUERY PLAN
 Seq Scan on people  (cost=0.00..186652.80 rows=1894 width=13) (actual time=577.370..577.370 rows=0 loops=1)
   Filter: (''::text = ANY (emails))
   Rows Removed by Filter: 714859

Total runtime: 577.413 ms
(4 rows)

Using contains, @>,

daemonsy=# EXPLAIN ANALYZE SELECT emails FROM people WHERE emails @> ARRAY['']::text[];
                                                              QUERY PLAN
 Bitmap Heap Scan on people  (cost=2130.68..9109.09 rows=1894 width=13) (actual time=0.009..0.009 rows=0 loops=1)
   Recheck Cond: (emails @> '{foobar@gmail.Com}'::text[])
   ->  Bitmap Index Scan on index_people_on_emails  (cost=0.00..2130.20 rows=1894 width=0) (actual time=0.007..0.007 rows=0 loops=1)
         Index Cond: (emails @> '{}'::text[])

Total runtime: 0.049 ms

The differences on a large dataset is evident. I am not sure why PG there is this gotcha, maybe ANY and @> are not equivalent.

If you're looking for a simple membership test, e.g. items tagged "good", using contains(@>) is definitely better than ANY.

This was adapted from work (some censorship) and we are using an custom gin indexed citext[] field. Apologize in advance if there's any inconsistencies.

Found the problem initially when we were migrating a large join table to an array field and got the answer here.