Use IN instead of ANY in PostgreSQL
or how we reduce sql query from 150 seconds to 60 ms :-)
we have chain of functions that return array of records ids, and we use ANY operator - and query was very slow on large data sets.
SELECT "users".* FROM "users" WHERE "users"."id" = ANY(get_involved_user_ids_in_projects_for_user_id(3))
replacing = ANY(array) with IN(SELECT(UNNEST(array))), just like below
SELECT "users".* FROM "users" WHERE "users"."id" IN (select(unnest(get_involved_user_ids_in_projects_for_user_id(3))))
reduce query time from 150 seconds to 3 seconds :-) . replacing all ANY statements to IN SELECT UNNEST in function itself - get us to 60 ms :-)
Written by Michał Czyż
Related protips
6 Responses
EXPLAIN ANALYZE results?
It's sometimes better know WHY engine behave like that...
Still slow ;). You should try replacing IN with a join/distinct clauses and compare the results. Usually, when I see IN query in the code it raises alarm: slow query ahead.
@hubertlepicki in our case we would have to use 5 times LEFT OUTER JOIN in below function, witch are much slower in our case then IN (7 x times slower). I also see that if will follow your advice I would to have rewrite 3 other functions - I will not do this today :-).
Still I will remember about this - cause it might have sense sometimes to leave postgres to find the best strategy. Thanks
@galuszkak explain didn't give nothing interesting - not sure if postgres take into account custom functions plsql (or did something wrong)
In the first example you will execute the function as many times as there are users.
In the second example you will only execute the function once.
The following test sample will show you this:
CREATE TABLE users(id integer primary key);
INSERT INTO users SELECT * FROM generate_series(1,10);
CREATE FUNCTION get_involved_user_ids_in_projects_for_user_id(int)
RETURNS int[]
LANGUAGE plpgsql
AS
$BODY$
DECLARE
result int[];
BEGIN
RAISE NOTICE '.';
SELECT array_agg(i)
INTO result
FROM (SELECT (random()*9)::int+1 FROM generate_series(1,(random()*3)::int)) AS sub(i);
RETURN result;
END;
$BODY$
VOLATILE;
SELECT "users".* FROM "users" WHERE "users"."id" = ANY(get_involved_user_ids_in_projects_for_user_id(3));
SELECT "users".* FROM "users" WHERE "users"."id" IN (select(unnest(get_involved_user_ids_in_projects_for_user_id(3))));
You will see it is executed 10 times in the first statement and only once in the second.
What you could do, it you are sure it is correct is to change the function into a stable function (instead of the default volatile). Declaring a function to be stable is saying that it will return consistent results within one sql statement.
It is explained in detail here:
http://www.postgresql.org/docs/current/static/sql-createfunction.html
http://www.postgresql.org/docs/current/static/xfunc-volatility.html
Never tried IN instead of ANY in PostgreSQL, maybe something will really come out of this?)