jmtskw
Last Updated: February 25, 2016
·
21.85K
· cs3b
Michal czyz

Use IN instead of ANY in PostgreSQL

or how we reduce sql query from 150 seconds to 60 ms :-)

we have chain of functions that return array of records ids, and we use ANY operator - and query was very slow on large data sets.

SELECT "users".* FROM "users" WHERE "users"."id" = ANY(get_involved_user_ids_in_projects_for_user_id(3))

replacing = ANY(array) with IN(SELECT(UNNEST(array))), just like below

SELECT "users".* FROM "users"  WHERE "users"."id" IN (select(unnest(get_involved_user_ids_in_projects_for_user_id(3))))

reduce query time from 150 seconds to 3 seconds :-) . replacing all ANY statements to IN SELECT UNNEST in function itself - get us to 60 ms :-)

Say Thanks
Respond

5 Responses
Add your response

3101

EXPLAIN ANALYZE results?

It's sometimes better know WHY engine behave like that...

over 1 year ago ·
3102
6bf9328eef0afadd9c9d05334f8dd42b

Still slow ;). You should try replacing IN with a join/distinct clauses and compare the results. Usually, when I see IN query in the code it raises alarm: slow query ahead.

over 1 year ago ·
3104
Michal czyz

@hubertlepicki in our case we would have to use 5 times LEFT OUTER JOIN in below function, witch are much slower in our case then IN (7 x times slower). I also see that if will follow your advice I would to have rewrite 3 other functions - I will not do this today :-).

Still I will remember about this - cause it might have sense sometimes to leave postgres to find the best strategy. Thanks

over 1 year ago ·
3105
Michal czyz

@galuszkak explain didn't give nothing interesting - not sure if postgres take into account custom functions plsql (or did something wrong)

over 1 year ago ·
18829
None

In the first example you will execute the function as many times as there are users.
In the second example you will only execute the function once.

The following test sample will show you this:

CREATE TABLE users(id integer primary key);
INSERT INTO users SELECT * FROM generate_series(1,10);

CREATE FUNCTION get_involved_user_ids_in_projects_for_user_id(int)
RETURNS int[]
LANGUAGE plpgsql
AS
$BODY$
DECLARE
    result int[];
BEGIN
    RAISE NOTICE '.';
    SELECT array_agg(i)
      INTO result
      FROM (SELECT (random()*9)::int+1 FROM generate_series(1,(random()*3)::int)) AS sub(i);
    RETURN result;
END;
$BODY$
VOLATILE;

SELECT "users".* FROM "users" WHERE "users"."id" = ANY(get_involved_user_ids_in_projects_for_user_id(3));

SELECT "users".* FROM "users"  WHERE "users"."id" IN (select(unnest(get_involved_user_ids_in_projects_for_user_id(3))));

You will see it is executed 10 times in the first statement and only once in the second.

What you could do, it you are sure it is correct is to change the function into a stable function (instead of the default volatile). Declaring a function to be stable is saying that it will return consistent results within one sql statement.

It is explained in detail here:

http://www.postgresql.org/docs/current/static/sql-createfunction.html
http://www.postgresql.org/docs/current/static/xfunc-volatility.html

over 1 year ago ·