Use IN instead of ANY in PostgreSQL

or how we reduce sql query from 150 seconds to 60 ms :-)

we have chain of functions that return array of records ids, and we use ANY operator - and query was very slow on large data sets.

SELECT "users".* FROM "users" WHERE "users"."id" = ANY(get_involved_user_ids_in_projects_for_user_id(3))

replacing = ANY(array) with IN(SELECT(UNNEST(array))), just like below

SELECT "users".* FROM "users"  WHERE "users"."id" IN (select(unnest(get_involved_user_ids_in_projects_for_user_id(3))))

reduce query time from 150 seconds to 3 seconds :-) . replacing all ANY statements to IN SELECT UNNEST in function itself - get us to 60 ms :-)

#performance

#postgresql

Related protips

How do types affect PosgreSQL query planning

1.224K

Postgresql - Correcting bad logging data

595

6 Responses

Add your response

galuszkak

EXPLAIN ANALYZE results?

It's sometimes better know WHY engine behave like that...

over 1 year ago ·

hubertlepicki

Still slow ;). You should try replacing IN with a join/distinct clauses and compare the results. Usually, when I see IN query in the code it raises alarm: slow query ahead.

over 1 year ago ·

cs3b

@hubertlepicki in our case we would have to use 5 times LEFT OUTER JOIN in below function, witch are much slower in our case then IN (7 x times slower). I also see that if will follow your advice I would to have rewrite 3 other functions - I will not do this today :-).

Still I will remember about this - cause it might have sense sometimes to leave postgres to find the best strategy. Thanks

over 1 year ago ·

cs3b

@galuszkak explain didn't give nothing interesting - not sure if postgres take into account custom functions plsql (or did something wrong)

over 1 year ago ·

feikesteenbergen

In the first example you will execute the function as many times as there are users.
In the second example you will only execute the function once.

The following test sample will show you this:

CREATE TABLE users(id integer primary key);
INSERT INTO users SELECT * FROM generate_series(1,10);

CREATE FUNCTION get_involved_user_ids_in_projects_for_user_id(int)
RETURNS int[]
LANGUAGE plpgsql
AS
$BODY$
DECLARE
    result int[];
BEGIN
    RAISE NOTICE '.';
    SELECT array_agg(i)
      INTO result
      FROM (SELECT (random()*9)::int+1 FROM generate_series(1,(random()*3)::int)) AS sub(i);
    RETURN result;
END;
$BODY$
VOLATILE;

SELECT "users".* FROM "users" WHERE "users"."id" = ANY(get_involved_user_ids_in_projects_for_user_id(3));

SELECT "users".* FROM "users"  WHERE "users"."id" IN (select(unnest(get_involved_user_ids_in_projects_for_user_id(3))));

You will see it is executed 10 times in the first statement and only once in the second.

What you could do, it you are sure it is correct is to change the function into a stable function (instead of the default volatile). Declaring a function to be stable is saying that it will return consistent results within one sql statement.

It is explained in detail here:

http://www.postgresql.org/docs/current/static/sql-createfunction.html
http://www.postgresql.org/docs/current/static/xfunc-volatility.html

over 1 year ago ·