Storing data in a hierarchy allows for flexibility and cleanliness at once. A simple parent-child relationship, much like how files and directories are laid-out on a file system or in cloud storage.
A basic hierarchy would look like this, using populations as our data type (think classifications of creatures):
create table populations ( id integer not null, parent_id integer null name character varying not null ) id parent_id name ------------------- 1 NULL Clam 2 1 Giant 3 2 Boring 4 NULL Moray Eel 5 4 Green 6 4 White-Eyed 7 5 Giant
This allows someone to easily add creatures and sub-classifications all they like without having many tables with similar or the same structure.
Now it comes to querying. You have to answer requests like:
- How many kinds of clam are there?
- Show me all types of creatures with their full "path"
- Show all related records that are any kind of Moray Eel
You can easily find yourself writing application code that causes N+1 queries, building big lists, nested hashes, strings, etc, all while degrading performance and decreasing code legibility.
Thanks to recursive common table expression in PostgreSQL we can solve most problems in a single query.
Here's the full query, and then we'll break it down:
WITH RECURSIVE pops (id, level, name, name_path) AS ( SELECT id, 0, name, ARRAY[name] FROM populations WHERE parent_id is null UNION ALL SELECT p.id, t0.level + 1, p.name, ARRAY_APPEND(t0.name_path, p.name) FROM populations p INNER JOIN pops t0 ON t0.id = p.parent_id ) SELECT id, level, name_path AS category, ARRAY_TO_STRING(name_path, ' > ') FROM pops
This starts with
WITH RECURSIVE which tells PostgreSQL we're introducing a recursive CTE.
pops is the alias I've given it, followed by a list of fields that will be in the table. We don't specify types; they're copied from the "root" query.
Next, we have a simple
SELECT statement. This is where we seed our recursive CTE. We pick the rows from the underlying table that represent the root data: data that has no parent. I also added a few bits of data here,
level to indicate how deep a record is nested, and then initialised an
ARRAY that we can build on to generate paths for child records. This query is executed only once.
UNION ALL statement to indicate we're going to start the recursive query. The next
SELECT statement is where we find child records. You see here that we join over to
pops, which is the current set of records for that CTE. This is where we're adding all the child records. This query is executed until there are no results returned.
A couple things we're doing here to make things easier and cleaner:
ARRAY_APPEND, we're adding the name of this element onto the names from previous elements. This allows us to construct the path later however we see fit.
- We increase the
levelby 1. It may not matter to you, but there are cases where knowing the level can make things easier, or indicate a problem with your data depending on you business needs
Lastly, we query the
pops CTE. Using
name_path we can indicate a "category" of the record. This is the name of the root nodes. Then.
ARRAY_TO_STRING concatenates the array elements using the second argument as the separator. Just like a
id level category path 1 0 Clam Clam 2 1 Clam Clam > Giant 3 1 Clam Clam > Boring 4 0 Moray Eel Moray Eel 5 1 Moray Eel Moray Eel > Green 6 1 Moray Eel Moray Eel > White-Eyed 7 2 Moray Eel Moray Eel > Green > Giant
This approach has the benefit of good performance, flexibility, and once you understand CTEs, it's quite approachable. Feel free to ask questions in the comments and I'll help if I'm able.