
Achieving FOR-loop-like functionality in PostgreSQL

I've been meddling with databases for a few years now and I'm getting pretty decent with most SQL/PostgreSQL queries, but I still don't understand how a simple FOR-like query should be done. Here's an example in pseudocode:

FOR id IN SELECT ids FROM parents WHERE name ilike '%something%' LOOP
    SELECT parent_id, max(timestamp) FROM children WHERE parent_id = id;
END LOOP;

Note: One parent can have, and often has, multiple children, so there's a one-to-many relationship between them.

The desired result of that query should look like this:

parent_id, max(timestamp)
5, 2015-09-18 10:00:46.684824+03
6, 2015-09-18 10:00:47.684824+03
8, 2015-09-18 10:00:48.684824+03
etc.

The query itself doesn't have to be a for-loop. I'm just interested in how this query should be expressed in SQL since I quite often seem to have a need for it.

Thanks!

A few ways, some better than others.

In general, I advocate learning to think in sets when working with SQL and relational databases. JOINs start making lots of sense when you think of them as operations on sets. So do clauses like WHERE and GROUP BY. After a while you'll often find that you can express your queries in English and just "translate" them to SQL. (Or maybe I just write way, way too much SQL and I'm damaged now.)

A join with grouping and aggregation

Using a join and GROUP BY is in my view the clearest and simplest way to express it. You say "here's the relationship between these two tables, now for each p.ids get me the max(c.timestamp)".

SELECT
   p.ids,
   max(c.timestamp)
FROM parents p
  LEFT OUTER JOIN children c ON (p.ids = c.parent_id)
WHERE p.name ILIKE '%something%'
GROUP BY p.ids;

I used a LEFT OUTER JOIN because, in your simple FOR loop, you'd get a result with a parent_id and a null max if there were no matching rows. This preserves the same behaviour. If you want no row at all when there are no child rows, use an inner join.
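To see the LEFT vs. inner join difference concretely, here's a minimal sketch that runs both variants against a throwaway in-memory database. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so it runs without a server, which brings two assumptions: SQLite has no ILIKE, so the filter uses LIKE (case-insensitive for ASCII text in SQLite by default), and the timestamps are stored as ISO-8601 text, which max() compares correctly. The table and column names follow the question; the rows are made up.

```python
import sqlite3

# Throwaway in-memory database. Names follow the question; the rows are hypothetical.
# Parent 7 matches the name filter but has no children.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parents (ids INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE children (parent_id INTEGER, timestamp TEXT);
    INSERT INTO parents VALUES
        (5, 'Something A'), (6, 'something B'), (7, 'something C'), (9, 'unrelated');
    INSERT INTO children VALUES
        (5, '2015-09-18 10:00:40'), (5, '2015-09-18 10:00:46'),
        (6, '2015-09-18 10:00:47'), (9, '2015-09-18 10:00:49');
""")

# SQLite stand-in for the PostgreSQL query: LIKE instead of ILIKE.
# str.format only splices in a fixed join keyword, never user input.
query = """
    SELECT p.ids, max(c.timestamp)
    FROM parents p
      {join} children c ON (p.ids = c.parent_id)
    WHERE p.name LIKE '%something%'
    GROUP BY p.ids
    ORDER BY p.ids
"""
left_rows = conn.execute(query.format(join="LEFT OUTER JOIN")).fetchall()
inner_rows = conn.execute(query.format(join="INNER JOIN")).fetchall()

print(left_rows)   # includes parent 7 with None as its max
print(inner_rows)  # parent 7 is missing
```

With the LEFT OUTER JOIN, the childless parent 7 comes back with None (SQL NULL) as its max; with the inner join it disappears from the result entirely.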

A correlated subquery

SELECT
   p.ids,
   (SELECT max(timestamp) FROM children c WHERE c.parent_id = p.ids)
FROM parents p
WHERE p.name ILIKE '%something%';

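This approach is limited to cases where you only want one field from the associated child table, unless you start doing horrible things with composite records. It also isn't usually planned the same way as the join: PostgreSQL typically runs a scalar subquery in the select list as a SubPlan executed once per parent row. That's fine with an index on children.parent_id, but it can be slower than the join when many parents match, and it's less flexible.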

It's closer to the "for loop" approach, in that it's saying "for each parent row do this on the child table".
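A quick way to convince yourself that the correlated subquery returns the same rows as the LEFT OUTER JOIN version is to run both side by side. This sketch again uses Python's sqlite3 module in place of PostgreSQL, with LIKE standing in for ILIKE and made-up rows. It only shows that the results agree; it says nothing about how PostgreSQL would plan the two queries.

```python
import sqlite3

# Throwaway in-memory database; names follow the question, rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parents (ids INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE children (parent_id INTEGER, timestamp TEXT);
    INSERT INTO parents VALUES
        (5, 'Something A'), (6, 'something B'), (7, 'something C'), (9, 'unrelated');
    INSERT INTO children VALUES
        (5, '2015-09-18 10:00:40'), (5, '2015-09-18 10:00:46'),
        (6, '2015-09-18 10:00:47'), (9, '2015-09-18 10:00:49');
""")

# "For each parent row, look up its latest child timestamp."
correlated = conn.execute("""
    SELECT p.ids,
           (SELECT max(c.timestamp) FROM children c WHERE c.parent_id = p.ids)
    FROM parents p
    WHERE p.name LIKE '%something%'
    ORDER BY p.ids
""").fetchall()

# The join + GROUP BY formulation of the same question.
joined = conn.execute("""
    SELECT p.ids, max(c.timestamp)
    FROM parents p
      LEFT OUTER JOIN children c ON (p.ids = c.parent_id)
    WHERE p.name LIKE '%something%'
    GROUP BY p.ids
    ORDER BY p.ids
""").fetchall()

print(correlated == joined)  # True: same rows, including the childless parent
```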

A FOR loop in PL/PgSQL

This is the slowest and clumsiest option, but it's almost literally what you wrote.

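-- Inside a function declared RETURNS TABLE(parent_id integer, latest timestamptz),
-- with "pid integer;" in its DECLARE section (assumes integer ids).
-- Selecting pid rather than parent_id avoids needing a GROUP BY next to max().
FOR pid IN SELECT ids FROM parents WHERE name ILIKE '%something%' LOOP
    RETURN QUERY SELECT pid, max(c.timestamp) FROM children c WHERE c.parent_id = pid;
END LOOP;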

Yes, I copied your code almost verbatim. The main thing missing is a destination for the results: in the form above you'd need to declare the function RETURNS TABLE(...).

This last one is PL/PgSQL so it's only valid in a function.

It's the closest to what you wrote, and the simplest when thinking procedurally, but it's actually slow and cumbersome.
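For comparison, here's a client-side analogue of the row-by-row loop, written in Python against SQLite (again a stand-in for PostgreSQL, with LIKE for ILIKE and made-up rows). It issues one aggregate query per matching parent and counts the statements executed. A real PL/PgSQL loop runs inside the server, so it avoids network round trips, but it still executes one statement per parent instead of a single set-based query.

```python
import sqlite3

# Throwaway in-memory database; names follow the question, rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parents (ids INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE children (parent_id INTEGER, timestamp TEXT);
    INSERT INTO parents VALUES
        (5, 'Something A'), (6, 'something B'), (7, 'something C'), (9, 'unrelated');
    INSERT INTO children VALUES
        (5, '2015-09-18 10:00:40'), (5, '2015-09-18 10:00:46'),
        (6, '2015-09-18 10:00:47'), (9, '2015-09-18 10:00:49');
""")

# Step 1: fetch the matching parent ids (the "FOR id IN SELECT ..." part).
parent_ids = [row[0] for row in conn.execute(
    "SELECT ids FROM parents WHERE name LIKE '%something%' ORDER BY ids")]
queries_run = 1

# Step 2: one aggregate query per parent (the loop body).
looped = []
for pid in parent_ids:
    # An aggregate without GROUP BY always yields exactly one row,
    # so a parent with no children gets None (SQL NULL) as its max.
    (latest,) = conn.execute(
        "SELECT max(timestamp) FROM children WHERE parent_id = ?", (pid,)
    ).fetchone()
    looped.append((pid, latest))
    queries_run += 1

# The set-based equivalent: a single statement.
set_based = conn.execute("""
    SELECT p.ids, max(c.timestamp)
    FROM parents p
      LEFT OUTER JOIN children c ON (p.ids = c.parent_id)
    WHERE p.name LIKE '%something%'
    GROUP BY p.ids
    ORDER BY p.ids
""").fetchall()

print(looped == set_based, queries_run)  # True 4
```

Both produce the same rows; the loop just needs 1 + N statements for N matching parents, where the join needs one.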

There are several solutions. You could use a join and a GROUP BY, for example. My preferred solution in a case like this is the most direct one:

select
    id,
    (select max(timestamp) from children where parent_id=parents.id)
from parents WHERE name ilike '%something%';

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question please contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM