简体   繁体   中英

Optimizing sql query: check for all rows in table B if any rows in table C reference the same row in table A

I have 3 tables, A, B and C structured like this

CREATE TABLE a (
    id SERIAL NOT NULL PRIMARY KEY
);

CREATE TABLE b (
    id SERIAL NOT NULL PRIMARY KEY,
    a_id INT REFERENCES a(id) ON DELETE CASCADE
);

CREATE TABLE c (
    id SERIAL NOT NULL PRIMARY KEY,
    a_id INT REFERENCES a(id) ON DELETE CASCADE
);

Where the relationships are many-to-one. What i want is, for every row in table b, i want to check if any row in table c has a reference to the same row in table a. Now, I already have the query

SELECT
    b.id,
    true
FROM
    b
WHERE EXISTS (
    SELECT 1
    FROM c
    WHERE b.a_id = c.a_id
)
UNION
SELECT
    b.id,
    false
FROM
    b
WHERE NOT EXISTS (
    SELECT 1
    FROM c
    WHERE b.a_id = c.a_id
)
ORDER BY id

Though I am not certain, I think this is doing double work, and going through the table twice, and I am wondering how I could optimize it to only traverse the table once.

Is it possible with a simple query, or do I have to do anything complex?

Simply move the EXISTS clause into your SELECT clause.

SELECT
  b.id,
  EXISTS (SELECT null FROM c WHERE c.a_id = b.a_id) AS c_exists
FROM b;

The same with an IN clause, which I prefer for being even a tad simpler:

SELECT
  id,
  a_id IN (SELECT c.a_id FROM c) AS c_exists
FROM b;

This can be done with a subquery, a left join, and a case.

The subquery gets you a list of distinct c.a_id values.

             SELECT DISTINCT a_id FROM c;

Then do this

SELECT b.id, 
       CASE WHEN distinct_ids.a_id IS NULL THEN 'false'
            ELSE 'true' END has_c_row
  FROM b
  LEFT JOIN (
                SELECT DISTINCT a_id FROM c;
       ) distinct_ids ON b.a_id = distinct_ids.a_id 
        

This shape of query is called an antijoin or IS NULL... LEFT JOIN . It detects the rows in the first table that don't match rows in the second table.

The subquery gives us a view of the data in table c with at most one row per each distinct a_id value. Without the subquery, we might get duplicate rows in the result query.

This eliminates your WHERE EXISTS correlated subqueries ; even though PostgreSQL's query planner is pretty smart, sometimes it does the slow thing with subqueries like that.

If it is still too slow for you, create these indexes on the a_id columns.

ALTER TABLE b ADD INDEX a_id (a_id);
ALTER TABLE c ADD INDEX a_id (a_id);

i think i understand what you are after

this is how i would do it

SELECT b.id, ISNULL(res.result,0) as result
FROM b
LEFT JOIN (
    SELECT c.id, 1 as result
    FROM c
    INNER JOIN a on a.id = c.id
) res on b.id = res.id

i dont think you need to worry about distinct if they are all unique ids

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM