Select from two independent tables where no row exists in a third table joining them

Question

I have a problem with an SQL query. I need all combinations of rows from two independent tables that do not have a row joining them in a third table. The query works, but it performs very badly.

My query currently looks like this:

SELECT s.id, 
       u.id 
FROM   table1 s, 
       table2 u 
WHERE  NOT EXISTS
       ( 
              SELECT * 
              FROM   table3 sj 
              WHERE  sj.s_id=s.id 
              AND    sj.u_id=u.id
       )

Keys on table3 are:

ALTER TABLE `table3`
    ADD PRIMARY KEY (`id`),
    ADD KEY `s_id` (`s_id`),
    ADD KEY `u_id` (`u_id`);

table1 has 4 rows, table2 has 80,000 rows, and table3 has 30,000 rows.

Any ideas how to optimise it? The query currently takes up to 20 minutes to return results.

Edit: Regarding the 20 minutes: I forgot to set a key on table3(u_id). After setting the key, it took just a few seconds. Great.

Answer 1

Your query seems to me like the right way to do what you want. I would just rewrite the old-school implicit join as an explicit cross join (but that's semantically equivalent).
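For reference, the explicit form of the query from the question might look like the sketch below. It uses the same tables and columns as the question. The column aliases `s_id` and `u_id` and the `SELECT 1` in the subquery are stylistic additions, not part of the original query; inside `NOT EXISTS` the select list does not affect the result.

```sql
-- Same logic as the question's query, with the comma join written as CROSS JOIN.
SELECT s.id AS s_id,   -- alias added for readability (assumption)
       u.id AS u_id    -- alias added for readability (assumption)
FROM   table1 s
       CROSS JOIN table2 u
WHERE  NOT EXISTS (
           SELECT 1
           FROM   table3 sj
           WHERE  sj.s_id = s.id
           AND    sj.u_id = u.id
       );
```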

For performance, you need an index on table3(s_id, u_id).
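A minimal sketch of adding that composite index, assuming MySQL as in the question's `ALTER TABLE` statement. The index name `idx_s_u` is made up for illustration.

```sql
-- Composite index covering both columns tested in the NOT EXISTS subquery.
-- `idx_s_u` is a hypothetical name; pick whatever fits your naming scheme.
ALTER TABLE `table3`
    ADD KEY `idx_s_u` (`s_id`, `u_id`);
```

With both columns in one index, each probe from the subquery can be answered by a single index lookup instead of combining the two single-column keys. Running `EXPLAIN` on the query before and after should show whether the optimizer actually picks it up.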

However, keep in mind that cross joining table1 and table2 generates about 320,000 row combinations (4 × 80,000), so the database still has to evaluate the not exists condition once for each of them.

If id is not unique in table1 or table2, you can deduplicate before cross joining:

select ...
from (select distinct id from table1) s
cross join (select distinct id from table2) u
where not exists (...)

solution1
1 ACCEPTED 2020-05-31 16:11:40