Why does a query with “in” and “on” run indefinitely?

Question

I have three tables; table3 is basically the intermediate table between table1 and table2. When I execute a query that contains "in" and joins table1 and table3, it just keeps running and I never get a result. If I use id=134 instead of id in (134,267,390,4234... ), the result comes back. I don't understand why "in" has this effect. Does anyone have an idea?

Query statement:

select count(*) from table1, table3 on id=table3.table1_id where table3.table2_id = 123 and id in (134,267,390,4234) and item = 30;

table structure:

table1:
   id integer primary key,
   item integer
   
table2:
   id integer,
   item integer

table3:
    table1_id integer,
    table2_id integer

-- the DB was 0.8 TB without indices; after adding the three indices it is now 2.5 TB
indices on: table1.item, table3.table1_id, table3.table2_id

env: Linux, sqlite 3.7.17

Answer 1

In most databases, from table1, table3 is a cross join, and with data of your size a cross join is enormous. In SQLite3, however, it's an inner join. From the SQLite SELECT docs:

Side note: Special handling of CROSS JOIN. There is no difference between the "INNER JOIN", "JOIN" and "," join operators. They are completely interchangeable in SQLite.

That's not your problem in this specific instance, but let's not tempt fate; always write out your joins explicitly.

select count(*)
from table1
join table3 on id=table3.table1_id
where table3.table2_id = 123
  and id in (134,267,390,4234);

Since you're just counting, you don't need any data from table1 but the ID. table3 has table1_id, so there's no need to join with table1. We can do this entirely with the table3 join table.

select count(*)
from table3
where table2_id = 123
  and table1_id in (134,267,390,4234);
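
To check that the two forms agree, here is a minimal sketch using Python's built-in sqlite3 module on a tiny in-memory database. The rows are made up for illustration, and it assumes every table3.table1_id refers to an existing row in table1. Like the rewritten query above, it leaves out the item = 30 filter from the question; that filter lives on table1, so keeping it would still require the join.

```python
import sqlite3

# Tiny in-memory stand-in for the real tables (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table1 (id integer primary key, item integer);
    create table table3 (table1_id integer, table2_id integer);
""")
conn.executemany("insert into table1 values (?, ?)",
                 [(134, 30), (267, 30), (390, 10), (4234, 30), (5000, 30)])
conn.executemany("insert into table3 values (?, ?)",
                 [(134, 123), (267, 123), (390, 999), (4234, 123), (5000, 123)])

ids = (134, 267, 390, 4234)
placeholders = ",".join("?" * len(ids))  # "?,?,?,?"

# Count via the explicit join between table1 and table3.
joined = conn.execute(
    f"""select count(*) from table1
        join table3 on id = table3.table1_id
        where table3.table2_id = 123 and id in ({placeholders})""",
    ids).fetchone()[0]

# Count using table3 alone.
direct = conn.execute(
    f"""select count(*) from table3
        where table2_id = 123 and table1_id in ({placeholders})""",
    ids).fetchone()[0]

print(joined, direct)
```

Both counts should match as long as the assumption about table1_id holds; if table3 can contain ids that are missing from table1, the join would exclude those rows and the table3-only query would not.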

For each table in a query, SQLite generally uses at most one index. For this to perform well on such a large data set, you need a composite index on both columns: table3(table1_id, table2_id). Presumably you don't want duplicates, so this should take the form of a unique index. That will cover queries on just table1_id and queries on both table1_id and table2_id, so you should drop your table1_id index to save space and time.

create unique index table3_unique on table3(table1_id, table2_id);

The composite index will not work for queries which use only table2_id, so keep your existing table2_id index.
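
One way to see which index SQLite picks is EXPLAIN QUERY PLAN. The following is a rough sketch using Python's built-in sqlite3 module on an empty in-memory table; the index name table3_table2_id is a hypothetical stand-in for your existing table2_id index, and the exact plan text differs between SQLite versions, so treat the printed output as indicative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table3 (table1_id integer, table2_id integer);
    create unique index table3_unique on table3(table1_id, table2_id);
    -- hypothetical name for the existing single-column index
    create index table3_table2_id on table3(table2_id);
""")

def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("explain query plan " + sql).fetchall()
    # The detail text is the last column in both old and new SQLite versions.
    return "; ".join(row[-1] for row in rows)

# Query constraining both columns.
both = plan("select count(*) from table3 "
            "where table2_id = 123 and table1_id in (134,267,390,4234)")

# Query constraining only table2_id.
only_t2 = plan("select count(*) from table3 where table2_id = 123")

print(both)
print(only_t2)
```

If either plan reports a plain scan of the table instead of a search using an index, that query is the one to look at before running it against the full data set.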

Your query should now run lickety-split.

For more, read about the SQLite Query Optimizer.

A terabyte is a lot of data. While SQLite technically can handle this, it might not be the best choice. It's great for small and simple databases, but it's missing a lot of features. You should look into a more powerful database such as PostgreSQL. It is not a magic bullet, and all the same principles apply, but it is much more appropriate for data at that scale.

1 answers

solution1
1 2020-06-30 17:31:25
