Why does a query with "in" and "on" clauses run forever?

Question

I have three tables; table3 is basically the intermediate (junction) table between table1 and table2. When I execute a query that contains "in" and joins table1 and table3, it just keeps running and I never get a result. If I use id=134 instead of id in (134,267,390,4234... ), the result comes up. I don't understand why "in" has this effect. Does anyone have an idea?

Query statement:

select count(*) from table1, table3 on id=table3.table1_id where table3.table2_id = 123 and id in (134,267,390,4234) and item = 30;

Table structure:

create table table1 (
   id integer primary key,
   item integer
);

create table table2 (
   id integer,
   item integer
);

create table table3 (
    table1_id integer,
    table2_id integer
);

-- without indices the DB was 0.8 TB; after adding the three indices it is now 2.5 TB
indices on: table1.item, table3.table1_id, table3.table2_id

env: Linux, sqlite 3.7.17

Answer 1

from table1, table3 is a cross join on most databases, and with the size of your data a cross join would be enormous. In SQLite, however, it's an inner join. From the SQLite SELECT docs:

Side note: Special handling of CROSS JOIN. There is no difference between the "INNER JOIN", "JOIN" and "," join operators. They are completely interchangeable in SQLite.
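To see that interchangeability in practice, here is a minimal sketch using Python's built-in sqlite3 module on a tiny in-memory copy of the schema. The rows are made up for illustration. The three join spellings should return the same count, while the comma join with no join condition at all is the full Cartesian product (5 x 5 = 25 rows on this toy data).

```python
import sqlite3

# Tiny in-memory stand-in for the question's schema; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table1 (id integer primary key, item integer);
    create table table3 (table1_id integer, table2_id integer);
""")
conn.executemany("insert into table1 values (?, ?)",
                 [(134, 30), (267, 30), (390, 31), (4234, 30), (5000, 30)])
conn.executemany("insert into table3 values (?, ?)",
                 [(134, 123), (267, 123), (390, 123), (4234, 999), (5000, 123)])

on = "on table1.id = table3.table1_id"
where = "where table3.table2_id = 123 and table1.id in (134, 267, 390, 4234)"

# The same join written with ",", JOIN and INNER JOIN.
queries = {
    "comma": f"select count(*) from table1, table3 {on} {where}",
    "join": f"select count(*) from table1 join table3 {on} {where}",
    "inner": f"select count(*) from table1 inner join table3 {on} {where}",
}
counts = {name: conn.execute(sql).fetchone()[0] for name, sql in queries.items()}
print(counts)

# With no join condition at all, "," is a plain Cartesian product: 5 x 5 rows.
cross = conn.execute("select count(*) from table1, table3").fetchone()[0]
print(cross)
```

On a real terabyte-scale table the Cartesian product is what you never want to materialize, which is why spelling out the join condition matters.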

That's not your problem in this specific instance, but let's not tempt fate; always write out your joins explicitly.

select count(*)
from table1
join table3 on id=table3.table1_id
where table3.table2_id = 123
  and id in (134,267,390,4234);

Since you're just counting, the only thing you need from table1 is the ID, and table3 already has table1_id, so there's no need to join with table1. We can do this entirely with the table3 join table. (This assumes you can drop the item = 30 condition from your original query: item lives in table1, so if you need that filter you have to keep the join.)

select count(*)
from table3
where table2_id = 123
  and table1_id in (134,267,390,4234);
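A quick way to convince yourself that the two shapes agree is to run both on toy data. This is a sketch with hypothetical rows, using Python's sqlite3 module. It assumes every table3.table1_id has a matching table1 row (otherwise the join would silently drop those orphans) and relies on table1.id being unique, which the primary key guarantees. The last query shows why a filter on table1.item, like the item = 30 in the question, still needs table1.

```python
import sqlite3

# Hypothetical toy rows; every table3.table1_id exists in table1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table1 (id integer primary key, item integer);
    create table table3 (table1_id integer, table2_id integer);
    insert into table1 values (134, 30), (267, 31), (390, 30), (4234, 30);
    insert into table3 values
        (134, 123), (267, 123), (390, 123), (4234, 456), (134, 456);
""")

def count(sql):
    """Run a count(*) query and return the single integer it produces."""
    return conn.execute(sql).fetchone()[0]

# Explicit join, filtering only on the key.
with_join = count("""
    select count(*)
    from table1
    join table3 on table1.id = table3.table1_id
    where table3.table2_id = 123
      and table1.id in (134, 267, 390, 4234)
""")

# Join-free version that reads table3 alone.
without_join = count("""
    select count(*)
    from table3
    where table2_id = 123
      and table1_id in (134, 267, 390, 4234)
""")

# Filtering on table1.item does need the join.
with_item = count("""
    select count(*)
    from table1
    join table3 on table1.id = table3.table1_id
    where table3.table2_id = 123
      and table1.id in (134, 267, 390, 4234)
      and table1.item = 30
""")

print(with_join, without_join, with_item)
```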

SQLite will generally use only one index per table in a query. For this to perform well on such a large data set, you need a composite index on both columns: table3(table1_id, table2_id). Presumably you don't want duplicates, so this should be a unique index. It covers queries on table1_id alone as well as queries on both table1_id and table2_id, so you should drop your table1_id index to save space and time.

create unique index table3_unique on table3(table1_id, table2_id);
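One way to check that SQLite actually picks up the new index is EXPLAIN QUERY PLAN. Below is a minimal sketch using Python's sqlite3 module on an empty in-memory table. The exact wording of the plan text varies between SQLite versions, but it should report a SEARCH using table3_unique. The sketch also shows the unique index rejecting a duplicate (table1_id, table2_id) pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table table3 (table1_id integer, table2_id integer)")
# The composite unique index recommended above.
conn.execute("create unique index table3_unique on table3(table1_id, table2_id)")

def plan(sql):
    """Detail text of each EXPLAIN QUERY PLAN row (always the last column)."""
    return [row[-1] for row in conn.execute("explain query plan " + sql)]

details = plan("""
    select count(*) from table3
    where table2_id = 123 and table1_id in (134, 267, 390, 4234)
""")
for line in details:
    print(line)

# The unique index also stops the same (table1_id, table2_id) pair being stored twice.
conn.execute("insert into table3 values (134, 123)")
try:
    conn.execute("insert into table3 values (134, 123)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print("duplicate rejected:", duplicate_rejected)
```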

The composite index will not help queries that use only table2_id, so keep your existing table2_id index.
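The same EXPLAIN QUERY PLAN check shows why the table2_id index is still needed. In this sketch the index name table3_table2_id is hypothetical, since the question doesn't give the real one. With only the composite index present, a query on table2_id alone can only SCAN, because table2_id is not the leading column; once the single-column index exists, the plan switches to a SEARCH. This assumes no ANALYZE statistics are present, since SQLite only considers a skip-scan of the composite index when ANALYZE has been run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table table3 (table1_id integer, table2_id integer)")
# Only the composite index exists at first; its leading column is table1_id.
conn.execute("create unique index table3_unique on table3(table1_id, table2_id)")

def plan(sql):
    """Detail text of each EXPLAIN QUERY PLAN row (always the last column)."""
    return [row[-1] for row in conn.execute("explain query plan " + sql)]

query = "select count(*) from table3 where table2_id = 123"

before = plan(query)  # expected: a full SCAN, since table2_id is not a leading column
# Hypothetical name for the single-column table2_id index the answer says to keep.
conn.execute("create index table3_table2_id on table3(table2_id)")
after = plan(query)   # expected: a SEARCH on table3_table2_id

print("before:", before)
print("after: ", after)
```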

Your query should now run lickety-split.

For more, read about the SQLite Query Optimizer.

A terabyte is a lot of data. While SQLite technically can handle this, it might not be the best choice. It's great for small and simple databases, but it's missing a lot of features. You should look into a more powerful database such as PostgreSQL. It is not a magic bullet, and all the same principles apply, but it is much more appropriate for data at that scale.

Answer 1 timestamp: 2020-06-30 17:31:25
