MySQL: joining a table to itself without a primary key

Question

I created two tables in MySQL. Each contains an integer idx and a string name. In one of them, idx is the primary key.

CREATE TABLE table_indexed (
    idx     INTEGER,
    name    VARCHAR(24), 
    PRIMARY KEY(idx)
);
CREATE TABLE table_not_indexed (
    idx     INTEGER,
    name    VARCHAR(24)
);

I then added the same data to both tables: 3 million rows, with distinct values for idx (1 to 3,000,000, randomly ordered) and random strings of 8 lowercase characters for name.
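The question does not show how the rows were loaded. A minimal sketch of one way to produce comparable data, assuming MySQL 8.0 (for recursive CTEs and `cte_max_recursion_depth`); the CTE name `seq` and the alphabet literal are illustrative choices, not the asker's method:

```sql
-- Allow the recursive CTE below to generate 3,000,000 rows
-- (the default recursion limit is much lower).
SET SESSION cte_max_recursion_depth = 3000000;

-- Fill the table without a primary key: idx = 1..3000000 in random order,
-- name = 8 random lowercase letters.
INSERT INTO table_not_indexed (idx, name)
WITH RECURSIVE seq (n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM seq WHERE n < 3000000
)
SELECT n,
       CONCAT(
           SUBSTRING('abcdefghijklmnopqrstuvwxyz', 1 + FLOOR(RAND() * 26), 1),
           SUBSTRING('abcdefghijklmnopqrstuvwxyz', 1 + FLOOR(RAND() * 26), 1),
           SUBSTRING('abcdefghijklmnopqrstuvwxyz', 1 + FLOOR(RAND() * 26), 1),
           SUBSTRING('abcdefghijklmnopqrstuvwxyz', 1 + FLOOR(RAND() * 26), 1),
           SUBSTRING('abcdefghijklmnopqrstuvwxyz', 1 + FLOOR(RAND() * 26), 1),
           SUBSTRING('abcdefghijklmnopqrstuvwxyz', 1 + FLOOR(RAND() * 26), 1),
           SUBSTRING('abcdefghijklmnopqrstuvwxyz', 1 + FLOOR(RAND() * 26), 1),
           SUBSTRING('abcdefghijklmnopqrstuvwxyz', 1 + FLOOR(RAND() * 26), 1)
       )
FROM seq
ORDER BY RAND();

-- Copy the identical rows into the table that has the primary key.
INSERT INTO table_indexed (idx, name)
SELECT idx, name FROM table_not_indexed;

-- Refresh optimizer statistics before timing anything.
ANALYZE TABLE table_indexed, table_not_indexed;
```

Copying from one table into the other guarantees both hold exactly the same rows, so any timing difference comes from the schema rather than the data.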

Then I ran a query that joins each table to itself. The query on the table without the primary key runs almost 3 times as fast.

mysql> SELECT COUNT(*)
    -> FROM table_indexed t1 JOIN table_indexed t2
    -> ON t1.idx = t2.idx;
+----------+
| COUNT(*) |
+----------+
|  3000000 |
+----------+
1 row in set (11.80 sec)

mysql> SELECT COUNT(*)
    -> FROM table_not_indexed t1 JOIN table_not_indexed t2
    -> ON t1.idx = t2.idx;
+----------+
| COUNT(*) |
+----------+
|  3000000 |
+----------+
1 row in set (4.12 sec)

EDIT: Asked MySQL to EXPLAIN the query.

mysql> EXPLAIN SELECT COUNT(*)
    -> FROM table_indexed t1 JOIN table_indexed t2
    -> ON t1.idx = t2.idx;
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------------+---------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref                      | rows    | filtered | Extra       |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------------+---------+----------+-------------+
|  1 | SIMPLE      | t1    | NULL       | index  | PRIMARY       | PRIMARY | 4       | NULL                     | 3171970 |   100.00 | Using index |
|  1 | SIMPLE      | t2    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | index_test3000000.t1.idx |       1 |   100.00 | Using index |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------------+---------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

mysql> EXPLAIN SELECT COUNT(*)
    -> FROM table_not_indexed t1 JOIN table_not_indexed t2
    -> ON t1.idx = t2.idx;
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra                                      |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
|  1 | SIMPLE      | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 2993208 |   100.00 | NULL                                       |
|  1 | SIMPLE      | t2    | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 2993208 |    10.00 | Using where; Using join buffer (hash join) |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
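The tabular EXPLAIN output above contains only estimates. As a sketch, assuming a server recent enough to support them (the "hash join" note in the output suggests MySQL 8.0.18 or later, where both forms exist), the tree format and EXPLAIN ANALYZE show the chosen join algorithm more explicitly. Their output is not reproduced here because it was not part of the original post:

```sql
-- Show the plan as an iterator tree; the join strategy is named
-- directly in the tree instead of being hidden in the Extra column.
EXPLAIN FORMAT=TREE
SELECT COUNT(*)
FROM table_not_indexed t1 JOIN table_not_indexed t2
ON t1.idx = t2.idx;

-- EXPLAIN ANALYZE actually executes the query and reports measured
-- times and row counts per step alongside the estimates.
EXPLAIN ANALYZE
SELECT COUNT(*)
FROM table_indexed t1 JOIN table_indexed t2
ON t1.idx = t2.idx;
```

Note that EXPLAIN ANALYZE runs the statement, so on these tables it takes about as long as the query itself.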

Answer 1

In both cases it does a table scan of t1, then looks for the matching row in t2.
In this case, "Using index" is equivalent to using the PK, since the PK is the index involved. (EXPLAIN is a bit sloppy and inconsistent in this area.)
Sometimes you can get more details with EXPLAIN FORMAT=JSON SELECT ... (It might not show anything useful in this case.)
The "rows" column is just an estimate.
The non-indexed case reads t2 entirely into memory and builds a hash index on it. With too small a value for join_buffer_size, you can experience the alternative: repeated full table scans of t2.
Your experiment is a good example of when the "join buffer" is good, but not as good as an appropriate index.
Your experiment would probably come out the same with two separate tables instead of a self-join.
"3 times as fast": I would expect a lot of variation in the "3" across different test cases.
For more on join_buffer_size, BNL, and BKA (Block Nested-Loop and Batched Key Access), see https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_join_buffer_size
It is potentially unsafe to set join_buffer_size to more than 1% of RAM.
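A rough sketch of how the points about join_buffer_size and the index could be probed on the asker's tables. The 16 MiB value is an arbitrary illustration, not a recommendation, and whether the optimizer switches to a hash join once the primary key is ignored is an assumption to verify with EXPLAIN rather than a guarantee:

```sql
-- Inspect the join buffer size currently in effect for this session (bytes).
SELECT @@session.join_buffer_size;

-- Change it for this session only, then re-run the non-indexed join
-- to see how sensitive the timing is to the buffer size.
SET SESSION join_buffer_size = 16 * 1024 * 1024;

SELECT COUNT(*)
FROM table_not_indexed t1 JOIN table_not_indexed t2
ON t1.idx = t2.idx;

-- Ask the optimizer not to use the primary key on the indexed table,
-- so both tables can be compared under a similar join strategy.
SELECT COUNT(*)
FROM table_indexed t1 IGNORE INDEX (PRIMARY)
JOIN table_indexed t2 IGNORE INDEX (PRIMARY)
ON t1.idx = t2.idx;

-- Restore the server default for the rest of the session.
SET SESSION join_buffer_size = DEFAULT;
```

Running each statement several times, and comparing the plans with EXPLAIN before trusting the timings, helps separate the effect of the join algorithm from caching and run-to-run noise.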

Solution 1
0 2021-11-25 05:41:58
