
MySQL JOINing a TABLE to itself without a primary key

I created two tables in MySQL. Each contains an integer idx and a string name. In one of them, idx is the primary key.

CREATE TABLE table_indexed (
    idx     INTEGER,
    name    VARCHAR(24), 
    PRIMARY KEY(idx)
);
CREATE TABLE table_not_indexed (
    idx     INTEGER,
    name    VARCHAR(24)
);

I then inserted the same data into both tables: 3 million rows, with distinct idx values from 1 to 3_000_000 in random order, and a random string of 8 lowercase characters for name.
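For anyone wanting to reproduce the test, data of this shape could be generated along the following lines. This is a minimal sketch, not the script used in the question: `make_rows` is a hypothetical helper, and printing tab-separated lines is just one convenient format (it matches the default field and line terminators of MySQL's `LOAD DATA INFILE`).

```python
import random
import string


def make_rows(n: int, seed: int = 0) -> list:
    """Return n (idx, name) rows: idx takes each value 1..n exactly once, in
    random order; name is a random 8-character lowercase string (names are
    random, so duplicates are possible but unlikely)."""
    rng = random.Random(seed)
    idx_values = list(range(1, n + 1))
    rng.shuffle(idx_values)  # distinct values, randomly arranged
    return [
        (idx, "".join(rng.choices(string.ascii_lowercase, k=8)))
        for idx in idx_values
    ]


if __name__ == "__main__":
    # Small demo; the question used n = 3_000_000.
    # Tab-separated lines match LOAD DATA INFILE's default terminators.
    for idx, name in make_rows(5, seed=42):
        print(f"{idx}\t{name}")
```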

Then I ran a query that joins each table to itself. The query on the table without the primary key runs almost 3 times as fast (4.12 sec vs 11.80 sec).

mysql> SELECT COUNT(*)
    -> FROM table_indexed t1 JOIN table_indexed t2
    -> ON t1.idx = t2.idx;
+----------+
| COUNT(*) |
+----------+
|  3000000 |
+----------+
1 row in set (11.80 sec)

mysql> SELECT COUNT(*)
    -> FROM table_not_indexed t1 JOIN table_not_indexed t2
    -> ON t1.idx = t2.idx;
+----------+
| COUNT(*) |
+----------+
|  3000000 |
+----------+
1 row in set (4.12 sec)

EDIT: I asked MySQL to EXPLAIN both queries.

mysql> EXPLAIN SELECT COUNT(*)
    -> FROM table_indexed t1 JOIN table_indexed t2
    -> ON t1.idx = t2.idx;
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------------+---------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref                      | rows    | filtered | Extra       |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------------+---------+----------+-------------+
|  1 | SIMPLE      | t1    | NULL       | index  | PRIMARY       | PRIMARY | 4       | NULL                     | 3171970 |   100.00 | Using index |
|  1 | SIMPLE      | t2    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | index_test3000000.t1.idx |       1 |   100.00 | Using index |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------------+---------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

mysql> EXPLAIN SELECT COUNT(*)
    -> FROM table_not_indexed t1 JOIN table_not_indexed t2
    -> ON t1.idx = t2.idx;
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra                                      |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
|  1 | SIMPLE      | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 2993208 |   100.00 | NULL                                       |
|  1 | SIMPLE      | t2    | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 2993208 |    10.00 | Using where; Using join buffer (hash join) |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
2 rows in set, 1 warning (0.00 sec)

mysql>
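The two plans in the EXPLAIN output can be modelled roughly as follows. This is a conceptual Python sketch, not MySQL's implementation: a sorted list searched with `bisect` stands in for the InnoDB primary-key B-tree (the `eq_ref` plan, one key lookup in t2 per t1 row), and a dict stands in for the in-memory hash table behind `Using join buffer (hash join)`. Both function names are hypothetical.

```python
import random
from bisect import bisect_left


def index_lookup_join_count(t1_keys, t2_pk_sorted):
    """Model of the indexed plan (type=eq_ref on t2): for each t1 key, do one
    lookup in t2's primary key. A sorted list stands in for the B-tree."""
    matches = 0
    for key in t1_keys:
        pos = bisect_left(t2_pk_sorted, key)
        if pos < len(t2_pk_sorted) and t2_pk_sorted[pos] == key:
            matches += 1  # a primary key is unique: at most one match
    return matches


def hash_join_count(t1_keys, t2_keys):
    """Model of 'Using join buffer (hash join)': build an in-memory hash
    table over all of t2, then probe it once per t1 key."""
    counts = {}
    for key in t2_keys:  # build phase: one full scan of t2
        counts[key] = counts.get(key, 0) + 1
    return sum(counts.get(key, 0) for key in t1_keys)  # probe phase


if __name__ == "__main__":
    keys = list(range(1, 10_001))
    random.shuffle(keys)  # the unindexed table has no particular order
    print(index_lookup_join_count(keys, sorted(keys)))  # 10000
    print(hash_join_count(keys, keys))  # 10000
```

Both strategies find exactly one match per row here because the idx values are distinct, which is why both queries return 3000000. The sketch only shows the shape of the work; it does not model page reads, the buffer pool, or per-row engine overhead, which are likely where the real timing differences come from.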
  • In both cases it scans all of t1 (through the primary key index when there is one, with a full table scan otherwise), then looks for the matching row in t2.
  • In this case, "Using index" is equivalent to using the PK, since the PK is the index involved. (EXPLAIN is a bit sloppy and inconsistent in this area.)
  • Sometimes you can get more details with EXPLAIN FORMAT=JSON SELECT ... (It might not show anything useful in this case.)
  • "rows" is just an estimate (3171970 and 2993208 above, versus the actual 3000000).
  • The non-indexed case reads t2 entirely into memory and builds a hash index on it. With too small a value for join_buffer_size, you can experience the alternative: repeated full table scans of t2.
  • Your experiment is a good example of when the "join buffer" is good, but not as good as an appropriate index.
  • Your experiment would probably come out the same with two separate tables instead of a "self-join".
  • "3 times as fast": I would expect the "3" to vary a lot across different test cases.
  • For more on join_buffer_size, BNL (Block Nested-Loop), and BKA (Batched Key Access), see https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_join_buffer_size
  • It is potentially unsafe to set join_buffer_size bigger than 1% of RAM.
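The fallback for a too-small join_buffer_size (repeated full scans of t2) can be illustrated with a simplified block nested-loop join. This is a conceptual sketch under stated assumptions, not MySQL's code: `block_nested_loop` and `buffer_rows` are hypothetical, `buffer_rows` stands in for the buffer capacity counted in rows (the real join_buffer_size is set in bytes), and matching inside the buffer uses a dict for brevity. The point is the number of t2 scans: the number of t1 rows divided by the buffer capacity, rounded up.

```python
def block_nested_loop(t1_keys, t2_keys, buffer_rows):
    """Simplified block nested-loop equi-join.

    t1 rows are copied into a join buffer holding at most `buffer_rows` rows;
    for each buffer load, t2 is scanned in full and every t2 row is checked
    against the buffered rows. Returns (matching_pairs, full_scans_of_t2).
    """
    if buffer_rows < 1:
        raise ValueError("buffer_rows must be at least 1")
    matches = 0
    scans = 0
    for start in range(0, len(t1_keys), buffer_rows):
        # Fill the buffer with the next chunk of t1 (dict used for brevity).
        buffered = {}
        for key in t1_keys[start:start + buffer_rows]:
            buffered[key] = buffered.get(key, 0) + 1
        scans += 1
        for key in t2_keys:  # one full scan of t2 per buffer load
            matches += buffered.get(key, 0)
    return matches, scans


if __name__ == "__main__":
    keys = list(range(1, 1001))
    print(block_nested_loop(keys, keys, buffer_rows=1000))  # (1000, 1)
    print(block_nested_loop(keys, keys, buffer_rows=100))   # (1000, 10)
```

In this model, 3_000_000 rows in t1 with room for 100_000 rows in the buffer would mean 30 full scans of t2.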

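Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.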

 