
MySQL JOINing a TABLE to itself without a primary key

I created two tables in MySQL. Each contains an integer idx and a string name. In one of them, idx is the primary key.

CREATE TABLE table_indexed (
    idx     INTEGER,
    name    VARCHAR(24), 
    PRIMARY KEY(idx)
);
CREATE TABLE table_not_indexed (
    idx     INTEGER,
    name    VARCHAR(24)
);

I then loaded the same data into both tables: 3 million rows, with distinct values for idx (1 to 3,000,000, in random order) and a random string of 8 lowercase characters for each name.

Then I ran a query that joins each table to itself. The query on the table without the primary key ran almost 3 times as fast.

mysql> SELECT COUNT(*)
    -> FROM table_indexed t1 JOIN table_indexed t2
    -> ON t1.idx = t2.idx;
+----------+
| COUNT(*) |
+----------+
|  3000000 |
+----------+
1 row in set (11.80 sec)

mysql> SELECT COUNT(*)
    -> FROM table_not_indexed t1 JOIN table_not_indexed t2
    -> ON t1.idx = t2.idx;
+----------+
| COUNT(*) |
+----------+
|  3000000 |
+----------+
1 row in set (4.12 sec)

EDIT: I asked MySQL to EXPLAIN both queries.

mysql> EXPLAIN SELECT COUNT(*)
    -> FROM table_indexed t1 JOIN table_indexed t2
    -> ON t1.idx = t2.idx;
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------------+---------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref                      | rows    | filtered | Extra       |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------------+---------+----------+-------------+
|  1 | SIMPLE      | t1    | NULL       | index  | PRIMARY       | PRIMARY | 4       | NULL                     | 3171970 |   100.00 | Using index |
|  1 | SIMPLE      | t2    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | index_test3000000.t1.idx |       1 |   100.00 | Using index |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------------+---------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

mysql> EXPLAIN SELECT COUNT(*)
    -> FROM table_not_indexed t1 JOIN table_not_indexed t2
    -> ON t1.idx = t2.idx;
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra                                      |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
|  1 | SIMPLE      | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 2993208 |   100.00 | NULL                                       |
|  1 | SIMPLE      | t2    | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 2993208 |    10.00 | Using where; Using join buffer (hash join) |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
2 rows in set, 1 warning (0.00 sec)

  • In both cases it does a table scan of t1, then looks for the matching row in t2.
  • In this case "Using index" is equivalent to using the PK when the PK is involved. (EXPLAIN is a bit sloppy and inconsistent in this area.)
  • Sometimes you can get more details with EXPLAIN FORMAT=JSON SELECT ... (Might not be anything useful in this case.)
  • "rows" is just an estimate.
  • The non-indexed case reads t2 entirely into memory and builds a hash index on it. With too small a value for join_buffer_size, you can experience the alternative -- repeated full table scans of t2.
  • Your experiment is a good example of when the "join buffer" is good, but not as good as an appropriate index.
  • Your experiment would probably come out the same with two separate tables instead of a "self-join".
  • "3 times as fast" -- I would expect a lot of variation in the "3" for different test cases.
  • For more on join_buffer_size, BNL (Block Nested-Loop), and BKA (Batched Key Access), see https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_join_buffer_size
  • It is potentially unsafe to set join_buffer_size bigger than 1% of RAM.
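
To check the points above about EXPLAIN detail and join_buffer_size on your own server, something like the following might help. It is only a sketch that reuses the table names from the question. EXPLAIN ANALYZE is not mentioned above and is an assumption on my part that you are on MySQL 8.0.18 or later; the value 262144 is just the documented default, used here for illustration.

```sql
-- Show the join buffer size (in bytes) in effect for this session.
SHOW VARIABLES LIKE 'join_buffer_size';

-- More detailed plan output than the tabular EXPLAIN.
EXPLAIN FORMAT=JSON
SELECT COUNT(*)
FROM table_not_indexed t1 JOIN table_not_indexed t2
ON t1.idx = t2.idx;

-- Assumption: MySQL 8.0.18+. This executes the query and reports
-- actual row counts and timings for each step of the plan.
EXPLAIN ANALYZE
SELECT COUNT(*)
FROM table_not_indexed t1 JOIN table_not_indexed t2
ON t1.idx = t2.idx;

-- Change the buffer for this session only (illustrative value),
-- then re-run the timing test to see how sensitive it is.
SET SESSION join_buffer_size = 262144;
```

Using SET SESSION rather than SET GLOBAL keeps the experiment from affecting other connections, which fits the caution about large join_buffer_size values.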
