MySQL JOINing a TABLE to itself without a primary key

Question

I created two tables in MySQL. Each contains an integer idx and a string name. In one of them, idx is the primary key.

CREATE TABLE table_indexed (
    idx     INTEGER,
    name    VARCHAR(24), 
    PRIMARY KEY(idx)
);
CREATE TABLE table_not_indexed (
    idx     INTEGER,
    name    VARCHAR(24)
);

I then added the same data to both tables: 3 million rows, with distinct values for idx (1 to 3,000,000, randomly ordered) and a random string of 8 lowercase characters for name.

Then I ran a query that joins each table to itself. The query on the table without the primary key runs almost 3 times as fast.

mysql> SELECT COUNT(*)
    -> FROM table_indexed t1 JOIN table_indexed t2
    -> ON t1.idx = t2.idx;
+----------+
| COUNT(*) |
+----------+
|  3000000 |
+----------+
1 row in set (11.80 sec)

mysql> SELECT COUNT(*)
    -> FROM table_not_indexed t1 JOIN table_not_indexed t2
    -> ON t1.idx = t2.idx;
+----------+
| COUNT(*) |
+----------+
|  3000000 |
+----------+
1 row in set (4.12 sec)

EDIT: I asked MySQL to EXPLAIN both queries.

mysql> EXPLAIN SELECT COUNT(*)
    -> FROM table_indexed t1 JOIN table_indexed t2
    -> ON t1.idx = t2.idx;
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------------+---------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref                      | rows    | filtered | Extra       |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------------+---------+----------+-------------+
|  1 | SIMPLE      | t1    | NULL       | index  | PRIMARY       | PRIMARY | 4       | NULL                     | 3171970 |   100.00 | Using index |
|  1 | SIMPLE      | t2    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | index_test3000000.t1.idx |       1 |   100.00 | Using index |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------------+---------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

mysql> EXPLAIN SELECT COUNT(*)
    -> FROM table_not_indexed t1 JOIN table_not_indexed t2
    -> ON t1.idx = t2.idx;
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra                                      |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
|  1 | SIMPLE      | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 2993208 |   100.00 | NULL                                       |
|  1 | SIMPLE      | t2    | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 2993208 |    10.00 | Using where; Using join buffer (hash join) |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
2 rows in set, 1 warning (0.00 sec)

Answer 1

In both cases it does a table scan of t1, then looks for the matching row in t2.
In this case, "Using index" is equivalent to using the PK when the PK is involved. (EXPLAIN is a bit sloppy and inconsistent in this area.)
Sometimes you can get more details with EXPLAIN FORMAT=JSON SELECT ... (though it might not show anything useful in this case).
"rows" is just an estimate.
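The non-indexed case builds an in-memory hash table in the join buffer from one of the two tables and probes it with rows from the other. When join_buffer_size is too small to hold that hash table, a hash join spills to disk; the older Block Nested-Loop alternative instead makes repeated full table scans of t2.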
Your experiment is a good example of when the "join buffer" is good, but not as good as an appropriate index.
Your experiment would probably come out the same with two separate tables instead of a "self-join".
"3 times as fast" -- I would expect a lot of variation in the "3" for different test cases.
For more on join_buffer_size, BNL (Block Nested-Loop), and BKA (Batched Key Access), see https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_join_buffer_size
It is potentially unsafe to set join_buffer_size bigger than 1% of RAM.
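
The points above about EXPLAIN and join_buffer_size can be checked with a few statements. This is only a sketch: it assumes MySQL 8.0.18 or later (needed for EXPLAIN ANALYZE and hash joins) and the two tables from the question, and the 16 MiB buffer size is an arbitrary value chosen for illustration, not a recommendation.

```sql
-- Assumes MySQL 8.0.18+ and the tables defined in the question.

-- Structured plan for the indexed self-join, including cost estimates.
EXPLAIN FORMAT=JSON
SELECT COUNT(*)
FROM table_indexed t1 JOIN table_indexed t2
ON t1.idx = t2.idx;

-- EXPLAIN ANALYZE actually executes the query and reports measured
-- times and row counts for each step of the plan.
EXPLAIN ANALYZE
SELECT COUNT(*)
FROM table_not_indexed t1 JOIN table_not_indexed t2
ON t1.idx = t2.idx;

-- Current join buffer size, in bytes, for this session.
SELECT @@SESSION.join_buffer_size;

-- Raise it for this session only (16 MiB is an illustrative value).
SET SESSION join_buffer_size = 16777216;
```

Re-running the non-indexed query after changing join_buffer_size, and comparing the EXPLAIN ANALYZE timings, gives a rough idea of how sensitive the hash join is to the buffer size on a given machine.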

solution1
0 2021-11-25 05:41:58