MySQL subquery using RAND() with INNER JOIN returns no row

I have a list of objects and a list of places, and I want to randomly put each object in a place.

CREATE TABLE so_object (
    `id` INT,
    `name` TINYTEXT,
    PRIMARY KEY (`id`)
);
CREATE TABLE so_place (
    `id` INT,
    `name` TINYTEXT,
    PRIMARY KEY (`id`)
);

TRUNCATE TABLE `so_object`;
TRUNCATE TABLE `so_place`;

INSERT INTO `so_object` VALUES (1, 'banana'), (2, 'apple'), (3, 'chocolate'), (4, 'milk'), (5, 'phone');
INSERT INTO `so_place` VALUES (1, 'room'), (4, 'kitchen'), (7, 'living'), (8, 'cave');

Then I pick the max id of the place table and assign each object a random integer between 1 and that max.

SET @idMax := (SELECT MAX(id) FROM so_place);
SELECT @idMax;

SELECT
    FLOOR(RAND()*@idMax+1) AS id_place, id, name
FROM
    so_object
;

Then I check that the assigned integer exists in the place table:

SELECT
    *
FROM (
    SELECT
        FLOOR(RAND()*@idMax+1) AS id_place, id, name
    FROM
        so_object
) AS t
INNER JOIN so_place AS p
    ON p.id = t.id_place
;

There are holes in the so_place table, so I do the INNER JOIN to ensure the place exists. It is acceptable for an object to end up in no place (i.e. if the random number it picked falls in a so_place hole). It is also acceptable for a place to be empty, or to contain two or more objects.

On small tables like these two, it seems to work fine. But the more places I add, the fewer rows are returned:

INSERT INTO so_place VALUES
    (9, 'room 9'),
    (10, 'room 10'),
    (11, 'room 11'),
    (12, 'room 12'),
    (13, 'room 13'),
    (14, 'room 14'),
    (15, 'room 15'),
    (16, 'room 16'),
    (17, 'room 17'),
    (18, 'room 18'),
    (19, 'room 19'),
    (20, 'room 20'),
    (21, 'room 21'),
    (22, 'room 22'),
    (23, 'room 23'),
    (24, 'room 24'),
    (25, 'room 25'),
    (26, 'room 26'),
    (27, 'room 27'),
    (28, 'room 28'),
    (29, 'room 29'),
    (30, 'room 30'),
    (31, 'room 31'),
    (32, 'room 32'),
    (33, 'room 33'),
    (34, 'room 34'),
    (35, 'room 35'),
    (36, 'room 36'),
    (37, 'room 37'),
    (38, 'room 38'),
    (39, 'room 39'),
    (40, 'room 40'),
    (41, 'room 41'),
    (42, 'room 42'),
    (43, 'room 43'),
    (44, 'room 44'),
    (45, 'room 45'),
    (46, 'room 46'),
    (47, 'room 47'),
    (48, 'room 48'),
    (49, 'room 49'),
    (50, 'room 50'),
    (51, 'room 51'),
    (52, 'room 52'),
    (53, 'room 53'),
    (54, 'room 54'),
    (55, 'room 55'),
    (56, 'room 56'),
    (57, 'room 57'),
    (58, 'room 58'),
    (59, 'room 59');

This makes no sense, since the added places introduce no new holes in so_place. I suspect that the MySQL engine first scans the place table, then picks the random integer, and only keeps the row if the random integer matches the place's id (which becomes less likely as more places are added).

This query "worked fine" in MySQL 5.6.25 and MySQL 5.5.24 (i.e. MySQL evaluated the nested table first, then did the inner join and only kept rows from the nested table if they matched a place), but in MySQL 5.7.10 it no longer works.

I don't know whether this is a "MySQL 5.7.10 bug" or the expected SQL result (in which case previous versions were buggy and newer ones are "fixed"). I don't know how to get back the MySQL 5.5/5.6 behavior, so any fix to the query, or another query with the same meaning, is welcome.


After a night's sleep, an EXPLAIN shows that MySQL does an intermediate simplification:

id  select_type  table      type  rows  filtered  Extra
1   SIMPLE       so_object  ALL   5     100.00    \N
1   SIMPLE       p          ALL   55    10.00     Using where; Using join buffer (Block Nested Loop)

The t table does not appear. So how can I force MySQL to build the intermediate table, since the query optimizer here optimizes too aggressively and breaks the query result?

Update: According to the MySQL 5.7 documentation, the query optimizer no longer always materializes derived tables (that is, generates the temporary table I need here). So I could solve the issue by deactivating this behavior with SET optimizer_switch = 'derived_merge=off';, but I dislike doing so, since I will need to reactivate the option after the query has run.
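A minimal sketch of that workaround, assuming the change only needs to last for one statement. The current value is saved in a user variable and restored afterwards; the name @old_optimizer_switch is made up for this example. SET SESSION changes only the current connection, so other connections keep their own setting.

```sql
-- Save the current session value (hypothetical variable name).
SET @old_optimizer_switch := @@optimizer_switch;

-- Disable merging of derived tables for this session only.
SET SESSION optimizer_switch = 'derived_merge=off';

SET @idMax := (SELECT MAX(id) FROM so_place);
SELECT *
FROM (
    SELECT FLOOR(RAND()*@idMax+1) AS id_place, id, name
    FROM so_object
) AS t
INNER JOIN so_place AS p
    ON p.id = t.id_place;

-- Restore the previous setting.
SET SESSION optimizer_switch = @old_optimizer_switch;
```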

I will answer my own question, even if this solution looks more like a hack. At least the explanation is pretty clear:

As of MySQL 5.7.6, the optimizer handles derived tables and view references the same way: It avoids unnecessary materialization whenever possible. [...] Before MySQL 5.7.6, derived tables were always materialized

https://dev.mysql.com/doc/refman/5.7/en/subquery-optimization.html#derived-table-optimization

Hence, the query was translated internally to "pick all places; for each object, pick a random number, and if it matches the place's id, keep this place for this object". The more places there are, the smaller the chance of a match, hence the "no row, or sometimes one" result. The EXPLAIN shows it pretty clearly:

id  select_type  table      type  rows  filtered  Extra
1   SIMPLE       so_object  ALL   5     100.00    \N
1   SIMPLE       p          ALL   55    10.00     Using where; Using join buffer (Block Nested Loop)

The subquery does not generate a temporary table (it is not materialized), while previous versions did.
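Written out by hand, the merged form would look roughly like the query below. This is only an illustration of the description above, not the optimizer's actual internal representation; the alias o is introduced for the example. The point it shows is that RAND() ends up inside the join condition, so it is no longer computed once per object and stored.

```sql
-- Illustration only: the derived table t has been folded into the outer query,
-- so the random expression now sits directly in the ON clause.
SELECT
    FLOOR(RAND()*@idMax+1) AS id_place, o.id, o.name, p.id, p.name
FROM
    so_object AS o
INNER JOIN so_place AS p
    ON p.id = FLOOR(RAND()*@idMax+1)
;
```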

One way to force the subquery to be materialized from within the query itself (and so to evaluate RAND() only once per object) is to make it DISTINCT:

Constructs that prevent merging are the same as those that prevent merging in views. Examples are SELECT DISTINCT or LIMIT in the subquery.

So the query is now

SET @idMax := (SELECT MAX(id) FROM so_place);
SELECT
    *
FROM (
    SELECT DISTINCT
        FLOOR(RAND()*@idMax+1) AS id_place, id, name
    FROM
        so_object
) AS t
INNER JOIN so_place AS p
    ON p.id = t.id_place
;

It returns the list of objects, where almost every object is matched to one place, and where a place can be matched to 0, 1, or more objects.

id_place    id  name        id  name
16          1   banana      16  room 16
25          3   chocolate   25  room 25
16          4   milk        16  room 16
22          5   phone       22  room 22
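The documentation excerpt quoted above also names LIMIT as a construct that prevents merging, so a variant without DISTINCT should be possible. The sketch below assumes that a LIMIT large enough to never cut off a row is sufficient; it has not been verified against the 5.7.10 setup described here. DISTINCT itself removes no rows in the query above, because id is the primary key of so_object, so both forms should return the same kind of result.

```sql
SET @idMax := (SELECT MAX(id) FROM so_place);
SELECT
    *
FROM (
    SELECT
        FLOOR(RAND()*@idMax+1) AS id_place, id, name
    FROM
        so_object
    -- Largest BIGINT UNSIGNED value: keeps every row, but the LIMIT clause
    -- is expected to stop the optimizer from merging the derived table.
    LIMIT 18446744073709551615
) AS t
INNER JOIN so_place AS p
    ON p.id = t.id_place
;
```

Prefixing either form with EXPLAIN is a way to check the effect: when the derived table is materialized, the plan would normally contain a row with select_type DERIVED and an outer row reading from a table named like <derived2>, instead of the two SIMPLE rows shown earlier.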

It could be a casting problem; try casting the random value to an integer:

SELECT t.*
FROM (
    SELECT
        cast(FLOOR(RAND()*@idMax+1) as UNSIGNED)  AS rnum, u.id
    FROM underground AS u
)  t
INNER JOIN integers AS i
ON i.n = t.rnum
