EDIT: I am seeing the same behavior in Python as PHP. Seems to be something with MySQL.
We are trying to upgrade from MySQL 5.7 to 8.0. Our codebase uses PHP MySQLi for queries to our MySQL server. In our test setups, we are seeing poorer performance (50x slower) on certain queries that bind lots of parameters. We want to see MySQL 8.0 run in similar time as 5.7. Below is the example table structure and trouble query.
CREATE TABLE IF NOT EXISTS `a` (
`id` int NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
PRIMARY KEY (`id`) USING BTREE,
UNIQUE KEY `name` (`name`) USING BTREE,
KEY `name_id` (`id`,`name`) USING BTREE
);
CREATE TABLE IF NOT EXISTS `b` (
`id` int NOT NULL AUTO_INCREMENT,
`a_id` int NOT NULL,
`value` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
PRIMARY KEY (`id`) USING BTREE,
UNIQUE KEY `uniquevalue` (`a_id`,`value`) USING BTREE,
KEY `a_id` (`a_id`) USING BTREE,
KEY `v` (`value`) USING BTREE,
CONSTRAINT `b_ibfk_1` FOREIGN KEY (`a_id`) REFERENCES `a` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT
);
CREATE TABLE IF NOT EXISTS `c` (
`product` varchar(50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
`b_id` int NOT NULL,
PRIMARY KEY (`product`,`b_id`) USING BTREE,
KEY `b_id` (`b_id`),
KEY `product` (`product`),
CONSTRAINT `c_ibfk_2` FOREIGN KEY (`b_id`) REFERENCES `b` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT
);
-- example trouble query
SELECT c.product, a.name, b.value
FROM b
INNER JOIN a ON b.a_id = a.id AND a.name IN ('1be6f9eb563f3bf85c78b4219bf09de9')
-- this hash is from the dataset (linked below) but it should match a record in the 'a' table that has an associated record in the 'b' table that in turn has an associated record in the 'c' table
INNER JOIN c on c.b_id = b.id and c.product IN (?, ?, ?...) -- ... meaning dynamic number of parameters
If the query is modified to only return one record (limit 1), the query is still slow. So it isn't about the volume of data being returned. If the query is ran non-parameterized (with string concatenation), query run time is acceptable in all environments. The more parameters you add, the slower the query gets (linear). With 7,000 bound parameters, the query runs in 100 - 150 ms in MySQL 5.7 and ~10 seconds in MySQL 8.0.28. We see the same results in PHP 7.4 and 8.0. We see the same results with MySQLi or PDO.
This tells me that it is something to do with parameter binding. I enabled profiling and checked the results for the query. The bulk of the query's time (~95%) was spent in the execution step, not the parameter binding step. Also, I see mysql 8 process CPU is pegged while query is running. I'm pretty stumped on this one.
Here is the explain for MySQL 8.0.
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | a | const | PRIMARY,name,name_id | name | 1022 | const | 1 | 100 | Using index | |
1 | SIMPLE | c | ref | PRIMARY,b_id,product | product | 152 | const | 1 | 100 | Using index | |
1 | SIMPLE | b | eq_ref | PRIMARY,uniquevalue,a_id | PRIMARY | 4 | DefaultWeb.c.b_id | 1 | 5 | Using where |
Here is the explain for MySQL 5.7.
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | a | const | PRIMARY,name,name_id | name | 257 | const | 1 | 100 | Using index | |
1 | SIMPLE | c | ref | PRIMARY,b_id,product | PRIMARY | 152 | const | 1 | 100 | Using index | |
1 | SIMPLE | b | eq_ref | PRIMARY,uniquevalue,a_id | PRIMARY | 4 | DefaultWeb.c.b_id | 1 | 5 | Using where |
There are some differences between these two explains, but once again this problem only occurs with prepared statements within PHP.
Below is some php code demonstrating the problem. This code is written to work against the dataset I've provided in the Google Drive link below. I've also included our MySQL variables in a CSV.
<?php
// Modify these to fit your DB connection.
const HOST = '127.0.0.1';
const USER = 'root';
const PASS = 'localtest';
const DB_NAME = 'TestDatabase';
// As the number of parameters increases, time increases linearly.
// We're seeing ~10 seconds with 7000 params with this data.
const NUM_PARAMS = 7000;
function rand_string($length = 10) {
$characters = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
$charactersLength = strlen($characters);
$randomString = '';
for ($i = 0; $i < $length; $i++) {
$randomString .= $characters[rand(0, $charactersLength - 1)];
}
return $randomString;
}
function sql_question_marks($count, $sets = 1) {
return substr(str_repeat(",(".substr(str_repeat(",?", $count), 1).")", $sets), 1);
}
function unsecure_concat($params) {
return "('" . implode("','", $params) . "')";
}
$params = [];
$param_types = '';
for ($i = 0; $i < NUM_PARAMS; $i++) {
$params[] = rand_string();
$param_types .= 's';
}
$big_query = <<<SQL
SELECT c.product, a.name, b.value
FROM b
INNER JOIN a ON b.a_id = a.id AND a.name IN ('1be6f9eb563f3bf85c78b4219bf09de9')
INNER JOIN c on c.b_id = b.id and c.product IN
SQL . sql_question_marks(count($params));
$non_parameterized = <<<SQL
SELECT c.product, a.name, b.value
FROM b
INNER JOIN a ON b.a_id = a.id AND a.name IN ('1be6f9eb563f3bf85c78b4219bf09de9')
INNER JOIN c on c.b_id = b.id and c.product IN
SQL . unsecure_concat($params);
$connection = new mysqli(HOST, USER, PASS, DB_NAME);
$q = $connection->prepare($big_query);
$q->bind_param($param_types, ...$params);
$start_time = hrtime(true);
$q->execute(); // This one shows the issue...100-250 ms execution time in MySQL 5.7 and ~10 seconds with 8.0.
$end_time = hrtime(true);
$total_time = ($end_time - $start_time) / 1000000000; // convert to seconds
echo 'The total time for parameterized query is ' . $total_time . ' seconds.';
$q->get_result(); // not concerned with results.
$q = $connection->prepare($big_query . ' LIMIT 1');
$q->bind_param($param_types, ...$params);
$start_time = hrtime(true);
$q->execute(); // This one also shows the issue...100-250 ms execution time in MySQL 5.7 and ~10 seconds with 8.0.
$end_time = hrtime(true);
$total_time = ($end_time - $start_time) / 1000000000; // convert to seconds
echo '<br>The total time for parameterized query with limit 1 is ' . $total_time . ' seconds.';
$q->get_result(); // not concerned with results
$q = $connection->prepare($non_parameterized);
$start_time = hrtime(true);
$q->execute(); // Same execution time in 5.7 and 8.0.
$end_time = hrtime(true);
$total_time = ($end_time - $start_time) / 1000000000; // convert to seconds
echo '<br>The total time for non-parameterized query is ' . $total_time . ' seconds.';
You can download example data here: https://drive.google.com/file/d/111T7g1NowfWO_uZ2AhT9jdj4LiSNck8u/view?usp=sharing
EDIT: Here is the JSON explain with 7,000 bound parameters.
{
"EXPLAIN": {
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "456.60"
},
"nested_loop": [
{
"table": {
"table_name": "a",
"access_type": "const",
"possible_keys": [
"PRIMARY",
"name",
"name_id"
],
"key": "name",
"used_key_parts": [
"name"
],
"key_length": "257",
"ref": [
"const"
],
"rows_examined_per_scan": 1,
"rows_produced_per_join": 1,
"filtered": "100.00",
"using_index": true,
"cost_info": {
"read_cost": "0.00",
"eval_cost": "0.10",
"prefix_cost": "0.00",
"data_read_per_join": "264"
},
"used_columns": [
"id",
"name"
]
}
},
{
"table": {
"table_name": "b",
"access_type": "ref",
"possible_keys": [
"PRIMARY",
"uniquevalue",
"a_id"
],
"key": "uniquevalue",
"used_key_parts": [
"a_id"
],
"key_length": "4",
"ref": [
"const"
],
"rows_examined_per_scan": 87,
"rows_produced_per_join": 87,
"filtered": "100.00",
"using_index": true,
"cost_info": {
"read_cost": "8.44",
"eval_cost": "8.70",
"prefix_cost": "17.14",
"data_read_per_join": "65K"
},
"used_columns": [
"id",
"a_id",
"value"
]
}
},
{
"table": {
"table_name": "c",
"access_type": "ref",
"possible_keys": [
"PRIMARY",
"b_id",
"product"
],
"key": "b_id",
"used_key_parts": [
"b_id"
],
"key_length": "4",
"ref": [
"TestDatabase.b.id"
],
"rows_examined_per_scan": 35,
"rows_produced_per_join": 564,
"filtered": "18.28",
"using_index": true,
"cost_info": {
"read_cost": "130.53",
"eval_cost": "56.47",
"prefix_cost": "456.60",
"data_read_per_join": "88K"
},
"used_columns": [
"product",
"b_id"
],
"attached_condition": "" // i've omitted the condition since it breaks the SO char limit, it contains 7,000 random character strings at 10 length each
}
}
]
}
}
}
As another user previously mentioned, the default character set changes to utf8mb4 in MySQL 8. Since you are using explicit utf8 charset definitions on some of the columns in the query predicate comparisons that are having issues, have you considered trying simply to "set names utf8;" in PHP? With prepared statements, the coercibility of the parameters may be different than the coercibility of string literals.
b
sounds like a key-value table, which is an inefficient anti-pattern. But my point today is that 'normalizing' the name
makes it worse. Is table c
a many-to-many mapping table?
So, get rid of table a
and simply put the name
in table b
There are some redundant indexes that you should Drop.
PRIMARY KEY(x)
, there is essentially no need for INDEX(x, ...)
.INDEX(e, f), or
UNIQUE(e, f) , there is no need to also have
INDEX(e)`.Key-Value
CREATE TABLE FooAttributes (
foo_id INT UNSIGNED NOT NULL, -- link to main table
key VARCHAR(..) NOT NULL,
val ...,
PRIMARY KEY(foo_id, key),
INDEX(key) -- if needed
INDEX(val) -- if needed
) ENGINE=InnoDB;
Notes:
id AUTO_INCREMENT
.val
to be either strings or numeric.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.