
Poor Performance on Specific Queries after MySQL 8.0 Upgrade

EDIT: I see the same behavior in Python as in PHP, so it seems to be something in MySQL itself.

We are trying to upgrade from MySQL 5.7 to 8.0. Our codebase uses PHP MySQLi to query our MySQL server. In our test setups, certain queries that bind lots of parameters perform much worse (roughly 50x slower). We want MySQL 8.0 to run them in about the same time as 5.7. Below are the example table structures and the problematic query.

CREATE TABLE IF NOT EXISTS `a` (
  `id` int NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  PRIMARY KEY (`id`) USING BTREE,
  UNIQUE KEY `name` (`name`) USING BTREE,
  KEY `name_id` (`id`,`name`) USING BTREE
);

CREATE TABLE IF NOT EXISTS `b` (
  `id` int NOT NULL AUTO_INCREMENT,
  `a_id` int NOT NULL,
  `value` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  PRIMARY KEY (`id`) USING BTREE,
  UNIQUE KEY `uniquevalue` (`a_id`,`value`) USING BTREE,
  KEY `a_id` (`a_id`) USING BTREE,
  KEY `v` (`value`) USING BTREE,
  CONSTRAINT `b_ibfk_1` FOREIGN KEY (`a_id`) REFERENCES `a` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT
);

CREATE TABLE IF NOT EXISTS `c` (
  `product` varchar(50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `b_id` int NOT NULL,
  PRIMARY KEY (`product`,`b_id`) USING BTREE,
  KEY `b_id` (`b_id`),
  KEY `product` (`product`),
  CONSTRAINT `c_ibfk_2` FOREIGN KEY (`b_id`) REFERENCES `b` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT
);
-- example problematic query
SELECT c.product, a.name, b.value
FROM b
INNER JOIN a ON b.a_id = a.id AND a.name IN ('1be6f9eb563f3bf85c78b4219bf09de9')
-- this hash comes from the dataset (linked below); it should match a record in table a that has an associated row in b, which in turn has an associated row in c
INNER JOIN c ON c.b_id = b.id AND c.product IN (?, ?, ?, ...); -- "..." means a dynamic number of parameters

If the query is modified to return only one record (LIMIT 1), it is still slow, so the problem isn't the volume of data being returned. If the query is run non-parameterized (built with string concatenation), its run time is acceptable in all environments. Run time grows linearly with the number of bound parameters: with 7,000 bound parameters, the query runs in 100-150 ms on MySQL 5.7 and ~10 seconds on MySQL 8.0.28. We see the same results in PHP 7.4 and 8.0, and with both MySQLi and PDO.

This suggested to me that it has something to do with parameter binding. However, when I enabled profiling and checked the results for the query, the bulk of its time (~95%) was spent in the execution step, not the parameter binding step. I also see the MySQL 8 server process's CPU pegged while the query is running. I'm pretty stumped on this one.

Here is the EXPLAIN output for MySQL 8.0:

id | select_type | table | partitions | type   | possible_keys            | key     | key_len | ref               | rows | filtered | Extra
1  | SIMPLE      | a     |            | const  | PRIMARY,name,name_id     | name    | 1022    | const             | 1    | 100      | Using index
1  | SIMPLE      | c     |            | ref    | PRIMARY,b_id,product     | product | 152     | const             | 1    | 100      | Using index
1  | SIMPLE      | b     |            | eq_ref | PRIMARY,uniquevalue,a_id | PRIMARY | 4       | DefaultWeb.c.b_id | 1    | 5        | Using where

Here is the EXPLAIN output for MySQL 5.7:

id | select_type | table | partitions | type   | possible_keys            | key     | key_len | ref               | rows | filtered | Extra
1  | SIMPLE      | a     |            | const  | PRIMARY,name,name_id     | name    | 257     | const             | 1    | 100      | Using index
1  | SIMPLE      | c     |            | ref    | PRIMARY,b_id,product     | PRIMARY | 152     | const             | 1    | 100      | Using index
1  | SIMPLE      | b     |            | eq_ref | PRIMARY,uniquevalue,a_id | PRIMARY | 4       | DefaultWeb.c.b_id | 1    | 5        | Using where

There are some differences between these two EXPLAIN outputs, but again, this problem only occurs with prepared statements (in PHP and, per the edit above, in Python).

Below is some PHP code demonstrating the problem. It is written to work against the dataset I've provided in the Google Drive link below. I've also included our MySQL variables in a CSV.

<?php
// Modify these to fit your DB connection.
const HOST = '127.0.0.1';
const USER = 'root';
const PASS = 'localtest';
const DB_NAME = 'TestDatabase';

// As the number of parameters increases, time increases linearly.
// We're seeing ~10 seconds with 7000 params with this data.
const NUM_PARAMS = 7000;

function rand_string($length = 10) {
    $characters = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $charactersLength = strlen($characters);
    $randomString = '';
    for ($i = 0; $i < $length; $i++) {
        $randomString .= $characters[rand(0, $charactersLength - 1)];
    }
    return $randomString;
}

function sql_question_marks($count, $sets = 1) {
    return substr(str_repeat(",(".substr(str_repeat(",?", $count), 1).")", $sets), 1);
}

function unsecure_concat($params) {
    return "('" . implode("','", $params) . "')";
}

$params = [];
$param_types = '';
for ($i = 0; $i < NUM_PARAMS; $i++) {
    $params[] = rand_string();
    $param_types .= 's';
}

$big_query = <<<SQL
    SELECT c.product, a.name, b.value
    FROM b
    INNER JOIN a ON b.a_id = a.id AND a.name IN ('1be6f9eb563f3bf85c78b4219bf09de9')
    INNER JOIN c on c.b_id = b.id and c.product IN
SQL . sql_question_marks(count($params));

$non_parameterized = <<<SQL
    SELECT c.product, a.name, b.value
    FROM b
    INNER JOIN a ON b.a_id = a.id AND a.name IN ('1be6f9eb563f3bf85c78b4219bf09de9')
    INNER JOIN c on c.b_id = b.id and c.product IN
SQL . unsecure_concat($params);

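// Throw exceptions on errors instead of failing silently (already the default in PHP 8.1+).
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);
$connection = new mysqli(HOST, USER, PASS, DB_NAME);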

$q = $connection->prepare($big_query);
$q->bind_param($param_types, ...$params);
$start_time = hrtime(true);
$q->execute(); // This one shows the issue...100-250 ms execution time in MySQL 5.7 and ~10 seconds with 8.0.
$end_time = hrtime(true);

$total_time = ($end_time - $start_time) / 1000000000; // convert to seconds

echo 'The total time for parameterized query is ' . $total_time . ' seconds.';

$q->get_result(); // not concerned with results.

$q = $connection->prepare($big_query . ' LIMIT 1');
$q->bind_param($param_types, ...$params);
$start_time = hrtime(true);
$q->execute(); // This one also shows the issue...100-250 ms execution time in MySQL 5.7 and ~10 seconds with 8.0.
$end_time = hrtime(true);

$total_time = ($end_time - $start_time) / 1000000000; // convert to seconds

echo '<br>The total time for parameterized query with limit 1 is ' . $total_time . ' seconds.';

$q->get_result(); // not concerned with results 

$q = $connection->prepare($non_parameterized);
$start_time = hrtime(true);
$q->execute(); // Same execution time in 5.7 and 8.0.
$end_time = hrtime(true);

$total_time = ($end_time - $start_time) / 1000000000; // convert to seconds

echo '<br>The total time for non-parameterized query is ' . $total_time . ' seconds.';
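Because the concatenated version runs fast on both versions, one possible stopgap (a sketch, not something from the question) is to keep literals but escape every value with the connection's mysqli::real_escape_string() instead of using unsecure_concat(). The helper name escaped_in_list() is hypothetical. Escaping is only correct if the connection character set was configured with set_charset() (or at the server level), and each distinct list is parsed as a new statement, so the benefits of a prepared statement are lost.

```php
<?php
// Hypothetical helper: builds "('v1','v2',...)" from values passed through an
// escaping callback, e.g. [$connection, 'real_escape_string'] for mysqli.
// The callback must escape quotes; nothing else is validated here.
function escaped_in_list(array $values, callable $escape): string
{
    $escaped = array_map($escape, $values);
    return "('" . implode("','", $escaped) . "')";
}
```

With the question's variables, the fast literal query would end in `c.product IN` followed by `escaped_in_list($params, [$connection, 'real_escape_string'])`, built in place of `unsecure_concat($params)`.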

You can download example data here: https://drive.google.com/file/d/111T7g1NowfWO_uZ2AhT9jdj4LiSNck8u/view?usp=sharing

EDIT: Here is the JSON EXPLAIN output with 7,000 bound parameters:

{
    "EXPLAIN": {
        "query_block": {
            "select_id": 1,
            "cost_info": {
                "query_cost": "456.60"
            },
            "nested_loop": [
                {
                    "table": {
                        "table_name": "a",
                        "access_type": "const",
                        "possible_keys": [
                            "PRIMARY",
                            "name",
                            "name_id"
                        ],
                        "key": "name",
                        "used_key_parts": [
                            "name"
                        ],
                        "key_length": "257",
                        "ref": [
                            "const"
                        ],
                        "rows_examined_per_scan": 1,
                        "rows_produced_per_join": 1,
                        "filtered": "100.00",
                        "using_index": true,
                        "cost_info": {
                            "read_cost": "0.00",
                            "eval_cost": "0.10",
                            "prefix_cost": "0.00",
                            "data_read_per_join": "264"
                        },
                        "used_columns": [
                            "id",
                            "name"
                        ]
                    }
                },
                {
                    "table": {
                        "table_name": "b",
                        "access_type": "ref",
                        "possible_keys": [
                            "PRIMARY",
                            "uniquevalue",
                            "a_id"
                        ],
                        "key": "uniquevalue",
                        "used_key_parts": [
                            "a_id"
                        ],
                        "key_length": "4",
                        "ref": [
                            "const"
                        ],
                        "rows_examined_per_scan": 87,
                        "rows_produced_per_join": 87,
                        "filtered": "100.00",
                        "using_index": true,
                        "cost_info": {
                            "read_cost": "8.44",
                            "eval_cost": "8.70",
                            "prefix_cost": "17.14",
                            "data_read_per_join": "65K"
                        },
                        "used_columns": [
                            "id",
                            "a_id",
                            "value"
                        ]
                    }
                },
                {
                    "table": {
                        "table_name": "c",
                        "access_type": "ref",
                        "possible_keys": [
                            "PRIMARY",
                            "b_id",
                            "product"
                        ],
                        "key": "b_id",
                        "used_key_parts": [
                            "b_id"
                        ],
                        "key_length": "4",
                        "ref": [
                            "TestDatabase.b.id"
                        ],
                        "rows_examined_per_scan": 35,
                        "rows_produced_per_join": 564,
                        "filtered": "18.28",
                        "using_index": true,
                        "cost_info": {
                            "read_cost": "130.53",
                            "eval_cost": "56.47",
                            "prefix_cost": "456.60",
                            "data_read_per_join": "88K"
                        },
                        "used_columns": [
                            "product",
                            "b_id"
                        ],
                        "attached_condition": "<omitted: it contains the 7,000 random 10-character strings and breaks the Stack Overflow character limit>"
                    }
                }
            ]
        }
    }
}
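Comparing the 5.7 and 8.0 plans is easier when only the join order and access methods are pulled out. A minimal sketch, assuming the JSON layout shown above: an optional "EXPLAIN" wrapper added by the client tool, then query_block.nested_loop. summarize_plan() is a hypothetical helper that ignores subqueries and other plan shapes.

```php
<?php
// Hypothetical helper: lists the join order of an EXPLAIN FORMAT=JSON plan as
// "table (access_type via key)". Assumes a single query block whose tables sit
// under query_block.nested_loop; subqueries and other plan shapes are ignored.
function summarize_plan(string $json): array
{
    $plan = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    // The plan above is wrapped in an "EXPLAIN" key; MySQL's own output starts at query_block.
    $root = $plan['EXPLAIN'] ?? $plan;
    $steps = [];
    foreach ($root['query_block']['nested_loop'] ?? [] as $entry) {
        $table = $entry['table'];
        $steps[] = sprintf(
            '%s (%s via %s)',
            $table['table_name'],
            $table['access_type'],
            $table['key'] ?? 'no key'
        );
    }
    return $steps;
}
```

For the three tables in the plan above, the summary would be `a (const via name)`, `b (ref via uniquevalue)`, `c (ref via b_id)`. json_decode() rejects comments, so any hand-added annotations must be stripped before decoding.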

As another user previously mentioned, the default character set changed to utf8mb4 in MySQL 8. Since some of the columns compared in the problematic predicates (c.product, for example) are explicitly declared utf8, have you tried simply running "SET NAMES utf8;" from PHP? With prepared statements, the coercibility of the parameters may differ from the coercibility of string literals.
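Two ways to test this hypothesis from PHP, sketched under that assumption: mysqli::set_charset('utf8') is the client-API equivalent of SET NAMES utf8, or each placeholder can be wrapped in CONVERT(? USING utf8) so only the IN list changes. The helper name in_list_placeholders() is hypothetical, and whether either change restores the 5.7 timings has not been verified here.

```php
<?php
// Hypothetical helper: builds "(?,?,...)" for an IN list. When $charset is
// given, each placeholder becomes CONVERT(? USING <charset>) so the bound
// string is converted to that character set before it is compared.
// $charset is interpolated into SQL, so only pass a trusted constant.
function in_list_placeholders(int $count, ?string $charset = null): string
{
    $one = $charset === null ? '?' : "CONVERT(? USING $charset)";
    return '(' . implode(',', array_fill(0, $count, $one)) . ')';
}
```

Option 1: call `$connection->set_charset('utf8');` right after connecting and keep building the list with `sql_question_marks()`. Option 2: leave the connection at its default and build the list with `in_list_placeholders(count($params), 'utf8')`. The timing code stays the same either way.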

Table b looks like a key-value table, which is an inefficient anti-pattern. My point here, though, is that "normalizing" the name into table a makes it worse. Is table c a many-to-many mapping table?

So get rid of table a and simply put the name in table b.

There are some redundant indexes that you should drop.

  • In a table with PRIMARY KEY(x), there is essentially no need for INDEX(x, ...) (e.g. name_id on table a).
  • With INDEX(e, f) or UNIQUE(e, f), there is no need to also have INDEX(e) (e.g. a_id on table b, and product on table c, whose PRIMARY KEY starts with product).
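The two rules above are mechanical enough to check in code. A sketch with a hypothetical find_redundant_indexes() helper: the index definitions are typed in by hand from the CREATE TABLE statements (it does not read the live schema), and it only flags non-unique indexes, since a UNIQUE key also enforces a constraint.

```php
<?php
// Hypothetical helper: applies the two rules above to index definitions typed
// in by hand. $primary lists the PRIMARY KEY columns; $indexes maps each
// secondary index name to ['unique' => bool, 'columns' => [...]].
// Returns [index name => reason] for the indexes that look droppable.
function find_redundant_indexes(array $primary, array $indexes): array
{
    $redundant = [];
    foreach ($indexes as $name => $index) {
        if ($index['unique']) {
            continue; // a UNIQUE key also enforces a constraint, so never flag it
        }
        $cols = $index['columns'];

        // Rule 1: the index starts with the whole PRIMARY KEY.
        if ($primary !== [] && array_slice($cols, 0, count($primary)) === $primary) {
            $redundant[$name] = 'starts with the PRIMARY KEY';
            continue;
        }

        // Rule 2: the index is a left prefix of a longer index (PK included).
        $candidates = ['PRIMARY' => $primary];
        foreach ($indexes as $otherName => $other) {
            if ($otherName !== $name) {
                $candidates[$otherName] = $other['columns'];
            }
        }
        foreach ($candidates as $otherName => $otherCols) {
            if (count($otherCols) > count($cols)
                && array_slice($otherCols, 0, count($cols)) === $cols) {
                $redundant[$name] = "left prefix of $otherName";
                break;
            }
        }
    }
    return $redundant;
}
```

For table c, passing `['product', 'b_id']` as the primary key plus the non-unique `b_id` and `product` indexes flags only `product` (a left prefix of PRIMARY); `b_id` is kept, which matters because the foreign key on c.b_id needs an index that starts with b_id.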

Key-Value

CREATE TABLE FooAttributes (
    foo_id INT UNSIGNED NOT NULL,  -- link to main table
    `key` VARCHAR(..) NOT NULL,    -- KEY is a reserved word, so it must be quoted
    val ...,
    PRIMARY KEY(foo_id, `key`),
    INDEX(`key`),  -- if needed
    INDEX(val)     -- if needed
) ENGINE=InnoDB;

Notes:

  • Normalizing the "key" value slows processing down without providing much space savings.
  • The PK is designed for rapid access to the desired rows; hence,
  • there is no traditional id AUTO_INCREMENT.
  • There is no clean solution when you need val to be either strings or numeric.

The technical posts on this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM