
Poor Performance on Specific Queries after MySQL 8.0 Upgrade

EDIT: I am seeing the same behavior in Python as in PHP, so it seems to be something in MySQL itself.

We are trying to upgrade from MySQL 5.7 to 8.0. Our codebase uses PHP MySQLi for queries against our MySQL server. In our test setups, we are seeing much poorer performance (50x slower) on certain queries that bind lots of parameters. We want MySQL 8.0 to run in a similar time to 5.7. Below are the example table structure and the trouble query.

CREATE TABLE IF NOT EXISTS `a` (
  `id` int NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  PRIMARY KEY (`id`) USING BTREE,
  UNIQUE KEY `name` (`name`) USING BTREE,
  KEY `name_id` (`id`,`name`) USING BTREE
);

CREATE TABLE IF NOT EXISTS `b` (
  `id` int NOT NULL AUTO_INCREMENT,
  `a_id` int NOT NULL,
  `value` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  PRIMARY KEY (`id`) USING BTREE,
  UNIQUE KEY `uniquevalue` (`a_id`,`value`) USING BTREE,
  KEY `a_id` (`a_id`) USING BTREE,
  KEY `v` (`value`) USING BTREE,
  CONSTRAINT `b_ibfk_1` FOREIGN KEY (`a_id`) REFERENCES `a` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT
);

CREATE TABLE IF NOT EXISTS `c` (
  `product` varchar(50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `b_id` int NOT NULL,
  PRIMARY KEY (`product`,`b_id`) USING BTREE,
  KEY `b_id` (`b_id`),
  KEY `product` (`product`),
  CONSTRAINT `c_ibfk_2` FOREIGN KEY (`b_id`) REFERENCES `b` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT
);
-- example trouble query
SELECT c.product, a.name, b.value
FROM b
INNER JOIN a ON b.a_id = a.id AND a.name IN ('1be6f9eb563f3bf85c78b4219bf09de9')
-- this hash comes from the dataset linked below; it should match a record in the 'a' table
-- that has an associated record in 'b', which in turn has an associated record in 'c'
INNER JOIN c ON c.b_id = b.id AND c.product IN (?, ?, ?...) -- ... meaning a dynamic number of parameters

If the query is modified to return only one record (LIMIT 1), it is still slow, so this is not about the volume of data being returned. If the query is run non-parameterized (with string concatenation), run time is acceptable in all environments. The more parameters you add, the slower the query gets (linearly). With 7,000 bound parameters, the query runs in 100-150 ms on MySQL 5.7 and ~10 seconds on MySQL 8.0.28. We see the same results in PHP 7.4 and 8.0, and with either MySQLi or PDO.

This tells me it has something to do with parameter binding. I enabled profiling and checked the results for the query: the bulk of the query's time (~95%) was spent in the execution step, not the parameter-binding step. I also see the mysql 8 process peg the CPU while the query is running. I'm pretty stumped on this one.
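For reference, the profiling steps described above can be reproduced from a MySQL client session (a sketch; the Query_ID of 1 is illustrative — substitute whatever `SHOW PROFILES` reports for the slow statement):

```sql
-- Enable per-session profiling (deprecated, but still available in 8.0).
SET profiling = 1;

-- Run the slow statement, then list recent queries with their total durations.
SHOW PROFILES;

-- Break down a single query by stage; in the case described above,
-- ~95% of the time lands in the "executing" stage.
SHOW PROFILE FOR QUERY 1;
```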

Here is the EXPLAIN for MySQL 8.0.

| id | select_type | table | partitions | type   | possible_keys            | key     | key_len | ref               | rows | filtered | Extra       |
|----|-------------|-------|------------|--------|--------------------------|---------|---------|-------------------|------|----------|-------------|
| 1  | SIMPLE      | a     |            | const  | PRIMARY,name,name_id     | name    | 1022    | const             | 1    | 100      | Using index |
| 1  | SIMPLE      | c     |            | ref    | PRIMARY,b_id,product     | product | 152     | const             | 1    | 100      | Using index |
| 1  | SIMPLE      | b     |            | eq_ref | PRIMARY,uniquevalue,a_id | PRIMARY | 4       | DefaultWeb.c.b_id | 1    | 5        | Using where |

Here is the EXPLAIN for MySQL 5.7.

| id | select_type | table | partitions | type   | possible_keys            | key     | key_len | ref               | rows | filtered | Extra       |
|----|-------------|-------|------------|--------|--------------------------|---------|---------|-------------------|------|----------|-------------|
| 1  | SIMPLE      | a     |            | const  | PRIMARY,name,name_id     | name    | 257     | const             | 1    | 100      | Using index |
| 1  | SIMPLE      | c     |            | ref    | PRIMARY,b_id,product     | PRIMARY | 152     | const             | 1    | 100      | Using index |
| 1  | SIMPLE      | b     |            | eq_ref | PRIMARY,uniquevalue,a_id | PRIMARY | 4       | DefaultWeb.c.b_id | 1    | 5        | Using where |

There are some differences between these two plans, but once again the problem only occurs with prepared statements from PHP.

Below is some PHP code demonstrating the problem. It is written to work against the dataset I've provided in the Google Drive link below. I've also included our MySQL variables in a CSV.

<?php
// Modify these to fit your DB connection.
const HOST = '127.0.0.1';
const USER = 'root';
const PASS = 'localtest';
const DB_NAME = 'TestDatabase';

// As the number of parameters increases, time increases linearly.
// We're seeing ~10 seconds with 7000 params with this data.
const NUM_PARAMS = 7000;

function rand_string($length = 10) {
    $characters = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $charactersLength = strlen($characters);
    $randomString = '';
    for ($i = 0; $i < $length; $i++) {
        $randomString .= $characters[rand(0, $charactersLength - 1)];
    }
    return $randomString;
}

// Builds "(?,?,...,?)" with $count placeholders, repeated $sets times and
// comma-separated, e.g. sql_question_marks(3) returns "(?,?,?)".
function sql_question_marks($count, $sets = 1) {
    return substr(str_repeat(",(".substr(str_repeat(",?", $count), 1).")", $sets), 1);
}

// Unsafe string concatenation, used only for the non-parameterized timing comparison.
function unsecure_concat($params) {
    return "('" . implode("','", $params) . "')";
}

$params = [];
$param_types = '';
for ($i = 0; $i < NUM_PARAMS; $i++) {
    $params[] = rand_string();
    $param_types .= 's';
}

$big_query = <<<SQL
    SELECT c.product, a.name, b.value
    FROM b
    INNER JOIN a ON b.a_id = a.id AND a.name IN ('1be6f9eb563f3bf85c78b4219bf09de9')
    INNER JOIN c on c.b_id = b.id and c.product IN
SQL . sql_question_marks(count($params));

$non_parameterized = <<<SQL
    SELECT c.product, a.name, b.value
    FROM b
    INNER JOIN a ON b.a_id = a.id AND a.name IN ('1be6f9eb563f3bf85c78b4219bf09de9')
    INNER JOIN c on c.b_id = b.id and c.product IN
SQL . unsecure_concat($params);

$connection = new mysqli(HOST, USER, PASS, DB_NAME);

$q = $connection->prepare($big_query);
$q->bind_param($param_types, ...$params);
$start_time = hrtime(true);
$q->execute(); // This one shows the issue...100-250 ms execution time in MySQL 5.7 and ~10 seconds with 8.0.
$end_time = hrtime(true);

$total_time = ($end_time - $start_time) / 1000000000; // convert to seconds

echo 'The total time for parameterized query is ' . $total_time . ' seconds.';

$q->get_result(); // not concerned with results.

$q = $connection->prepare($big_query . ' LIMIT 1');
$q->bind_param($param_types, ...$params);
$start_time = hrtime(true);
$q->execute(); // This one also shows the issue...100-250 ms execution time in MySQL 5.7 and ~10 seconds with 8.0.
$end_time = hrtime(true);

$total_time = ($end_time - $start_time) / 1000000000; // convert to seconds

echo '<br>The total time for parameterized query with limit 1 is ' . $total_time . ' seconds.';

$q->get_result(); // not concerned with results 

$q = $connection->prepare($non_parameterized);
$start_time = hrtime(true);
$q->execute(); // Same execution time in 5.7 and 8.0.
$end_time = hrtime(true);

$total_time = ($end_time - $start_time) / 1000000000; // convert to seconds

echo '<br>The total time for non-parameterized query is ' . $total_time . ' seconds.';

You can download example data here: https://drive.google.com/file/d/111T7g1NowfWO_uZ2AhT9jdj4LiSNck8u/view?usp=sharing

EDIT: Here is the JSON EXPLAIN with 7,000 bound parameters.

{
    "EXPLAIN": {
        "query_block": {
            "select_id": 1,
            "cost_info": {
                "query_cost": "456.60"
            },
            "nested_loop": [
                {
                    "table": {
                        "table_name": "a",
                        "access_type": "const",
                        "possible_keys": [
                            "PRIMARY",
                            "name",
                            "name_id"
                        ],
                        "key": "name",
                        "used_key_parts": [
                            "name"
                        ],
                        "key_length": "257",
                        "ref": [
                            "const"
                        ],
                        "rows_examined_per_scan": 1,
                        "rows_produced_per_join": 1,
                        "filtered": "100.00",
                        "using_index": true,
                        "cost_info": {
                            "read_cost": "0.00",
                            "eval_cost": "0.10",
                            "prefix_cost": "0.00",
                            "data_read_per_join": "264"
                        },
                        "used_columns": [
                            "id",
                            "name"
                        ]
                    }
                },
                {
                    "table": {
                        "table_name": "b",
                        "access_type": "ref",
                        "possible_keys": [
                            "PRIMARY",
                            "uniquevalue",
                            "a_id"
                        ],
                        "key": "uniquevalue",
                        "used_key_parts": [
                            "a_id"
                        ],
                        "key_length": "4",
                        "ref": [
                            "const"
                        ],
                        "rows_examined_per_scan": 87,
                        "rows_produced_per_join": 87,
                        "filtered": "100.00",
                        "using_index": true,
                        "cost_info": {
                            "read_cost": "8.44",
                            "eval_cost": "8.70",
                            "prefix_cost": "17.14",
                            "data_read_per_join": "65K"
                        },
                        "used_columns": [
                            "id",
                            "a_id",
                            "value"
                        ]
                    }
                },
                {
                    "table": {
                        "table_name": "c",
                        "access_type": "ref",
                        "possible_keys": [
                            "PRIMARY",
                            "b_id",
                            "product"
                        ],
                        "key": "b_id",
                        "used_key_parts": [
                            "b_id"
                        ],
                        "key_length": "4",
                        "ref": [
                            "TestDatabase.b.id"
                        ],
                        "rows_examined_per_scan": 35,
                        "rows_produced_per_join": 564,
                        "filtered": "18.28",
                        "using_index": true,
                        "cost_info": {
                            "read_cost": "130.53",
                            "eval_cost": "56.47",
                            "prefix_cost": "456.60",
                            "data_read_per_join": "88K"
                        },
                        "used_columns": [
                            "product",
                            "b_id"
                        ],
                        "attached_condition": "" // I've omitted the condition since it exceeds the SO character limit; it contains 7,000 random 10-character strings
                    }
                }
            ]
        }
    }
}

As another user previously mentioned, the default character set changed to utf8mb4 in MySQL 8. Since some of the columns compared in the problematic query predicates use explicit utf8 charset definitions, have you considered simply issuing `SET NAMES utf8;` from PHP? With prepared statements, the coercibility of the parameters may be different than the coercibility of string literals.
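A sketch of that suggestion: first check what the connection is actually using, then force it to match the utf8 columns so bound parameters are sent with the same charset/collation as the columns they are compared against (avoiding a possible per-row conversion). In mysqli, `$connection->set_charset('utf8')` is the preferred equivalent of the last statement.

```sql
-- Inspect the character set and collation the session currently uses.
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';

-- Force the connection character set to utf8 to match the utf8 columns
-- (b.value, c.product) used in the query predicates.
SET NAMES utf8;
```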

b sounds like a key-value table, which is an inefficient anti-pattern. But my point today is that 'normalizing' the name makes it worse. Is table c a many-to-many mapping table?

So, get rid of table a and simply put the name in table b.
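A sketch of that denormalization, assuming the asker's original column sizes (this drops table `a` entirely and moves `name` into `b`; the unique constraint shifts accordingly):

```sql
CREATE TABLE IF NOT EXISTS `b` (
  `id` int NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,  -- was a.name, previously reached via b.a_id
  `value` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `uniquevalue` (`name`,`value`),
  KEY `v` (`value`)
) ENGINE=InnoDB;
```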

There are some redundant indexes that you should drop.

  • In a table with `PRIMARY KEY(x)`, there is essentially no need for `INDEX(x, ...)`.
  • With `INDEX(e, f)` or `UNIQUE(e, f)`, there is no need to also have `INDEX(e)`.
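Applied to the schema in the question, those two rules would drop the following indexes (note that `c`'s `b_id` index must stay, since the foreign key needs an index whose leftmost column is `b_id` and the primary key there starts with `product`):

```sql
ALTER TABLE a DROP INDEX name_id;  -- PRIMARY KEY(id) already covers INDEX(id, ...)
ALTER TABLE b DROP INDEX a_id;     -- UNIQUE(a_id, value) already starts with a_id
ALTER TABLE c DROP INDEX product;  -- PRIMARY KEY(product, b_id) already starts with product
```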

Key-Value

CREATE TABLE FooAttributes (
    foo_id INT UNSIGNED NOT NULL,  -- link to main table
    `key` VARCHAR(..) NOT NULL,    -- backticks needed: KEY is a reserved word
    val ...,
    PRIMARY KEY(foo_id, `key`),
    INDEX(`key`),  -- if needed
    INDEX(val)     -- if needed
) ENGINE=InnoDB;

Notes:

  • Normalizing the "key" value slows processing down without providing much space savings.
  • The PK is designed for rapid access to the desired rows; hence,
  • there is no traditional id AUTO_INCREMENT.
  • There is no clean solution when you need val to be either strings or numeric.
