PDO DELETE unexpectedly slow when working with millions of rows

I'm working with a MyISAM table that has about 12 million rows. A method is used to delete all records older than a specified date. The table is indexed on the date field. When run in code, the log shows that this takes about 13 seconds when there are no records to delete and about 25 seconds when there is one day's worth of records. When the same query is run in the mysql client (taking the query from SHOW PROCESSLIST while the code is running), it takes no time at all for no records, and about 16 seconds for a day's records.

The real-life problem is that this takes a long time when there are records to delete and it runs once a day, so running it more often seems logical. But I'd like it to exit as quickly as possible when there is nothing to do.

Method extract:

    try {
        $smt = DB::getInstance()->getDbh()->prepare("DELETE FROM " . static::$table . " WHERE dateSent < :date");
        $smt->execute(array(':date' => $date));
        return true;
    } catch (\PDOException $e) {
        // Some logging here removed to ensure a clean test
    }

Log results when there are 0 rows for deletion:

    [debug] ScriptController::actionDeleteHistory() success in 12.82 seconds

mysql client when there are 0 rows for deletion:

    mysql> DELETE FROM user_history WHERE dateSent < '2013-05-03 13:41:55';
    Query OK, 0 rows affected (0.00 sec)

Log results when there is 1 day's worth of rows for deletion:

    [debug] ScriptController::actionDeleteHistory() success in 25.48 seconds

mysql client when there is 1 day's worth of rows for deletion:

    mysql> DELETE FROM user_history WHERE dateSent < '2013-05-05 13:41:55';
    Query OK, 672260 rows affected (15.70 sec)

Is there a reason why PDO is slower?

Cheers.

Responses to comments:

It's the same query on both, so the index is either being picked up or it's not. And it is.

    EXPLAIN SELECT * FROM user_history WHERE dateSent < '2013-05-05 13:41:55'
    1   SIMPLE  user_history range  date_sent   date_sent   4   NULL    4   Using where

MySQL and Apache are running on the same server for the purposes of this test. If you're getting at an issue of load, then mysql does hit 100% for the 13 seconds on the in-code query. On the mysql client query, it doesn't get a chance to register in top before the query is complete. I can't see how this is not something that PHP/PDO is adding to the equation, but I'm open to all ideas.

:date is a PDO placeholder, and the field name is dateSent, so there is no conflict with MySQL keywords. Still, using :dateSent instead causes the same delay.

I had also already tried without placeholders but neglected to mention it, so good call, thanks! It was along these lines, and there is still the same delay with PHP/PDO:

    DB::getInstance()->getDbh()->query("DELETE FROM user_history WHERE dateSent < '2013-05-03 13:41:55'");

And using placeholders in the mysql client still shows no delay:

    PREPARE test from 'DELETE FROM user_history WHERE dateSent < ?';
    SET @datesent='2013-05-05 13:41:55';
    EXECUTE test USING @datesent;
    Query OK, 0 rows affected (0.00 sec)

It's a MyISAM table, so there are no transactions involved in this one.

The value of $date differs to test for no deletions or one day's deletions, as shown in the queries run in the mysql client, which are taken from SHOW PROCESSLIST while the code is running. In this case it is not passed to the method and is derived from:

    if (!isset($date)) {
        $date = date("Y-m-d H:i:s", strtotime(sprintf("-%d days", self::DELETE_BEFORE)));
    }

And at this point the table schema may get called into question, so:

    CREATE TABLE IF NOT EXISTS `user_history` (
      `userId` int(11) NOT NULL,
      `asin` varchar(10) COLLATE utf8_unicode_ci NOT NULL,
      `dateSent` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
      PRIMARY KEY (`userId`,`asin`),
      KEY `date_sent` (`dateSent`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

It's a decent-sized website with lots of DB calls throughout. I see no evidence in the way the site performs in any other respect that suggests it is down to dodgy routing. Especially as I see this query in SHOW PROCESSLIST slowly creeping its way up to 13 seconds when run in PHP/PDO, while it takes no time at all when run in the mysql client (referring particularly to the case where no records are to be deleted, which takes 13 seconds in PHP/PDO only).

Currently it is only this particular DELETE query that is in question. But I don't have another bulk DELETE statement like this anywhere else in this project, or in any other project of mine that I can think of. So the question is particular to PDO DELETE queries on big-ish tables.

"Isn't that your answer then?" - No. The question is why this takes significantly longer in PHP/PDO than in the mysql client. SHOW PROCESSLIST only shows this query taking time in PHP/PDO (with no records to be deleted). It takes no time at all in the mysql client. That's the point.

Tried the PDO query without the try-catch block, and there is still a delay.


And trying with the mysql_* functions shows the same timings as using the mysql client directly. So the finger is pointing quite strongly at PDO right now. It could be my code that interfaces with PDO, but as no other queries have an unexpected delay, this seems less likely.

Method:

    $conn = mysql_connect(****);
    mysql_select_db(****);

    $query = "DELETE FROM " . static::$table . " WHERE dateSent < '$date'";
    $result = mysql_query($query);

Logs for no records to be deleted:

    Fri May 17 15:12:54 [verbose] UserHistory::deleteBefore() query: DELETE FROM user_history WHERE dateSent < '2013-05-03 15:12:54'
    Fri May 17 15:12:54 [verbose] UserHistory::deleteBefore() result: 1
    Fri May 17 15:12:54 [verbose] ScriptController::actionDeleteHistory() success in 0.01 seconds

Logs for one day's records to be deleted:

    Fri May 17 15:14:24 [verbose] UserHistory::deleteBefore() query: DELETE FROM user_history WHERE dateSent < '2013-05-07 15:14:08'
    Fri May 17 15:14:24 [verbose] UserHistory::deleteBefore() result: 1
    Fri May 17 15:14:24 [debug] ScriptController::apiReturn(): {"message":true}
    Fri May 17 15:14:24 [verbose] ScriptController::actionDeleteHistory() success in 15.55 seconds

And I tried again, avoiding calls to the DB singleton by creating a PDO connection in the method and using that, and the delay is there once again. No other queries using the same DB singleton show delays, so I didn't really expect any difference, but it was worth a try:

    $connectString = sprintf('mysql:host=%s;dbname=%s', '****', '****');
    $dbh = new \PDO($connectString, '****', '****');
    $dbh->exec("SET CHARACTER SET utf8");
    $dbh->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);

    $smt = $dbh->prepare("DELETE FROM " . static::$table . " WHERE dateSent < :date");
    $smt->execute(array(':date' => $date));

Calling method with time logger:

    $startTimer = microtime(true);
    $deleted = $this->apiReturn(array('message' => UserHistory::deleteBefore()));
    $timeEnd = microtime(true) - $startTimer;
    Logger::write(LOG_VERBOSE, "ScriptController::actionDeleteHistory() success in " . number_format($timeEnd, 2) . " seconds");
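Because the timer above wraps apiReturn() as well as the query, it may help to time the database call on its own. A minimal sketch, assuming a hypothetical helper named timeCall() (not part of the original code) that accepts any callable; the usleep() workload is only a stand-in for the real query:

```php
<?php
// Hypothetical helper: runs $fn and returns its result together with
// the elapsed wall-clock time in seconds.
function timeCall(callable $fn): array
{
    $start = microtime(true);
    $result = $fn();
    return array('result' => $result, 'seconds' => microtime(true) - $start);
}

// Stand-in workload. In the real method this closure would wrap only
// the $smt->execute(...) call, so nothing else is included in the timing.
$timing = timeCall(function () {
    usleep(1000);
    return true;
});

printf("query step took %.4f seconds\n", $timing['seconds']);
```

Logging the 'seconds' value next to the existing total would show whether the delay sits in execute() itself or somewhere around it.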

Added PDO::ATTR_EMULATE_PREPARES to DB::connect(). It still has the delay when deleting no records at all. I've not used this before, but it looks like the right format:

    $this->dbh->setAttribute(\PDO::ATTR_EMULATE_PREPARES, false);

Current DB::connect(), though if there were general issues with this, surely it would affect all queries?

    public function connect($host, $user, $pass, $name)
    {
        $connectString = sprintf('mysql:host=%s;dbname=%s', $host, $name);
        $this->dbh = new \PDO($connectString, $user, $pass);
        $this->dbh->exec("SET CHARACTER SET utf8");
        $this->dbh->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);
    }

The indexes are shown above in the schema. If it were directly related to rebuilding the indexes after the deletion of the records, then the mysql client would take the same time as PHP/PDO. It doesn't. This is the issue. It's not that this query is slow - it's expected to take some time. It's that PHP/PDO is noticeably slower than queries executed in the mysql client or queries that use the mysql lib in PHP.


Tried MYSQL_ATTR_USE_BUFFERED_QUERY, but there is still a delay.


DB is a standard singleton pattern. DB::getInstance()->getDbh() returns the PDO connection object created in the DB::connect() method shown above, e.g. DB::dbh. I believe I've proved that the DB singleton is not the issue, as there is still a delay when the PDO connection is created in the same method in which the query is executed (6 edits above).


I've found what is causing it, but I don't know why this is happening right this minute.

I've created a test SQL script that creates a table with 10 million random rows in the right format, and a PHP script that runs the offending query. And it takes no time at all in PHP/PDO or the mysql client. Then I change the DB collation from the default latin1_swedish_ci to utf8_unicode_ci and it takes 10 seconds in PHP/PDO and no time at all in the mysql client. Then I change it back to latin1_swedish_ci and it takes no time at all in PHP/PDO again.

Tada!

Now if I remove this from the DB connection, it works fine in either collation. So there is some sort of problem here:

    $dbh->exec("SET CHARACTER SET utf8");

I shall research more, then follow up later.
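One way to research this is to compare the session's character set and collation variables before and after the suspect statement. A minimal sketch, assuming a hypothetical helper named sessionCharsetVars() (not part of the original code) and the same masked credentials used in the snippets above:

```php
<?php
// Hypothetical helper: returns the session's character set and collation
// variables as a name => value array.
function sessionCharsetVars(PDO $dbh): array
{
    $sql = "SHOW SESSION VARIABLES WHERE Variable_name LIKE 'character_set%'"
         . " OR Variable_name LIKE 'collation%'";
    // SHOW VARIABLES returns two columns (Variable_name, Value), so
    // FETCH_KEY_PAIR maps the first column to the second.
    return $dbh->query($sql)->fetchAll(PDO::FETCH_KEY_PAIR);
}

// Usage sketch (needs a live MySQL server, so it is left commented out):
// $dbh = new PDO('mysql:host=****;dbname=****', '****', '****');
// print_r(sessionCharsetVars($dbh));        // connection defaults
// $dbh->exec("SET CHARACTER SET utf8");
// print_r(sessionCharsetVars($dbh));        // after the suspect statement
```

Any variable that differs between the two dumps is a candidate for explaining the different timings.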

So...

This post explains where the flaw was:

Is "SET CHARACTER SET utf8" necessary?

Essentially, it was the use of:

    $this->dbh->exec("SET CHARACTER SET utf8");

which should have been this in DB::connect():

    $this->dbh->exec("SET NAMES utf8");

My fault entirely.

It seems to have had dire effects because the MySQL server needed to convert the query to match the collation of the DB. The above post gives much better details than I can.
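For context, the MySQL manual describes SET NAMES utf8 as setting character_set_client, character_set_connection and character_set_results to utf8, whereas SET CHARACTER SET utf8 sets only the client and results character sets and takes character_set_connection (and collation_connection) from the database default. With PDO, a similar effect to SET NAMES can also be requested through the charset option in the DSN, which pdo_mysql honours from PHP 5.3.6 onwards. A minimal sketch, assuming hypothetical helper names buildDsn() and connectUtf8() that are not part of the original DB class:

```php
<?php
// Hypothetical helper: builds a pdo_mysql DSN that carries the charset,
// so no separate SET statement is needed after connecting.
function buildDsn(string $host, string $name, string $charset = 'utf8'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $name, $charset);
}

// Hypothetical replacement for the body of DB::connect().
function connectUtf8(string $host, string $user, string $pass, string $name): PDO
{
    $dbh = new PDO(buildDsn($host, $name), $user, $pass);
    $dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    return $dbh;
}

// Shows the DSN that would be passed to PDO for the test database.
echo buildDsn('localhost', 'pdo_test'), "\n";
```

On PHP versions older than 5.3.6 the charset option is ignored, so the explicit SET NAMES utf8 call shown above would still be needed there.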

If anyone needs to confirm my findings, this series of SQL queries will set up a test DB and allow you to check for yourself. Just make sure that the indexes are correctly enabled after the test data has been entered, because I had to drop and re-add them for some reason. It creates 10 million rows. Maybe fewer will be enough to prove the point.

    DROP DATABASE IF EXISTS pdo_test;
    CREATE DATABASE IF NOT EXISTS pdo_test;
    USE pdo_test;

    CREATE TABLE IF NOT EXISTS test (
      `userId` int(11) NOT NULL,
      `asin` varchar(10) COLLATE utf8_unicode_ci NOT NULL,
      `dateSent` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
      PRIMARY KEY (`userId`,`asin`),
      KEY `date_sent` (`dateSent`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

    drop procedure if exists load_test_data;

    delimiter #
    create procedure load_test_data()
        begin
            declare v_max int unsigned default 10000000;
            declare v_counter int unsigned default 0;

            while v_counter < v_max do
                INSERT INTO test (userId, asin, dateSent) VALUES (FLOOR(1 + RAND()*10000000), SUBSTRING(MD5(RAND()) FROM 1 FOR 10), NOW());
                set v_counter=v_counter+1;
            end while;
        end #
    delimiter ;

    ALTER TABLE test DISABLE KEYS;
    call load_test_data();
    ALTER TABLE test ENABLE KEYS;

    # Tests - reconnect to mysql client after each one to reset previous CHARACTER SET

    # Right collation, wrong charset - slow
    SET CHARACTER SET utf8;
    ALTER DATABASE pdo_test COLLATE='utf8_unicode_ci';
    DELETE FROM test WHERE dateSent < '2013-01-01 00:00:00';

    # Wrong collation, no charset - fast
    ALTER DATABASE pdo_test COLLATE='latin1_swedish_ci';
    DELETE FROM test WHERE dateSent < '2013-01-01 00:00:00';

    # Right collation, right charset - fast
    SET NAMES utf8;
    ALTER DATABASE pdo_test COLLATE='utf8_unicode_ci';
    DELETE FROM test WHERE dateSent < '2013-01-01 00:00:00';

Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address.