PDO DELETE unexpectedly slow when working with millions of rows

I'm working with a MyISAM table that has about 12 million rows. A method deletes all records older than a specified date, and the table is indexed on the date field. When run from the PHP code, the log shows that this takes about 13 seconds when there are no records to delete and about 25 seconds when there is one day's worth of records. When the same query is run in the mysql client (copying the query from SHOW PROCESSLIST while the code is running), it takes no time at all with no records to delete, and about 16 seconds for a day's records.

The real-life problem is that the delete takes a long time when it runs once a day and there are records to remove, so running it more often seems logical. But I'd like it to exit as quickly as possible when there is nothing to do.

Method extract:

    try {
        $smt = DB::getInstance()->getDbh()->prepare("DELETE FROM " . static::$table . " WHERE dateSent < :date");
        $smt->execute(array(':date' => $date));
        return true;
    } catch (\PDOException $e) {
        // Some logging here removed to ensure a clean test
    }

Log results when there are 0 rows to delete:

    [debug] ScriptController::actionDeleteHistory() success in 12.82 seconds

mysql client when there are 0 rows to delete:

    mysql> DELETE FROM user_history WHERE dateSent < '2013-05-03 13:41:55';
    Query OK, 0 rows affected (0.00 sec)

Log results when there is 1 day's worth of records to delete:

    [debug] ScriptController::actionDeleteHistory() success in 25.48 seconds

mysql client when there is 1 day's worth of records to delete:

    mysql> DELETE FROM user_history WHERE dateSent < '2013-05-05 13:41:55';
    Query OK, 672260 rows affected (15.70 sec)

Is there a reason why PDO is slower?

Cheers.

Responses to comments:

It's the same query on both, so the index is either being picked up or it's not. And it is.

    EXPLAIN SELECT * FROM user_history WHERE dateSent < '2013-05-05 13:41:55'

    id  select_type  table         type   possible_keys  key        key_len  ref   rows  Extra
    1   SIMPLE       user_history  range  date_sent      date_sent  4        NULL  4     Using where

MySQL and Apache are running on the same server for the purposes of this test. If you're getting at an issue of load, then mysql does hit 100% for the 13 seconds on the in-code query. On the mysql client query, it doesn't get a chance to register in top before the query is complete. I can't see how this is not something that PHP/PDO is adding to the equation, but I'm open to all ideas.

:date is a PDO placeholder, and the field name is dateSent, so there is no conflict with MySQL keywords. Using :dateSent instead still causes the delay.

I had also already tried it without placeholders but neglected to mention this, so good call, thanks! It was along these lines, and there is still the same delay with PHP/PDO:

    DB::getInstance()->getDbh()->query("DELETE FROM user_history WHERE dateSent < '2013-05-03 13:41:55'");

And using placeholders in mysql client still shows no delay:

    PREPARE test from 'DELETE FROM user_history WHERE dateSent < ?';
    SET @datesent='2013-05-05 13:41:55';
    EXECUTE test USING @datesent;
    Query OK, 0 rows affected (0.00 sec)

It's a MyISAM table, so there are no transactions involved on this one.

The value of $date differs depending on whether I'm testing no deletions or one day's deletions, as shown in the query run in the mysql client, which is taken from SHOW PROCESSLIST while the code is running. In this case it is not passed to the method and is derived from:

    if (!isset($date)) {
        $date = date("Y-m-d H:i:s", strtotime(sprintf("-%d days", self::DELETE_BEFORE)));
    }

And at this point the table schema may get called into question, so:

    CREATE TABLE IF NOT EXISTS `user_history` (
      `userId` int(11) NOT NULL,
      `asin` varchar(10) COLLATE utf8_unicode_ci NOT NULL,
      `dateSent` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
      PRIMARY KEY (`userId`,`asin`),
      KEY `date_sent` (`dateSent`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

It's a decent sized website with lots of DB calls throughout. I see no evidence in the way the site performs in any other respect that suggests it is down to dodgy routing. Especially as I see this query on SHOW PROCESSLIST slowly creeping its way up to 13 seconds when run in PHP/PDO, but it takes no time at all when run in mysql (particularly referring to where no records are to be deleted which takes 13 seconds in PHP/PDO only).

Currently it is only this particular DELETE query that is in question. But I don't have another bulk DELETE statement like this anywhere else in this project, or any other project of mine that I can think of. So the question is particular to PDO DELETE queries on big-ish tables.

"Isn't that your answer then?" - No. The question is why does this take significantly longer in PHP/PDO compared to mysql client. The SHOW PROCESSLIST only shows this query taking time in PHP/PDO (for no records to be deleted). It takes no time at all in mysql client. That's the point.

Tried the PDO query without the try-catch block, and there is still a delay.


And trying with mysql_* functions shows the same timings as with using the mysql client directly. So the finger is pointing quite strongly at PDO right now. It could be my code that interfaces with PDO, but as no other queries have an unexpected delay, this seems less likely:

Method:

    $conn = mysql_connect(****);
    mysql_select_db(****);

    $query = "DELETE FROM " . static::$table . " WHERE dateSent < '$date'";
    $result = mysql_query($query);

Logs for no records to be deleted:

    Fri May 17 15:12:54 [verbose] UserHistory::deleteBefore() query: DELETE FROM user_history WHERE dateSent < '2013-05-03 15:12:54'
    Fri May 17 15:12:54 [verbose] UserHistory::deleteBefore() result: 1
    Fri May 17 15:12:54 [verbose] ScriptController::actionDeleteHistory() success in 0.01 seconds

Logs for one day's records to be deleted:

    Fri May 17 15:14:24 [verbose] UserHistory::deleteBefore() query: DELETE FROM user_history WHERE dateSent < '2013-05-07 15:14:08'
    Fri May 17 15:14:24 [verbose] UserHistory::deleteBefore() result: 1
    Fri May 17 15:14:24 [debug] ScriptController::apiReturn(): {"message":true}
    Fri May 17 15:14:24 [verbose] ScriptController::actionDeleteHistory() success in 15.55 seconds

I also tried again while avoiding calls to the DB singleton, by creating a PDO connection in the method and using that, and the delay is there once again. No other queries using the same DB singleton show a delay, so I didn't really expect any difference, but it was worth a try:

    $connectString = sprintf('mysql:host=%s;dbname=%s', '****', '****');
    $dbh = new \PDO($connectString, '****', '****');
    $dbh->exec("SET CHARACTER SET utf8");
    $dbh->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);

    $smt = $dbh->prepare("DELETE FROM " . static::$table . " WHERE dateSent < :date");
    $smt->execute(array(':date' => $date));

Calling method with time logger:

    $startTimer = microtime(true);
    $deleted = $this->apiReturn(array('message' => UserHistory::deleteBefore()));
    $timeEnd = microtime(true) - $startTimer;
    Logger::write(LOG_VERBOSE, "ScriptController::actionDeleteHistory() success in " . number_format($timeEnd, 2) . " seconds");

Added PDO::ATTR_EMULATE_PREPARES to DB::connect(). There is still a delay when deleting no records at all. I've not used this before, but it looks like the right format:

    $this->dbh->setAttribute(\PDO::ATTR_EMULATE_PREPARES, false);

Current DB::connect(), though if there were general issues with this, surely it would affect all queries?

    public function connect($host, $user, $pass, $name)
    {
        $connectString = sprintf('mysql:host=%s;dbname=%s', $host, $name);
        $this->dbh = new \PDO($connectString, $user, $pass);
        $this->dbh->exec("SET CHARACTER SET utf8");
        $this->dbh->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);
    }

The indexes are shown above in the schema. If it was directly related to rebuilding the indexes after the deletion of the record, then mysql would take the same time as PHP/PDO. It doesn't. This is the issue. It's not that this query is slow - it's expected to take some time. It's that PHP/PDO is noticeably slower than queries executed in the mysql client or queries that use the mysql lib in PHP.


Tried PDO::MYSQL_ATTR_USE_BUFFERED_QUERY as well, but there is still a delay.


DB is a standard singleton pattern. DB::getInstance()->getDbh() returns the PDO connection object created in the DB::connect() method shown above, eg: DB::dbh. I believe I've proved that the DB singleton is not an issue as there is still a delay when creating the PDO connection in the same method as the query is executed (6 edits above).


I've found what is causing it, but I don't yet know why it happens.

I've created a test SQL that creates a table with 10 million random rows in the right format, and a PHP script that runs the offending query. And it takes no time at all in PHP/PDO or mysql client. Then I change the DB collation from the default latin1_swedish_ci to utf8_unicode_ci and it takes 10 seconds in PHP/PDO and no time at all in mysql client. Then I change it back to latin1_swedish_ci and it takes no time at all in PHP/PDO again.

Tada!

Now if I remove this from the DB connection, it works fine in either collation. So there is some sort of problem here:

    $dbh->exec("SET CHARACTER SET utf8");

I shall research more, then follow up later.
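
One way to see what that statement changes is to compare the session's character-set variables on two connections, one set up with it and one without. The MySQL manual describes `SET CHARACTER SET` as setting `character_set_client` and `character_set_results` to the named charset while taking `character_set_connection` and `collation_connection` from the database defaults, whereas `SET NAMES` sets all three character-set variables to the named charset. The sketch below is only a diagnostic aid; the helper names `sessionCharsetVars` and `diffSessionVars` are hypothetical and not part of the original code.

```php
<?php
/**
 * Read the session's character-set and collation variables as name => value.
 * Assumes $dbh is an open PDO connection to a MySQL server.
 */
function sessionCharsetVars(PDO $dbh): array
{
    $sql = "SHOW SESSION VARIABLES "
         . "WHERE Variable_name LIKE 'character_set%' "
         . "OR Variable_name LIKE 'collation%'";
    // FETCH_KEY_PAIR maps the first column to the second (name => value).
    return $dbh->query($sql)->fetchAll(PDO::FETCH_KEY_PAIR);
}

/**
 * Return the variables whose values differ between two snapshots,
 * as name => [value in $a, value in $b].
 */
function diffSessionVars(array $a, array $b): array
{
    $diff = [];
    foreach ($a as $name => $value) {
        $other = $b[$name] ?? null;
        if ($other !== $value) {
            $diff[$name] = [$value, $other];
        }
    }
    return $diff;
}
```

Calling `sessionCharsetVars()` once on a connection that ran `SET CHARACTER SET utf8` and once on a plain connection, then passing both results to `diffSessionVars()`, lists exactly which settings the statement altered on that server.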

So...

This post explains where the flaw was.

Is "SET CHARACTER SET utf8" necessary?

Essentially, it was the use of:

    $this->dbh->exec("SET CHARACTER SET utf8");

which should have been this in DB::connect()

    $this->dbh->exec("SET NAMES utf8");

My fault entirely.

It seems to have had such dire effects because the MySQL server needed to convert the query to match the collation of the DB. The post linked above explains this in much better detail than I can.
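
For reference, a minimal sketch of a connect routine with the fix applied. It keeps the `SET NAMES utf8` call described above and, as an addition that is not in the original code, also passes `charset` in the DSN, which PDO_MYSQL honours from PHP 5.3.6 onwards. The function names `buildMysqlDsn` and `connectUtf8` are hypothetical, and the scalar type declarations assume PHP 7 or later.

```php
<?php
/** Build a PDO_MYSQL DSN that requests the connection charset up front. */
function buildMysqlDsn(string $host, string $name, string $charset = 'utf8'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $name, $charset);
}

/** Open a connection that uses SET NAMES rather than SET CHARACTER SET. */
function connectUtf8(string $host, string $user, string $pass, string $name): PDO
{
    $dbh = new PDO(buildMysqlDsn($host, $name), $user, $pass);
    $dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    // The fix from the post. Redundant where the DSN charset is honoured,
    // but harmless and explicit about the intent.
    $dbh->exec('SET NAMES utf8');
    return $dbh;
}
```

Setting the charset in the DSN has the side benefit that the driver itself knows the connection encoding, rather than only the server.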

If anyone needs to confirm my findings, this series of SQL queries will set up a test DB and allow you to check for yourself. Just make sure that the indexes are correctly enabled after the test data has been entered, because I had to drop and re-add them for some reason. It creates 10 million rows; fewer may be enough to prove the point.

    DROP DATABASE IF EXISTS pdo_test;
    CREATE DATABASE IF NOT EXISTS pdo_test;
    USE pdo_test;

    CREATE TABLE IF NOT EXISTS test (
      `userId` int(11) NOT NULL,
      `asin` varchar(10) COLLATE utf8_unicode_ci NOT NULL,
      `dateSent` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
      PRIMARY KEY (`userId`,`asin`),
      KEY `date_sent` (`dateSent`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

    drop procedure if exists load_test_data;

    delimiter #
    create procedure load_test_data()
        begin
            declare v_max int unsigned default 10000000;
            declare v_counter int unsigned default 0;

            while v_counter < v_max do
                INSERT INTO test (userId, asin, dateSent) VALUES (FLOOR(1 + RAND()*10000000), SUBSTRING(MD5(RAND()) FROM 1 FOR 10), NOW());
                set v_counter=v_counter+1;
            end while;
        end #
    delimiter ;

    ALTER TABLE test DISABLE KEYS;
    call load_test_data();
    ALTER TABLE test ENABLE KEYS;

    # Tests - reconnect to mysql client after each one to reset previous CHARACTER SET

    # Right collation, wrong charset - slow
    SET CHARACTER SET utf8;
    ALTER DATABASE pdo_test COLLATE='utf8_unicode_ci';
    DELETE FROM test  WHERE dateSent < '2013-01-01 00:00:00';

    # Wrong collation, no charset - fast
    ALTER DATABASE pdo_test COLLATE='latin1_swedish_ci';
    DELETE FROM test  WHERE dateSent < '2013-01-01 00:00:00';

    # Right collation, right charset - fast
    SET NAMES utf8;
    ALTER DATABASE pdo_test COLLATE='utf8_unicode_ci';
    DELETE FROM test  WHERE dateSent < '2013-01-01 00:00:00';
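
The client-side tests above could be repeated from PHP with something like the following sketch, which opens a fresh PDO connection per case (mirroring the "reconnect after each one" note) and times only the DELETE. It is an illustration rather than the script actually used; the helper names `timeCall` and `timeDeleteWithSetup`, the DSN, and the credentials are all placeholders.

```php
<?php
/** Run $fn once and return the elapsed wall-clock time in seconds. */
function timeCall(callable $fn): float
{
    $start = microtime(true);
    $fn();
    return microtime(true) - $start;
}

/**
 * Time the DELETE on a fresh connection after running $setupSql
 * (pass an empty string for no setup). The table name `test` comes
 * from the SQL above; DSN and credentials are assumptions.
 */
function timeDeleteWithSetup(string $dsn, string $user, string $pass, string $setupSql, string $cutoff): float
{
    $dbh = new PDO($dsn, $user, $pass);
    $dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    if ($setupSql !== '') {
        $dbh->exec($setupSql);
    }
    $stmt = $dbh->prepare('DELETE FROM test WHERE dateSent < :date');
    return timeCall(function () use ($stmt, $cutoff) {
        $stmt->execute([':date' => $cutoff]);
    });
}

// Usage (needs the pdo_test database; placeholder credentials):
// foreach (['', 'SET CHARACTER SET utf8', 'SET NAMES utf8'] as $setup) {
//     $secs = timeDeleteWithSetup('mysql:host=localhost;dbname=pdo_test',
//                                 'user', 'pass', $setup, '2013-01-01 00:00:00');
//     printf("%-24s %.2f s\n", $setup === '' ? '(none)' : $setup, $secs);
// }
```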

