I have two tables named actual and check.
Table actual contains 50 million rows and each row contains 32-bit hashes.
Table check contains 10 million rows and each row contains 32-bit hashes.
I have to verify if the hashes from the check table are in the actual table or not.
I tried a MySQL JOIN query like this:
SELECT *
FROM `check`
LEFT JOIN `actual`
ON `check`.hash = `actual`.hash;
Even on a machine with 16 GB of RAM, MySQL is crashing.
I also tried a PHP script, after adding extra fields to table check so that its fields are hash, status, and found.
status and found default to 0; the PHP script checks each record, sets status to 1, and sets found to 1 if the hash is found.
Is there any way to check millions of records faster?
The other approach I have is using INSERT IGNORE for unique hashes and checking how many were not appended, but it's a complex process.
The PHP code I am using is below, but it's very slow:
$sql = "SELECT * FROM `check` where status = 0 LIMIT 0, 1";
$result = $conn->query($sql);
if ($result->num_rows > 0) {
    while ($row = $result->fetch_assoc()) {
        $check = "SELECT * FROM `actual` where hash = '".$row["hash"]."'";
        $checkx = $conn->query($check);
        $checky = "UPDATE `check` SET `status` = 1, `found` = 0 WHERE hash = '".$row["hash"]."'";
        $conn->query($checky);
        if ($checkx->num_rows > 0) {
            $checky = "UPDATE `check` SET `status` = 1, `found` = 1 WHERE hash = '".$row["hash"]."'";
            $conn->query($checky);
        }
    }
}
If I've understood you right, a sub-query is all you need:
UPDATE `check` SET status=1, found=1 WHERE hash IN (SELECT hash FROM actual)
I don't have enough data to do a meaningful performance comparison - try it and see.
Edit: With a clearer idea of the requirement gleaned by looking at the PHP solution, here's an updated query:
UPDATE `check` SET status=1, found=(hash IN (SELECT hash FROM actual)) WHERE status=0
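To see what the updated query does to each row, here is a minimal sketch that runs it on a toy data set. It uses Python's built-in sqlite3 module purely as a stand-in for MySQL so that it is self-contained; the sample hash values and the index name idx_actual_hash are made up for illustration, and nothing here says anything about performance at 50 million rows.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actual (hash TEXT NOT NULL);
    CREATE INDEX idx_actual_hash ON actual (hash);
    CREATE TABLE `check` (
        hash   TEXT NOT NULL,
        status INTEGER NOT NULL DEFAULT 0,
        found  INTEGER NOT NULL DEFAULT 0
    );
""")

# Made-up sample hashes: "zz" is the only one missing from actual.
with conn:
    conn.executemany("INSERT INTO actual (hash) VALUES (?)",
                     [("a1",), ("b2",), ("c3",)])
    conn.executemany("INSERT INTO `check` (hash) VALUES (?)",
                     [("a1",), ("zz",), ("c3",)])

# One set-based statement replaces the per-row SELECT + UPDATE loop.
with conn:
    conn.execute(
        "UPDATE `check` SET status = 1, "
        "found = (hash IN (SELECT hash FROM actual)) "
        "WHERE status = 0"
    )

rows = conn.execute(
    "SELECT hash, status, found FROM `check` ORDER BY hash").fetchall()
print(rows)
```

Every unchecked row ends up with status 1, and found is 1 only for hashes that also appear in actual.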
Note:
Make sure actual.hash is indexed, or searching the actual table will take an age.
Depending on the data in check, it might be worth indexing check.status too. If most rows are unchecked there will be no benefit, but it could work well if there are only a few unchecked ones. Writing to an indexed table could be significantly slower. You'd need to experiment with your data set to find out.
Use a multi-table UPDATE instead of IN ( SELECT... ).
Also, please provide the output of SHOW CREATE TABLE. We need to see the engine, indexes, datatypes, etc.
And the output of:
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
What do you mean by "crashing"? Reboot? mysqld died? Or simply that the query took forever?
Once we have optimized the query, if it still is too slow, I will show you how to do it in stages. And that will probably involve directly writing SQL, not by going through Django.
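The answer does not say how the staging would work, so the following is only one possible reading: walk the check table in primary-key ranges and commit each range separately, so that no single transaction has to touch all 10 million rows. The sketch uses Python's built-in sqlite3 module as a stand-in for MySQL; the id column, the update_in_stages helper, and the batch size are all hypothetical and not part of the question's schema.

```python
import sqlite3


def update_in_stages(conn, batch_size):
    """Flag unchecked rows of `check`, one primary-key range per transaction."""
    max_id = conn.execute(
        "SELECT COALESCE(MAX(id), 0) FROM `check`").fetchone()[0]
    batches = 0
    for start in range(1, max_id + 1, batch_size):
        with conn:  # commits (or rolls back) this batch on its own
            conn.execute(
                "UPDATE `check` SET status = 1, "
                "found = (hash IN (SELECT hash FROM actual)) "
                "WHERE status = 0 AND id >= ? AND id < ?",
                (start, start + batch_size),
            )
        batches += 1
    return batches


# Toy data; the `id` column is an assumption, not part of the question's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actual (hash TEXT NOT NULL);
    CREATE INDEX idx_actual_hash ON actual (hash);
    CREATE TABLE `check` (
        id     INTEGER PRIMARY KEY,
        hash   TEXT NOT NULL,
        status INTEGER NOT NULL DEFAULT 0,
        found  INTEGER NOT NULL DEFAULT 0
    );
""")
with conn:
    conn.executemany("INSERT INTO actual (hash) VALUES (?)",
                     [(f"h{i}",) for i in range(0, 10, 2)])
    conn.executemany("INSERT INTO `check` (hash) VALUES (?)",
                     [(f"h{i}",) for i in range(10)])

batches = update_in_stages(conn, batch_size=4)
checked, found = conn.execute(
    "SELECT SUM(status), SUM(found) FROM `check`").fetchone()
print(batches, checked, found)
```

Because each batch only touches rows with status = 0, the loop can be stopped and restarted without redoing finished ranges. A suitable batch size would have to be found by experiment on the real data.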