I have two tables named actual and check.

Table actual contains 50 million rows, and each row contains a 32-bit hash.

Table check contains 10 million rows, and each row contains a 32-bit hash.

I have to verify whether the hashes from the check table are present in the actual table.

I tried a MySQL JOIN query like:

  FROM `check` 
  JOIN `actual` 
    on `check`.hash = `actual`.hash;

Even on a machine with 16 GB of RAM, MySQL crashes.

I also tried a PHP script, after adding fields to the check table so that its fields are hash, status and found.

status and found default to 0. The PHP script checks each record, sets status to 1, and sets found to 1 if the hash is found.

Is there any way to check millions of records faster?

The other way I have is to INSERT the hashes using IGNORE against a unique hash column and then check how many were not inserted, but that is a complex process.

This is the PHP code I am using, but it is very slow:

$sql = "SELECT * FROM `check` WHERE status = 0 LIMIT 0, 1";
$result = $conn->query($sql);

if ($result->num_rows > 0) {
  while ($row = $result->fetch_assoc()) {

    // Look up this hash in `actual`
    $check = "SELECT * FROM `actual` WHERE hash = '".$row["hash"]."'";
    $checkx = $conn->query($check);

    // Default: mark the row as checked but not found
    $checky = "UPDATE `check` SET `status` = 1, `found` = 0 WHERE hash = '".$row["hash"]."'";
    if ($checkx->num_rows > 0) {
      // The hash exists in `actual`: mark it as found
      $checky = "UPDATE `check` SET `status` = 1, `found` = 1 WHERE hash = '".$row["hash"]."'";
    }
    $conn->query($checky);
  }
}

If I've understood you right, a sub-query is all you need:

UPDATE `check` SET status=1, found=1 WHERE hash IN (SELECT hash FROM actual)

I don't have enough data to do a meaningful performance comparison - try it and see.

Edit: With a clearer idea of the requirement gleaned by looking at the PHP solution, here's an updated query:

UPDATE `check` SET status=1, found=(hash IN (SELECT hash FROM actual))  WHERE status=0 


  • It's important that actual.hash is indexed, or searching the actual table will take an age.
  • Depending on the balance between checked and unchecked rows in check, it might be worth indexing check.status too. If most rows are unchecked there will be no benefit, but it could work well if there are only a few unchecked ones. The trade-off is that writes to an indexed column can be significantly slower, because the index has to be maintained on every update. You'd need to experiment with your data set to find out.
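
As a rough sketch of the two indexing points above (the index names idx_actual_hash and idx_check_status are made up for illustration, and building an index on 50 million rows will itself take a while):

```sql
-- Index the lookup column so each probe into `actual` is an index lookup,
-- not a scan of 50 million rows. The index name is hypothetical.
ALTER TABLE `actual` ADD INDEX idx_actual_hash (hash);

-- Optional: only likely to help when few rows still have status = 0.
-- The index name is hypothetical.
ALTER TABLE `check` ADD INDEX idx_check_status (status);

-- Check that the optimizer actually uses the index on `actual`.
-- EXPLAIN on an UPDATE assumes MySQL 5.6.3 or later.
EXPLAIN
UPDATE `check`
SET status = 1,
    found = (hash IN (SELECT hash FROM `actual`))
WHERE status = 0;
```

If the EXPLAIN output still shows a full scan of actual, the index is not being used and the query will stay slow.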

Use a multi-table UPDATE instead of IN ( SELECT... )
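
A minimal sketch of what that could look like, assuming the column names from the question (hash, status, found) and that actual.hash is indexed. The aliases c and a are just for readability:

```sql
-- LEFT JOIN keeps every unchecked row of `check`;
-- a.hash is NULL exactly when no matching row exists in `actual`.
UPDATE `check` AS c
LEFT JOIN `actual` AS a ON a.hash = c.hash
SET c.status = 1,
    c.found  = (a.hash IS NOT NULL)   -- evaluates to 1 if matched, 0 if not
WHERE c.status = 0;
```

In a multi-table UPDATE, MySQL updates each matching row once even if it matches the join condition several times, so duplicate hashes in actual should not cause repeated updates.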


  • What version of MySQL?
  • Please provide SHOW CREATE TABLE for both tables. We need to see the engine, indexes, datatypes, etc.
  • SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

What do you mean by "crashing"? Reboot? mysqld died? Or simply that the query took forever?

Once we have optimized the query, if it is still too slow, I will show you how to do it in stages. That will probably involve writing the SQL directly rather than going through PHP.
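
In the meantime, one common shape for doing it in stages is to walk the check table in primary-key ranges. This is only a sketch under assumptions: it presumes check has an AUTO_INCREMENT primary key called id, which the question does not show, and the chunk size of 100,000 is arbitrary.

```sql
-- One stage: process a single id range, then repeat with the next range.
-- Assumes a hypothetical AUTO_INCREMENT primary key `id` on `check`.
UPDATE `check` AS c
LEFT JOIN `actual` AS a ON a.hash = c.hash
SET c.status = 1,
    c.found  = (a.hash IS NOT NULL)
WHERE c.id >= 1
  AND c.id < 100001      -- arbitrary chunk size; tune for your server
  AND c.status = 0;

-- Next stage: c.id >= 100001 AND c.id < 200001, and so on.
```

Each stage is a short transaction of its own, so a failure part-way through loses one chunk of work, not the whole run, and the status = 0 filter makes it safe to re-run a range.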

