
Select remaining records from two tables (tableB - tableA)

How can I select records that exist in tableB but not in tableA (basically tableB - tableA)?

I have the following tables (each table has 1 million records):

tableA
    id int(11) NOT NULL PRIMARY KEY
    name varchar(50)
    sku varchar(10) index
    description text

tableB
    id int(11) NOT NULL PRIMARY KEY
    stock int(11)
    price int(11)
    sku varchar(10) index

Note: sku is indexed.

  1. tableA and tableB have a one-to-one relation on the sku field.
  2. Both tables have 1M records.
  3. I want to get the records that exist in tableB but not in tableA (basically tableB - tableA). LEFT JOIN and NOT IN are very slow.

What would be an alternative solution?

Here are the queries I tried:

LEFT JOIN query:
    SELECT b.sku FROM tableB b
    LEFT JOIN tableA a
       ON a.sku = b.sku
    WHERE a.sku IS NULL

NOT IN query:
    SELECT * from tableB where sku NOT IN (SELECT sku from tableA)

I think the best solution is either the left join or:

select *
from tableB b
where not exists (select 1 from tableA a where a.sku = b.sku);

The problem with the NOT IN solution is that a single NULL sku in tableA causes the query to return no rows at all.
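A minimal sketch of that NULL pitfall, again with SQLite standing in for MySQL and hypothetical sample rows: one NULL sku in tableA empties the NOT IN result, while NOT EXISTS is unaffected. Filtering NULLs out of the subquery makes NOT IN return the expected row again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY, sku TEXT);
    CREATE TABLE tableB (id INTEGER PRIMARY KEY, sku TEXT);
    -- hypothetical sample rows: tableA contains one NULL sku
    INSERT INTO tableA (id, sku) VALUES (1, 'sku1'), (2, NULL);
    INSERT INTO tableB (id, sku) VALUES (1, 'sku1'), (3, 'sku3');
""")

# 'sku3' NOT IN ('sku1', NULL) evaluates to NULL (unknown), so the row is filtered out.
not_in = conn.execute(
    "SELECT sku FROM tableB WHERE sku NOT IN (SELECT sku FROM tableA)"
).fetchall()

# NOT EXISTS only asks whether a matching row exists; the NULL row never matches.
not_exists = conn.execute(
    "SELECT b.sku FROM tableB b "
    "WHERE NOT EXISTS (SELECT 1 FROM tableA a WHERE a.sku = b.sku)"
).fetchall()

# Excluding NULLs from the subquery makes NOT IN behave like NOT EXISTS again.
not_in_filtered = conn.execute(
    "SELECT sku FROM tableB "
    "WHERE sku NOT IN (SELECT sku FROM tableA WHERE sku IS NOT NULL)"
).fetchall()

print(not_in)           # []
print(not_exists)       # [('sku3',)]
print(not_in_filtered)  # [('sku3',)]
```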

For performance, you need an index on tableA(sku):

create index tableA_sku on tableA(sku);

This will speed up both the LEFT JOIN and NOT EXISTS versions.
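One way to see that the index is picked up is to compare query plans before and after creating it. This sketch uses SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN (an assumption); since the exact plan text varies between SQLite versions, it only checks whether the index name tableA_sku appears. PRAGMA automatic_index = OFF stops SQLite from building a temporary index of its own in the unindexed case.

```python
import sqlite3

def plan(conn: sqlite3.Connection, sql: str) -> list:
    # EXPLAIN QUERY PLAN rows have four columns; the last one is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

LEFT_JOIN = ("SELECT b.sku FROM tableB b LEFT JOIN tableA a ON a.sku = b.sku "
             "WHERE a.sku IS NULL")
NOT_EXISTS = ("SELECT b.sku FROM tableB b "
              "WHERE NOT EXISTS (SELECT 1 FROM tableA a WHERE a.sku = b.sku)")
QUERIES = [("left_join", LEFT_JOIN), ("not_exists", NOT_EXISTS)]

conn = sqlite3.connect(":memory:")
# Without this, SQLite may create an automatic index and hide the difference.
conn.execute("PRAGMA automatic_index = OFF")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY, sku TEXT);
    CREATE TABLE tableB (id INTEGER PRIMARY KEY, sku TEXT);
""")

before = {name: plan(conn, sql) for name, sql in QUERIES}
conn.execute("CREATE INDEX tableA_sku ON tableA(sku)")
after = {name: plan(conn, sql) for name, sql in QUERIES}

for name, _ in QUERIES:
    # Before: tableA is scanned for every tableB row.
    # After: the lookup into tableA should go through tableA_sku.
    print(name, "before:", before[name])
    print(name, "after: ", after[name])
```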

Preparing the environment to replicate the issue:

DELIMITER $$

DROP PROCEDURE IF EXISTS `insertMe` $$
CREATE PROCEDURE insertMe()
BEGIN
  DECLARE i BIGINT DEFAULT 2;
  WHILE (i <= 1000000) DO
    -- every i goes into tableA
    INSERT INTO tableA(id, name, sku, description)
      VALUES (i, CONCAT('name', i), CONCAT('sku', i), CONCAT('description', i));
    -- only even i go into tableB
    IF (i % 2 = 0) THEN
      INSERT INTO tableB(id, stock, price, sku)
        VALUES (i, i % 10, i % 5, CONCAT('sku', i));
    END IF;
    SET i = i + 1;
  END WHILE;
END $$

DELIMITER ;

-- populate both tables
CALL insertMe();

CREATE INDEX idx_tableA ON tableA(sku);
CREATE INDEX idx_tableB ON tableB(sku);
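For a quicker local reproduction, the same data pattern can be generated from Python. This sketch assumes SQLite instead of MySQL and scales the row count down from 1,000,000 to 1,000 (both assumptions). It wraps all inserts in one transaction rather than committing row by row, then runs the LEFT JOIN anti-join.

```python
import sqlite3

N = 1000  # the MySQL procedure goes up to 1,000,000; scaled down here

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY, name TEXT, sku TEXT, description TEXT);
    CREATE TABLE tableB (id INTEGER PRIMARY KEY, stock INTEGER, price INTEGER, sku TEXT);
""")

# Same pattern as insertMe(): every i in 2..N goes into tableA, only even i into tableB.
with conn:  # one transaction for all inserts
    for i in range(2, N + 1):
        conn.execute(
            "INSERT INTO tableA (id, name, sku, description) VALUES (?, ?, ?, ?)",
            (i, f"name{i}", f"sku{i}", f"description{i}"))
        if i % 2 == 0:
            conn.execute(
                "INSERT INTO tableB (id, stock, price, sku) VALUES (?, ?, ?, ?)",
                (i, i % 10, i % 5, f"sku{i}"))

conn.execute("CREATE INDEX idx_tableA ON tableA(sku)")
conn.execute("CREATE INDEX idx_tableB ON tableB(sku)")

count_a = conn.execute("SELECT COUNT(*) FROM tableA").fetchone()[0]
count_b = conn.execute("SELECT COUNT(*) FROM tableB").fetchone()[0]
missing = conn.execute(
    "SELECT COUNT(*) FROM tableB b LEFT JOIN tableA a ON a.sku = b.sku "
    "WHERE a.sku IS NULL"
).fetchone()[0]
print(count_a, count_b, missing)  # 999 500 0
```

With this generated data every tableB sku also exists in tableA, so tableB - tableA is empty; to get a non-empty result, tableB needs some skus that tableA lacks.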

The LEFT JOIN query for your requirement ran in 1-2 seconds in my test environment (2 virtual CPUs, 1 GB RAM).

Neither your LEFT JOIN query nor the query suggested by @Gordon Linoff has a problem. I'd suggest verifying index usage by checking the execution plan (EXPLAIN), as suggested by Gordon Linoff.

Next, I suspect that stale table statistics in the MySQL database are causing the performance issue. Please run the following statements, then re-check the performance of your query.

ANALYZE TABLE tableA;
ANALYZE TABLE tableB;
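For comparison, SQLite has its own ANALYZE command. This sketch (SQLite as a stand-in for MySQL, sample data hypothetical) refreshes the statistics and reads them back from sqlite_stat1, where the first number in each stat string is the row count the planner will assume for that index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY, sku TEXT);
    CREATE TABLE tableB (id INTEGER PRIMARY KEY, sku TEXT);
    CREATE INDEX tableA_sku ON tableA(sku);
    CREATE INDEX tableB_sku ON tableB(sku);
""")
conn.executemany("INSERT INTO tableA (id, sku) VALUES (?, ?)",
                 [(i, f"sku{i}") for i in range(1, 101)])      # 100 rows
conn.executemany("INSERT INTO tableB (id, sku) VALUES (?, ?)",
                 [(i, f"sku{i}") for i in range(1, 101, 2)])   # 50 rows
conn.commit()

# Refresh optimizer statistics, the SQLite counterpart of ANALYZE TABLE.
conn.execute("ANALYZE")

# sqlite_stat1 holds one row per index: (table, index, "rows rows-per-key ...").
stats = {(tbl, idx): stat for tbl, idx, stat in
         conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1")}
print(stats)
```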

Last but not least, try backing up and recreating both tables and their indexes, together with the ANALYZE statements above, then check the query's performance again. This can fix problems caused by row chaining and migration (fragmentation).
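A sketch of the "recreate the table and its index" step, again using SQLite as a stand-in (an assumption) on a hypothetical sample of tableB: copy the rows into a new table, drop the old one (which also drops its index), rename, re-create the index and re-run ANALYZE. In MySQL, OPTIMIZE TABLE on an InnoDB table performs a comparable rebuild (it is mapped to ALTER TABLE ... FORCE); in SQLite, VACUUM rebuilds the whole database file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableB (id INTEGER PRIMARY KEY, stock INTEGER, price INTEGER, sku TEXT);
    CREATE INDEX idx_tableB ON tableB(sku);
""")
conn.executemany("INSERT INTO tableB (id, stock, price, sku) VALUES (?, ?, ?, ?)",
                 [(i, i % 10, i % 5, f"sku{i}") for i in range(2, 201, 2)])
conn.commit()

# Rebuild tableB inside one explicit transaction: copy, drop, rename, re-index.
conn.executescript("""
    BEGIN;
    CREATE TABLE tableB_new (id INTEGER PRIMARY KEY, stock INTEGER, price INTEGER, sku TEXT);
    INSERT INTO tableB_new (id, stock, price, sku) SELECT id, stock, price, sku FROM tableB;
    DROP TABLE tableB;  -- dropping the table also drops idx_tableB
    ALTER TABLE tableB_new RENAME TO tableB;
    CREATE INDEX idx_tableB ON tableB(sku);
    COMMIT;
""")
conn.execute("ANALYZE tableB")  # refresh statistics for the rebuilt table

row_count = conn.execute("SELECT COUNT(*) FROM tableB").fetchone()[0]
indexes = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'tableB'")]
print(row_count, indexes)  # 100 ['idx_tableB']
```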
