How can I select records that exist in tableB but not in tableA (essentially tableB - tableA)?
I have the following tables (each table has 1 million records):
tableA
id int(11) NOT NULL PRIMARY KEY
name varchar(50)
sku varchar(10) index
description text
tableB
id int(11) NOT NULL PRIMARY KEY
stock int(11)
price int(11)
sku varchar(10) index
Note: sku is indexed.
What would be an alternative solution?
These are the queries I tried:
LEFT JOIN query:
SELECT b.sku FROM tableB b
LEFT JOIN tableA a
ON a.sku = b.sku
WHERE a.sku IS NULL
NOT IN query:
SELECT * from tableB where sku NOT IN (SELECT sku from tableA)
I think the best solution is either the left join or:
select *
from tableB b
where not exists (select 1 from tableA a where a.sku = b.sku);
The problem with the not in solution is that a single NULL sku in tableA will result in no rows being returned by the query.
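The NULL behaviour can be reproduced with literals alone. This is a minimal sketch, assuming MySQL's standard three-valued logic. The `IS NOT NULL` filter in the last query is one common workaround and is not part of the original query:

```sql
-- 2 differs from 1, but comparing 2 with NULL is unknown,
-- so the whole NOT IN expression evaluates to NULL rather than TRUE.
SELECT 2 NOT IN (1, NULL) AS result;

-- A WHERE clause keeps only rows whose condition is TRUE,
-- so one NULL sku in tableA is enough to filter out every row of tableB.
SELECT * FROM tableB
WHERE sku NOT IN (SELECT sku FROM tableA);

-- Possible workaround: exclude NULLs inside the subquery.
SELECT * FROM tableB
WHERE sku NOT IN (SELECT sku FROM tableA WHERE sku IS NOT NULL);
```

The not exists version does not need this filter, because a NULL sku in tableA never satisfies a.sku = b.sku.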
For performance, you need an index on tableA(sku):
create index tableA_sku on tableA(sku);
This will speed up both the left join and not exists versions.
Preparing the environment to replicate the issue:
DELIMITER $$
DROP PROCEDURE IF EXISTS `insertMe` $$
CREATE PROCEDURE insertMe()
BEGIN
  DECLARE i BIGINT DEFAULT 2;
  WHILE (i <= 1000000) DO
    -- Every id goes into tableA
    INSERT INTO tableA(id,NAME,sku,description) VALUES(i,CONCAT('name',i),CONCAT('sku',i),CONCAT('description',i));
    -- Only even ids go into tableB
    IF(i%2=0) THEN
      INSERT INTO tableB(id,stock,price,sku) VALUES(i,i%10,i%5,CONCAT('sku',i));
    END IF;
    SET i=i+1;
  END WHILE;
END $$
DELIMITER ;
-- Populate the tables (assumes tableA and tableB already exist)
CALL insertMe();
CREATE INDEX idx_tableA ON tableA(sku);
CREATE INDEX idx_tableB ON tableB(sku);
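The procedure assumes both tables already exist, so they have to be created before `CALL insertMe()` runs. This is a minimal sketch of matching DDL, based on the column list in the question. The InnoDB engine is an assumption, and the sku indexes are left to the CREATE INDEX statements above:

```sql
-- Schema taken from the question; ENGINE=InnoDB is assumed, not stated there.
CREATE TABLE IF NOT EXISTS tableA (
  id          INT NOT NULL PRIMARY KEY,
  name        VARCHAR(50),
  sku         VARCHAR(10),
  description TEXT
) ENGINE=InnoDB;

CREATE TABLE IF NOT EXISTS tableB (
  id    INT NOT NULL PRIMARY KEY,
  stock INT,
  price INT,
  sku   VARCHAR(10)
) ENGINE=InnoDB;
```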
The left join query for your requirement ran in 1-2 seconds in my test environment (2 virtual CPUs with 1 GB RAM).
Neither your left join query nor the query mentioned by @Gordon Linoff has any issue. I'd suggest verifying index usage from the execution plan, as Gordon Linoff suggested.
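One way to check index usage is EXPLAIN. This is a sketch, assuming the index names created above (idx_tableA, idx_tableB). In the output, the `key` column for table `a` should name the sku index; a `type` of ALL on that row would suggest a full table scan instead:

```sql
-- Show the execution plan without running the query.
EXPLAIN
SELECT b.*
FROM tableB b
LEFT JOIN tableA a ON a.sku = b.sku
WHERE a.sku IS NULL;

-- List the indexes that actually exist on each table.
SHOW INDEX FROM tableA;
SHOW INDEX FROM tableB;
```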
Next, I suspect that stale table statistics in the MySQL database are behind the performance issue. Please run the following statements and then check the performance of your query.
ANALYZE TABLE tableA;
ANALYZE TABLE tableB;
Last but not least, try backing up and recreating both tables and their indexes, rerun the ANALYZE statements above, and then check the performance of the query. Recreating the tables fixes row chaining and migration issues.
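If a full backup and restore is not practical, an in-place rebuild may be an alternative. This is a sketch that assumes the tables use InnoDB, where `ALTER TABLE ... ENGINE=InnoDB` rebuilds the table and its indexes without changing the schema:

```sql
-- Rebuild each table and its indexes in place.
-- ENGINE=InnoDB is an assumption; use the engine the tables already have.
ALTER TABLE tableA ENGINE=InnoDB;
ALTER TABLE tableB ENGINE=InnoDB;

-- Refresh the optimizer statistics after the rebuild.
ANALYZE TABLE tableA;
ANALYZE TABLE tableB;
```

On tables of this size the rebuild can take a while, so it is worth trying on a copy first.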