Copy subset of rows from one table to another, filtering on two columns
I have the following MySQL table containing my raw event data (about 1.5 million rows):
userId | pathId | other stuff....
I have an index on userId, pathId (approx. 50,000 unique combinations).
During my processing, I identify 30,000 userId, pathId values that I don't want, but I do want to keep the original raw table. So I want to copy all rows into a processed event table, except the rows that match these 30,000 userId, pathId values.
An approach I'm considering is to write the 30,000 userId, pathId values of the rows I do not want into a temp_table, and then do something like this:
[create table processed_table ...]
insert into processed_table
select * from raw_table r
where not exists (
select * from temp_table t where r.userId=t.userid and r.pathId=t.pathId
)
For info, processed_table generally ends up being half the size of raw_table.
Anyway, this seems to work, but my SQL skills are limited, so my question (finally) is: is this the most efficient way to do this?
No, it's not the most efficient. Source:
That's why the best way to search for missing values in MySQL is using a LEFT JOIN / IS NULL or NOT IN rather than NOT EXISTS.
Here's an example with NOT IN:
INSERT INTO processed_table
SELECT *
FROM raw_table
WHERE (userId, pathId) NOT IN (
SELECT userId, pathId FROM temp_table
)
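One caveat with NOT IN: if userId or pathId can be NULL in either table, NOT IN can silently drop rows that NOT EXISTS would keep, because a comparison against NULL is neither true nor false. A minimal sketch of a temp_table definition that rules this out on the lookup side and also gives the subquery an index to probe. The column types and the use of a TEMPORARY table are assumptions, since the question does not show the schema:

```sql
-- Hypothetical definition: INT UNSIGNED is an assumption,
-- match the types to the corresponding columns in raw_table.
CREATE TEMPORARY TABLE temp_table (
    userId INT UNSIGNED NOT NULL,
    pathId INT UNSIGNED NOT NULL,
    -- The composite primary key keeps the pairs unique and
    -- lets each (userId, pathId) lookup use an index.
    PRIMARY KEY (userId, pathId)
);
```

With the pairs unique and NOT NULL in temp_table, and no NULLs in raw_table's userId and pathId, the NOT EXISTS, NOT IN and LEFT JOIN forms should all select the same rows.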
And LEFT JOIN ... IS NULL:
INSERT INTO processed_table
SELECT r.*  -- only raw_table's columns; a bare * would also return temp_table's
FROM raw_table r
LEFT JOIN temp_table t
ON r.userId = t.userid AND r.pathId = t.pathId
WHERE t.userId IS NULL
However, since your table is fairly small (about 1.5 million rows, with roughly 50,000 distinct userId, pathId combinations), your original query is probably fast enough.
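If you would rather check than guess, one option is to compare the execution plans on your own data. The sketch below reuses the table and column names from the question. How the optimizer treats each form has changed between MySQL versions, so the plans you see may well be identical:

```sql
-- Plan for the original NOT EXISTS form.
EXPLAIN
SELECT r.*
FROM raw_table r
WHERE NOT EXISTS (
    SELECT 1
    FROM temp_table t
    WHERE r.userId = t.userId AND r.pathId = t.pathId
);

-- Plan for the LEFT JOIN ... IS NULL form.
EXPLAIN
SELECT r.*
FROM raw_table r
LEFT JOIN temp_table t
    ON r.userId = t.userId AND r.pathId = t.pathId
WHERE t.userId IS NULL;
```

In either plan, the thing to look for is that temp_table is accessed through an index on (userId, pathId) rather than scanned in full for every row of raw_table.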