Any faster way to do a MySQL update query in R? In Python?

Question

I tried to run this query:

update table1 A 
set number = (select count(distinct(id)) from table2 B where B.col1 = A.col1 or B.col2 = A.col2);

but it takes forever because table1 has 1,100,000 rows and table2 has 350,000,000 rows.

Is there any faster way to do this query in R, or in Python?

Answer 1

I rewrote your query with three subqueries instead of one, using UNION and two INNER JOIN statements:

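UPDATE table1 AS A
LEFT JOIN
    (SELECT m.a_id, COUNT(*) AS cnt
     FROM
         (SELECT A1.id AS a_id, B.id AS b_id
          FROM table1 AS A1
          INNER JOIN table2 AS B
          ON A1.col1 = B.col1 -- condition for col1

          UNION DISTINCT

          SELECT A2.id AS a_id, B.id AS b_id
          FROM table1 AS A2
          INNER JOIN table2 AS B
          ON A2.col2 = B.col2 -- condition for col2
         ) AS m -- UNION DISTINCT keeps each (table1 row, table2 id) pair once
     GROUP BY m.a_id) AS C -- one row per table1.id, which is assumed to be unique
ON C.a_id = A.id
SET A.number = COALESCE(C.cnt, 0); -- rows without any match get 0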
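
Since the question also asks about Python, here is a minimal sketch that checks, on toy data, that replacing the OR with a UNION of two joins gives the same per-row counts. It is only an illustration under stated assumptions: the standard-library sqlite3 module stands in for MySQL, the sample rows are made up, and table1 is assumed to have a unique id column.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, number INTEGER);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
    INSERT INTO table1 (id, col1, col2) VALUES (1, 'a', 'x'), (2, 'b', 'y'), (3, 'c', 'z');
    INSERT INTO table2 (id, col1, col2) VALUES
        (10, 'a', 'x'),   -- matches table1 row 1 on both columns, must count once
        (11, 'a', 'q'),   -- matches row 1 on col1
        (12, 'p', 'x'),   -- matches row 1 on col2
        (13, 'b', 'q'),   -- matches row 2 on col1
        (14, 'p', 'q');   -- matches nothing
""")

# Formulation from the question: correlated subquery with OR.
or_counts = conn.execute("""
    SELECT A.id,
           (SELECT COUNT(DISTINCT B.id)
            FROM table2 B
            WHERE B.col1 = A.col1 OR B.col2 = A.col2)
    FROM table1 A
    ORDER BY A.id
""").fetchall()

# UNION formulation: one join per column, duplicates removed by UNION.
union_counts = conn.execute("""
    SELECT A.id, COUNT(m.b_id)
    FROM table1 A
    LEFT JOIN (
        SELECT A1.id AS a_id, B.id AS b_id
        FROM table1 A1 JOIN table2 B ON A1.col1 = B.col1
        UNION
        SELECT A2.id AS a_id, B.id AS b_id
        FROM table1 A2 JOIN table2 B ON A2.col2 = B.col2
    ) AS m ON m.a_id = A.id
    GROUP BY A.id
    ORDER BY A.id
""").fetchall()

print(or_counts)
print(union_counts)
```

Both queries return one (id, count) pair per table1 row. Whether the UNION form is actually faster on 350M rows depends on the indexes on col1 and col2, so it is worth measuring rather than assuming.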

My notes:

Updating every row in table1 does not look like a good idea, because it touches 1.1M rows. Storing number in a separate data structure would probably perform better.
Try running the SELECT part of the query on its own, without the update of table1 (only the part in parentheses).
Take a look at EXPLAIN if you need a more general approach to optimizing SQL queries: https://dev.mysql.com/doc/refman/5.7/en/using-explain.html
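
The first two notes can be sketched in Python as follows. This is a hedged illustration, not a tuned solution: sqlite3 again stands in for MySQL, the rows are made up, and the side table name table1_counts is hypothetical. The SELECT is timed on its own first, then its result is stored in a separate table instead of updating table1.

```python
import sqlite3
import time

# Assumption: in-memory SQLite replaces MySQL; the data below is toy data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
    INSERT INTO table1 VALUES (1, 'a', 'x'), (2, 'b', 'y');
    INSERT INTO table2 VALUES (10, 'a', 'q'), (11, 'p', 'x'), (12, 'b', 'y');
""")

# The counting part only, with no UPDATE involved.
COUNT_QUERY = """
    SELECT m.a_id AS table1_id, COUNT(*) AS number
    FROM (
        SELECT A.id AS a_id, B.id AS b_id
        FROM table1 A JOIN table2 B ON A.col1 = B.col1
        UNION
        SELECT A.id AS a_id, B.id AS b_id
        FROM table1 A JOIN table2 B ON A.col2 = B.col2
    ) AS m
    GROUP BY m.a_id
"""

# Note 2: run the SELECT alone and see how long it takes.
start = time.perf_counter()
preview = conn.execute(COUNT_QUERY + " ORDER BY table1_id").fetchall()
elapsed = time.perf_counter() - start
print(f"SELECT alone took {elapsed:.4f}s and returned {len(preview)} rows")

# Note 1: keep the counts in a separate table (hypothetical name)
# instead of rewriting every row of table1.
conn.execute("CREATE TABLE table1_counts AS " + COUNT_QUERY)
stored = conn.execute(
    "SELECT table1_id, number FROM table1_counts ORDER BY table1_id"
).fetchall()
print(stored)
```

Rows of table1 without any match do not appear in the side table, so a reader of table1_counts would treat a missing id as a count of 0.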

Answer 1: accepted, 2016-05-06 07:36:07
