MySQL: Dependent subquery with NOT IN in the WHERE clause is very slow

I am auditing user details from my application, which uses OpenID login. If a user logs in with an OpenID for the first time, we treat it as a sign-up. I am generating an audit sign-in report from these details. Sample table data:

+---------+----------+-----------+---------------+
| USER_ID | PROVIDER | OPERATION | TIMESTAMP     |
+---------+----------+-----------+---------------+
|     120 | Google   | SIGN_UP   | 1347296347000 |
|     120 | Google   | SIGN_IN   | 1347296347000 |
|     121 | Yahoo    | SIGN_IN   | 1347296347000 |
|     122 | Yahoo    | SIGN_IN   | 1347296347000 |
|     120 | Google   | SIGN_UP   | 1347296347000 |
|     120 | FaceBook | SIGN_IN   | 1347296347000 |
+---------+----------+-----------+---------------+

From this table I want to count, per provider, the users who signed in (SIGN_IN), excluding users who had already signed up (SIGN_UP) with that provider.

SHOW CREATE TABLE output:

CREATE TABLE `signin_details` (
  `USER_ID` int(11) DEFAULT NULL,
  `PROVIDER` char(40) DEFAULT NULL,
  `OPERATION` char(40) DEFAULT NULL,
  `TIMESTAMP` bigint(20) DEFAULT NULL
) ENGINE=InnoDB

This is the query I am using:

SELECT
  COUNT(DISTINCT USER_ID) AS signin_count,
  PROVIDER
FROM signin_details s1
WHERE
  s1.USER_ID NOT IN
  (
    SELECT
      USER_ID
    FROM signin_details
    WHERE
      signin_details.PROVIDER = s1.PROVIDER
      AND signin_details.OPERATION = 'SIGN_UP'
      -- TIMESTAMP is stored in milliseconds, so compare it with millisecond bounds
      AND signin_details.TIMESTAMP BETWEEN UNIX_TIMESTAMP(CURRENT_DATE() - INTERVAL 1 DAY) * 1000
                                       AND UNIX_TIMESTAMP(CURRENT_DATE()) * 1000
  )
  AND OPERATION = 'SIGN_IN'
GROUP BY PROVIDER;
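You can experiment with the query without a MySQL server. The minimal sketch below loads the sample rows into an in-memory SQLite database with Python's built-in sqlite3 module and runs the NOT IN query. It makes two assumptions. First, SQLite has no UNIX_TIMESTAMP() function or INTERVAL arithmetic, so a fixed millisecond window covering the sample timestamps (2012-09-10 UTC) is passed in as parameters. Second, an ORDER BY is added so the output order is deterministic. SQLite's planner is not MySQL's, so this reproduces the result only, not the timing.

```python
import sqlite3

# Sample rows from the question: (USER_ID, PROVIDER, OPERATION, TIMESTAMP in ms)
ROWS = [
    (120, "Google", "SIGN_UP", 1347296347000),
    (120, "Google", "SIGN_IN", 1347296347000),
    (121, "Yahoo", "SIGN_IN", 1347296347000),
    (122, "Yahoo", "SIGN_IN", 1347296347000),
    (120, "Google", "SIGN_UP", 1347296347000),
    (120, "FaceBook", "SIGN_IN", 1347296347000),
]

# Assumption: fixed bounds (2012-09-10 00:00 to 2012-09-11 00:00 UTC, in ms)
# replace UNIX_TIMESTAMP(CURRENT_DATE() - INTERVAL 1 DAY) * 1000 and
# UNIX_TIMESTAMP(CURRENT_DATE()) * 1000, which SQLite cannot evaluate.
WINDOW = (1347235200000, 1347321600000)

NOT_IN_QUERY = """
SELECT COUNT(DISTINCT USER_ID) AS signin_count, PROVIDER
FROM signin_details s1
WHERE s1.USER_ID NOT IN (
    SELECT USER_ID
    FROM signin_details
    WHERE signin_details.PROVIDER = s1.PROVIDER
      AND signin_details.OPERATION = 'SIGN_UP'
      AND signin_details.TIMESTAMP BETWEEN ? AND ?
)
  AND OPERATION = 'SIGN_IN'
GROUP BY PROVIDER
ORDER BY PROVIDER
"""


def make_db(rows):
    """Create an in-memory copy of signin_details holding `rows`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE signin_details ("
        "USER_ID INTEGER, PROVIDER TEXT, OPERATION TEXT, TIMESTAMP INTEGER)"
    )
    conn.executemany("INSERT INTO signin_details VALUES (?, ?, ?, ?)", rows)
    return conn


conn = make_db(ROWS)
result = conn.execute(NOT_IN_QUERY, WINDOW).fetchall()
print(result)  # [(1, 'FaceBook'), (2, 'Yahoo')]
```

The two rows returned match the query output in the question.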

EXPLAIN output:

+----+--------------------+----------------+------+---------------+------+---------+------+------+-----------------------------+
| id | select_type        | table          | type | possible_keys | key  | key_len | ref  | rows | Extra                       |
+----+--------------------+----------------+------+---------------+------+---------+------+------+-----------------------------+
|  1 | PRIMARY            | s1             | ALL  | NULL          | NULL | NULL    | NULL |    6 | Using where; Using filesort |
|  2 | DEPENDENT SUBQUERY | signin_details | ALL  | NULL          | NULL | NULL    | NULL |    6 | Using where                 |
+----+--------------------+----------------+------+---------------+------+---------+------+------+-----------------------------+

Query output:

+--------------+----------+
| signin_count | PROVIDER |
+--------------+----------+
|            1 | FaceBook |
|            2 | Yahoo    |
+--------------+----------+

It takes more than 40 minutes to run on 200k rows.

My assumption is that, for each row, it checks the row against the entire output of the dependent subquery. In other words:

 A -> dependent subquery outputs (B, C, D)
 A is checked against B
 A is checked against C
 A is checked against D

If the dependent subquery's output is large, the query takes a very long time to run. How can I improve this query?
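The mental model above can be sketched as a toy cost count. This is not MySQL's executor. The MySQL documentation describes a DEPENDENT SUBQUERY as re-evaluated once per distinct set of outer values it references, and with no usable index (possible_keys is NULL in the EXPLAIN output) each evaluation scans the table. The sketch only contrasts two strategies: rescanning the excluded rows for every outer row, and building the excluded set once and probing it. The function names and data sizes are hypothetical.

```python
# Toy cost model, NOT MySQL's executor: it only counts units of work.


def nested_loop_anti_join(outer, inner):
    """Keep outer keys with no equal inner key, rescanning `inner` for each one."""
    comparisons = 0
    kept = []
    for key in outer:
        found = False
        for other in inner:  # full rescan per outer row (no index, type=ALL)
            comparisons += 1
            if other == key:
                found = True
                break
        if not found:
            kept.append(key)
    return kept, comparisons


def hashed_anti_join(outer, inner):
    """Build the excluded set once, then probe it once per outer key."""
    excluded = set(inner)  # one pass over inner
    kept = [key for key in outer if key not in excluded]  # one probe per outer key
    return kept, len(inner) + len(outer)


# Hypothetical data: 2,000 sign-ins, 2,000 sign-ups, 1,000 users in common
outer = [(user, "Yahoo") for user in range(2000)]
inner = [(user, "Yahoo") for user in range(1000, 3000)]

slow_kept, slow_work = nested_loop_anti_join(outer, inner)
fast_kept, fast_work = hashed_anti_join(outer, inner)
print(len(slow_kept), slow_work)  # 1000 2500500
print(len(fast_kept), fast_work)  # 1000 4000
```

Both strategies keep the same 1,000 keys. The rescanning version grows with outer rows times inner rows, which is why a large subquery output hurts so much. Avoiding per-row re-evaluation is the idea behind the JOIN rewrite in the answer.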

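If you use MySQL, you should know that subqueries can perform very slowly. This is especially true of dependent subqueries like this one, which MySQL may re-evaluate for every row (more precisely, every distinct set of outer values) of the outer query.

IN with a subquery is slow.

EXISTS is often faster than IN.

A JOIN is usually the fastest way to do something like this.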
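The EXISTS suggestion can be tried the same way. The sketch below again uses an in-memory SQLite database and an assumed fixed millisecond window. It rewrites the question's NOT IN as a correlated NOT EXISTS and runs both forms on the sample rows. It also adds one hypothetical SIGN_UP row with a NULL USER_ID (the column is declared DEFAULT NULL) to show a semantic difference between the two. When the subquery returns a NULL and no match, NOT IN evaluates to NULL, so those outer rows are dropped. NOT EXISTS is unaffected. This compares results only and says nothing about MySQL performance.

```python
import sqlite3

# Sample rows from the question: (USER_ID, PROVIDER, OPERATION, TIMESTAMP in ms)
ROWS = [
    (120, "Google", "SIGN_UP", 1347296347000),
    (120, "Google", "SIGN_IN", 1347296347000),
    (121, "Yahoo", "SIGN_IN", 1347296347000),
    (122, "Yahoo", "SIGN_IN", 1347296347000),
    (120, "Google", "SIGN_UP", 1347296347000),
    (120, "FaceBook", "SIGN_IN", 1347296347000),
]
# Assumed fixed SIGN_UP window in ms (stands in for the UNIX_TIMESTAMP bounds)
WINDOW = (1347235200000, 1347321600000)

NOT_IN = """
SELECT COUNT(DISTINCT s1.USER_ID), s1.PROVIDER
FROM signin_details s1
WHERE s1.OPERATION = 'SIGN_IN'
  AND s1.USER_ID NOT IN (
      SELECT s2.USER_ID FROM signin_details s2
      WHERE s2.PROVIDER = s1.PROVIDER
        AND s2.OPERATION = 'SIGN_UP'
        AND s2.TIMESTAMP BETWEEN ? AND ?)
GROUP BY s1.PROVIDER ORDER BY s1.PROVIDER
"""

# The same filter written as a correlated NOT EXISTS
NOT_EXISTS = """
SELECT COUNT(DISTINCT s1.USER_ID), s1.PROVIDER
FROM signin_details s1
WHERE s1.OPERATION = 'SIGN_IN'
  AND NOT EXISTS (
      SELECT 1 FROM signin_details s2
      WHERE s2.USER_ID = s1.USER_ID
        AND s2.PROVIDER = s1.PROVIDER
        AND s2.OPERATION = 'SIGN_UP'
        AND s2.TIMESTAMP BETWEEN ? AND ?)
GROUP BY s1.PROVIDER ORDER BY s1.PROVIDER
"""


def run(rows, sql):
    """Load `rows` into an in-memory signin_details table and run `sql`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE signin_details ("
        "USER_ID INTEGER, PROVIDER TEXT, OPERATION TEXT, TIMESTAMP INTEGER)"
    )
    conn.executemany("INSERT INTO signin_details VALUES (?, ?, ?, ?)", rows)
    return conn.execute(sql, WINDOW).fetchall()


# On the sample data both forms agree
plain_not_in = run(ROWS, NOT_IN)
plain_not_exists = run(ROWS, NOT_EXISTS)

# Hypothetical extra row: a SIGN_UP whose USER_ID is NULL
ROWS_WITH_NULL = ROWS + [(None, "Yahoo", "SIGN_UP", 1347296347000)]
null_not_in = run(ROWS_WITH_NULL, NOT_IN)          # Yahoo sign-ins are dropped
null_not_exists = run(ROWS_WITH_NULL, NOT_EXISTS)  # Yahoo is still counted

print(plain_not_in, plain_not_exists)
print(null_not_in, null_not_exists)
```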

SELECT
  s1.PROVIDER,
  COUNT(DISTINCT s1.USER_ID)

FROM
  signin_details s1
  -- t: every (USER_ID, PROVIDER) pair that signed up in the window
  LEFT JOIN
  (
    SELECT DISTINCT
      USER_ID, PROVIDER
    FROM
      signin_details
    WHERE
      signin_details.OPERATION='SIGN_UP'
      AND
        signin_details.TIMESTAMP
          BETWEEN
            UNIX_TIMESTAMP(CURRENT_DATE()-INTERVAL 1 DAY) * 1000
            AND UNIX_TIMESTAMP(CURRENT_DATE()) * 1000
  ) AS t USING (USER_ID, PROVIDER)

WHERE
  -- no match in t: the user did not sign up with this provider
  t.USER_ID IS NULL
  AND s1.OPERATION='SIGN_IN'
GROUP BY s1.PROVIDER;
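Here is a sketch that runs the same LEFT JOIN anti-join on the answer's eight input rows in SQLite, via Python's sqlite3 module. It makes three assumptions. The UNIX_TIMESTAMP bounds are replaced with a fixed window around the sample timestamps. The join condition is spelled out with ON rather than USING; the two are equivalent here because every column reference is qualified. An ORDER BY is added for stable output.

```python
import sqlite3

# The answer's eight input rows: (USER_ID, PROVIDER, OPERATION, TIMESTAMP in ms)
ROWS = [
    (120, "Google", "SIGN_UP", 1347296347000),
    (120, "Google", "SIGN_IN", 1347296347000),
    (121, "Yahoo", "SIGN_IN", 1347296347000),
    (122, "Yahoo", "SIGN_IN", 1347296347000),
    (120, "Google", "SIGN_UP", 1347296347000),
    (120, "FaceBook", "SIGN_IN", 1347296347000),
    (119, "FaceBook", "SIGN_IN", 1347296347000),
    (119, "FaceBook", "SIGN_UP", 1347296347000),
]
# Assumed fixed SIGN_UP window in ms (stands in for the UNIX_TIMESTAMP bounds)
WINDOW = (1347235200000, 1347321600000)

LEFT_JOIN_QUERY = """
SELECT s1.PROVIDER, COUNT(DISTINCT s1.USER_ID)
FROM signin_details s1
LEFT JOIN (
    -- t: the (USER_ID, PROVIDER) pairs that must not be counted
    SELECT DISTINCT USER_ID, PROVIDER
    FROM signin_details
    WHERE OPERATION = 'SIGN_UP'
      AND TIMESTAMP BETWEEN ? AND ?
) AS t ON t.USER_ID = s1.USER_ID AND t.PROVIDER = s1.PROVIDER
WHERE t.USER_ID IS NULL          -- no match in t
  AND s1.OPERATION = 'SIGN_IN'
GROUP BY s1.PROVIDER
ORDER BY s1.PROVIDER
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE signin_details ("
    "USER_ID INTEGER, PROVIDER TEXT, OPERATION TEXT, TIMESTAMP INTEGER)"
)
conn.executemany("INSERT INTO signin_details VALUES (?, ?, ?, ?)", ROWS)
result = conn.execute(LEFT_JOIN_QUERY, WINDOW).fetchall()
print(result)  # [('FaceBook', 1), ('Yahoo', 2)]
```

With the window covering the sample rows, this gives FaceBook 1 and Yahoo 2, the same counts as the result table in this answer.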

http://sqlfiddle.com/#!2/122ac/12

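NOTE: If the SQL Fiddle result surprises you, keep in mind that the query filters SIGN_UP rows with UNIX_TIMESTAMP relative to the current date. When the fiddle is run on another day, the 2012 sample rows fall outside that one-day window, so its output differs from the result below.

Result: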

| PROVIDER | COUNT(DISTINCT S1.USER_ID) |
-----------------------------------------
| FaceBook |                          1 |
|    Yahoo |                          2 |

This is the usual MySQL workaround for missing set operators (MySQL had no INTERSECT or EXCEPT when this was written). First, select every combination of USER_ID and PROVIDER that you don't want to count. Then LEFT JOIN those combinations to your data. Every row you do want to count now has no values from the LEFT JOIN, so you can select those rows with t.USER_ID IS NULL.


Input (last column: Y = counted for Yahoo, F = counted for FaceBook, - = not counted):

| rn° | USER_ID | PROVIDER | OPERATION |     TIMESTAMP | counted
-----------------------------------------------------------------
| 1   |     120 |   Google |   SIGN_UP | 1347296347000 | -
| 2   |     120 |   Google |   SIGN_IN | 1347296347000 | - (signed up, see rn° 1)
| 3   |     121 |    Yahoo |   SIGN_IN | 1347296347000 | Y
| 4   |     122 |    Yahoo |   SIGN_IN | 1347296347000 | Y
| 5   |     120 |   Google |   SIGN_UP | 1347296347000 | -
| 6   |     120 | FaceBook |   SIGN_IN | 1347296347000 | F
| 7   |     119 | FaceBook |   SIGN_IN | 1347296347000 | - (signed up, see rn° 8)
| 8   |     119 | FaceBook |   SIGN_UP | 1347296347000 | -
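The three steps of the explanation can be traced on this input. This minimal pure-Python sketch assumes all timestamps fall inside the SIGN_UP window (they are identical here). It builds the set t, keeps the SIGN_IN rows with no match in it, and counts distinct users per provider. The variable names are illustrative only.

```python
from collections import defaultdict

# The answer's input rows: (rn, USER_ID, PROVIDER, OPERATION).
# TIMESTAMP is omitted: all rows share one value, assumed inside the SIGN_UP window.
INPUT = [
    (1, 120, "Google", "SIGN_UP"),
    (2, 120, "Google", "SIGN_IN"),
    (3, 121, "Yahoo", "SIGN_IN"),
    (4, 122, "Yahoo", "SIGN_IN"),
    (5, 120, "Google", "SIGN_UP"),
    (6, 120, "FaceBook", "SIGN_IN"),
    (7, 119, "FaceBook", "SIGN_IN"),
    (8, 119, "FaceBook", "SIGN_UP"),
]

# Step 1: derived table t = the (USER_ID, PROVIDER) pairs not to be counted
t = {(user, provider) for _, user, provider, op in INPUT if op == "SIGN_UP"}

# Step 2: LEFT JOIN t ... WHERE t.USER_ID IS NULL keeps SIGN_IN rows without a match
counted = [
    (rn, user, provider)
    for rn, user, provider, op in INPUT
    if op == "SIGN_IN" and (user, provider) not in t
]

# Step 3: COUNT(DISTINCT USER_ID) ... GROUP BY PROVIDER
users = defaultdict(set)
for _, user, provider in counted:
    users[provider].add(user)
result = {provider: len(ids) for provider, ids in sorted(users.items())}

print([rn for rn, _, _ in counted])  # [3, 4, 6]
print(result)  # {'FaceBook': 1, 'Yahoo': 2}
```

It keeps rows 3, 4 and 6, the ones marked Y and F in the table, and produces FaceBook 1 and Yahoo 2.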

Use NOT IN inside the HAVING clause; it will be faster than WHERE ... NOT IN.
