
MySQL GROUP BY too slow. Any help to make it faster?

So I have a JS script that people embed into their sites; it tracks all the URLs and clicks of visitors. Each visitor gets a "token", a value unique to them, which is used to track their actions on the site.

I wanted to show site owners the actions of their visitors, so I wrote the following query. All visits are stored in the "custom_logs" table.

SELECT *
FROM custom_logs
WHERE pn = 'pn-9283896662' AND
      id IN (SELECT MAX(id)
             FROM custom_logs
             WHERE action_clicked_text LIKE '%sometext%'  
             GROUP BY token
           )
      AND token != '' AND
      action_timestamp > 11568 AND
      action_timestamp < 1570846368
 order by action_timestamp desc
 LIMIT 0, 30;

I narrowed the problem down to the "GROUP BY token" part of the query. When I remove that part the query runs much faster but is still fairly slow (0.7 s, compared to 5 s with the GROUP BY), and there are 4 of these queries on one page. The custom_logs table already has about 250,000 rows.

The "GROUP BY token" part is there because I want to show only the single newest log entry for each user, so that people can click on it and see that user's complete logs. After the query I use mysqli_num_rows($results); to count the results. Is there any way besides caching to count all the results quickly?

I read something about indexing columns, so I made token a varchar(255) and indexed it, but it made no difference in speed. Then again, I am not very good at SQL.

I would suggest replacing the IN with a correlated subquery. I'm thinking:

SELECT cl.*
FROM custom_logs cl
WHERE cl.pn = 'pn-9283896662' AND
      cl.id = (select max(cl2.id)
               from custom_logs cl2
               where cl2.token = cl.token and
                     cl2.action_clicked_text LIKE '%sometext%'
              ) and
      cl.token <> '' AND
      cl.action_timestamp > 11568 AND
      cl.action_timestamp < 1570846368
order by cl.action_timestamp desc
limit 0, 30;
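
As for counting without pulling rows into PHP: one option, as a rough sketch, is to keep the same filters, wrap them in COUNT(*), and drop the ORDER BY and LIMIT. The alias visitor_count is just an illustrative name, and I haven't measured this against your data, so treat it as something to try and not as a guaranteed speedup.

```sql
-- Counts the rows the correlated-subquery query would return without its LIMIT.
-- visitor_count is a hypothetical alias chosen for illustration.
SELECT COUNT(*) AS visitor_count
FROM custom_logs cl
WHERE cl.pn = 'pn-9283896662' AND
      cl.id = (SELECT MAX(cl2.id)
               FROM custom_logs cl2
               WHERE cl2.token = cl.token AND
                     cl2.action_clicked_text LIKE '%sometext%'
              ) AND
      cl.token <> '' AND
      cl.action_timestamp > 11568 AND
      cl.action_timestamp < 1570846368;
```

This returns a single row holding the total, so the page can show the count without fetching every matching log row.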

For this query, I recommend the following indexes:

  • custom_logs(pn, action_timestamp, token, id)
  • custom_logs(token, action_clicked_text, id)
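
Spelled out as DDL, those two indexes might look like the sketch below. The index names are made up, and the second statement assumes action_clicked_text is a VARCHAR short enough to index in full (a TEXT column would need a prefix length). Because of the leading wildcard in LIKE '%sometext%', the index cannot be used to seek on the text itself; as I understand it, the benefit is that the subquery can seek on token and then read the text and id straight from the index. Running EXPLAIN afterwards shows whether the optimizer actually picks them.

```sql
-- Hypothetical index names; column order follows the list above.
CREATE INDEX idx_cl_pn_ts_token_id
    ON custom_logs (pn, action_timestamp, token, id);

-- Assumes action_clicked_text can be indexed in full.
-- For a TEXT column, MySQL requires a prefix, e.g. action_clicked_text(100).
CREATE INDEX idx_cl_token_text_id
    ON custom_logs (token, action_clicked_text, id);

-- Check the plan: the "key" column of the output names the index chosen
-- for the outer query and for the dependent subquery.
EXPLAIN
SELECT cl.*
FROM custom_logs cl
WHERE cl.pn = 'pn-9283896662' AND
      cl.id = (SELECT MAX(cl2.id)
               FROM custom_logs cl2
               WHERE cl2.token = cl.token AND
                     cl2.action_clicked_text LIKE '%sometext%'
              ) AND
      cl.token <> '' AND
      cl.action_timestamp > 11568 AND
      cl.action_timestamp < 1570846368
ORDER BY cl.action_timestamp DESC
LIMIT 0, 30;
```

If EXPLAIN still reports a full scan (type ALL) for either part, the index definitions or the column types are worth a second look before blaming the query shape.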

