
MySQL: Return ID where last activity is older than X days

I have a log table where every user activity is stored.

UserActivityTable (around 15 million records)

  id    userID    category           value                 timestamp 
    1        2         Visit          homepage          2018-02-21 13:13:54
    1        2         Visit          page2             2018-02-18 13:13:45
    1        2         Visit          page1             2018-02-15 13:13:30
    1        3         Visit          homepage          2018-02-01 13:13:12

With an SQL query I need to get all userIDs whose last activity is older than X days (let's say 30), but only for users that are set to "Active".

Users (around 15k users)

id     Groups     Active   Name    Mails ...
2      Customer    1       Hans
3      Customer    0       Wurst

If I fetch all active users (around 5k) and then try to get their last activity, I run into a timeout (I think the query is not performant). If I limit it to 5 users there is no problem.

What I tried:

1. Select all users that are active, then use a foreach loop to get each user's last activity. If it is older than 30 days, I add the user to a new array, and at the end I use that array to set Active to false in the Users table.

Until the last 2-3 months this worked fine, but now we have a lot of new users and the function can't handle it.

Is there a clean way to get all of that in one SQL query?

You can use the following query to get the users:

SELECT `userID`, MAX(`timestamp`) AS lastActive FROM `UserActivityTable` 
WHERE `userID` IN (
    SELECT `id` FROM `Users` WHERE `Active` = 1
) GROUP BY `userID` HAVING lastActive < DATE_SUB(NOW(), INTERVAL 30 DAY)
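One thing to be aware of: the query above only returns users that have at least one row in UserActivityTable. If active users with no logged activity at all should also be reported, a LEFT JOIN variant might look like this (a sketch, assuming the table and column names shown in the question):

```sql
-- Active users whose newest activity is older than 30 days,
-- including active users with no activity rows at all.
SELECT u.`id`, MAX(a.`timestamp`) AS lastActive
FROM `Users` AS u
LEFT JOIN `UserActivityTable` AS a ON a.`userID` = u.`id`
WHERE u.`Active` = 1
GROUP BY u.`id`
HAVING lastActive IS NULL
    OR lastActive < DATE_SUB(NOW(), INTERVAL 30 DAY);
```

For users without any activity, lastActive comes back as NULL, which is why the HAVING clause checks for it explicitly.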

Indexing

  • You should use a PRIMARY KEY on the Users table.
  • You should use a FOREIGN KEY on UserActivityTable (its userID column referencing Users.id).
  • To speed up the query above you can create an index on the timestamp column.

You can use the following to create an INDEX on the timestamp column:

CREATE INDEX index_timestamp ON `UserActivityTable` (`timestamp`);

You can also use a single query to UPDATE the active state on the Users table:

UPDATE `Users` SET `active` = EXISTS (
    SELECT `userID` FROM `UserActivityTable` WHERE `UserActivityTable`.`userID` = `Users`.`id` GROUP BY `UserActivityTable`.`userID` HAVING MAX(`UserActivityTable`.`timestamp`) > DATE_SUB(NOW(), INTERVAL 30 DAY)
)
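Note that this statement recomputes the flag for every user, so a user that was set to 0 earlier is switched back to 1 if they have recent activity. If the goal is only to deactivate users that are currently active, as described in the question, a narrower variant could look like this (a sketch, assuming the same table and column names):

```sql
-- Only touch users that are currently active and have
-- no activity within the last 30 days.
UPDATE `Users` AS u
SET u.`Active` = 0
WHERE u.`Active` = 1
  AND NOT EXISTS (
      SELECT 1
      FROM `UserActivityTable` AS a
      WHERE a.`userID` = u.`id`
        AND a.`timestamp` > DATE_SUB(NOW(), INTERVAL 30 DAY)
  );
```

This version also deactivates active users that have no activity rows at all, since NOT EXISTS is true for them.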

Is there a clean way to get all that stuff in one sql query?

Yes, you can update the Users table in a single step with the following query:

UPDATE `Users` SET `Active` = EXISTS(
    SELECT * FROM `UserActivityTable` WHERE
        `UserActivityTable`.`userID` = `Users`.`id` AND
        `timestamp` > DATE_SUB( NOW(), INTERVAL 30 DAY )
    );

EXISTS returns 1 or 0 depending on whether at least one record exists for the user in the activity table within the last 30 days, so the Active field is correctly updated to 1 or 0 for every user.


Mysql Return ID where last activity is older than X Days

If you just want the list of IDs of users with activity in the last 30 days:

SELECT `Users`.`id` FROM `Users` WHERE EXISTS(
    SELECT * FROM `UserActivityTable` WHERE
        `UserActivityTable`.`userID` = `Users`.`id` AND
        `timestamp` > DATE_SUB( NOW(), INTERVAL 30 DAY )
    ) = 1;

For good performance, the timestamp field must (at the very least) be indexed.
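Since the question asks for the opposite set (active users whose last activity is older than 30 days), the same subquery can be negated. A sketch, assuming the table and column names from the question:

```sql
-- Active users with no activity in the last 30 days.
SELECT u.`id`
FROM `Users` AS u
WHERE u.`Active` = 1
  AND NOT EXISTS (
      SELECT 1
      FROM `UserActivityTable` AS a
      WHERE a.`userID` = u.`id`
        AND a.`timestamp` > DATE_SUB(NOW(), INTERVAL 30 DAY)
  );
```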


Side note

You have already hit 15M records.

As your events table will keep growing over time, you should consider periodically deleting old entries or moving them to a separate table or dump file.
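A rough sketch of what such a periodic archive step could look like. The archive table name UserActivityArchive and the one-year retention period are assumptions made up for illustration, not something from the question:

```sql
-- Hypothetical archive table with the same structure as the live log.
CREATE TABLE IF NOT EXISTS `UserActivityArchive` LIKE `UserActivityTable`;

-- Fix the cutoff once so both statements use the same boundary.
SET @cutoff = DATE_SUB(NOW(), INTERVAL 1 YEAR);

-- Copy old rows to the archive, then remove them from the live table.
INSERT INTO `UserActivityArchive`
SELECT * FROM `UserActivityTable` WHERE `timestamp` < @cutoff;

DELETE FROM `UserActivityTable` WHERE `timestamp` < @cutoff;
```

On a table of this size it may be gentler to delete in smaller batches (MySQL's DELETE accepts a LIMIT clause) rather than in one large statement.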

Don't do it.

It is improper to have redundant information in a database. (active is redundant because it can be derived by a query against UserActivityTable.)

OK, you need more performance, so you are setting this flag. I assume this is not a one-time task, but needs to be updated daily? Or what? I ask because active=0 will be wrong if the user does something after you run the UPDATE and before you run it again!

Let's solve that bug, and discover that we make the UPDATE very fast in the process.

The 'only' way to fix that bug is to reach into UserActivityTable dynamically. However, we can make that so cheap that it is OK to do it in 'realtime'.

SELECT x.id
FROM Users AS x
WHERE EXISTS ( SELECT * FROM UserActivityTable
                 WHERE userID = x.id
                   AND timestamp > NOW() - INTERVAL 30 DAY );  -- == "active"

UserActivityTable needs INDEX(userID, timestamp)
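A sketch of how that composite index could be created; the index name idx_user_timestamp is just a placeholder:

```sql
-- Composite index: userID first, then timestamp.
ALTER TABLE `UserActivityTable`
    ADD INDEX `idx_user_timestamp` (`userID`, `timestamp`);
```

With userID as the leading column, the EXISTS lookup can go straight to one user's entries in the index and check for a recent timestamp there, instead of scanning the whole table.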

Oops! I just obviated the need for the active column.

One of your comments mentioned purging 'old, inactive' users. Is the UPDATE aimed at that? Please fold that requirement into the question; otherwise I (and others) may not be able to help you.

