
MySQL: select number of rows between time span

I'm trying to get the total number of rows that fall within a specific amount of time, or time span. Basically, consider the following table:

CREATE TABLE IF NOT EXISTS `downloads` (
`id` int(7) NOT NULL AUTO_INCREMENT,
`stuff_id` int(7) NOT NULL,
`user_id` int(7) NOT NULL,
`dl_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1;

This table is populated each time someone downloads something.

So what I really need is a list of users (user_id) who have made more than, for example, 100 downloads within a period of, for example, 24 hours. Not in the last 24 hours, but IN a period of exactly that length, even if it was during Christmas last year =)

Any ideas at all?!

OK, I realise I'm a bit late, but I wanted to post my answer anyway :-)

What you require can be done using a subquery, but this might take ages to complete on a large table...

Thinking about the question, I came up with two different approaches.

One of them has already been dealt with in the other answers. It works by starting at a specific point in time, looking at the interval that begins at this time, and then looking at the interval of equal duration that immediately follows. This leads to clear, understandable results and is probably what would be required (e.g. a user must not exceed 100 downloads per calendar day). However, it would completely miss situations in which a user does 99 downloads during the hour before midnight and another 99 in the first hour of the new day.

So if the required result is more of a "top ten downloaders list", then this is the other approach. The results here may not be as understandable at first glance, because a single download can count towards multiple intervals. This is because the intervals will (and need to) overlap.

Here's my setup. I've created the table from your statement and added two indexes:

CREATE INDEX downloads_timestamp on downloads (dl_date);
CREATE INDEX downloads_user_id on downloads (user_id);
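
As a side note, the query further down filters on user_id and on a dl_date range at the same time. A composite index covering both columns may serve that lookup better than the two single-column indexes above. This is only a sketch: the index name is made up, and whether the optimizer actually prefers it depends on the data and the MySQL version, so check with EXPLAIN.

```sql
-- Hypothetical index name (not part of the original setup).
-- Assumes lookups filter on user_id first and then on a dl_date range.
CREATE INDEX downloads_user_date ON downloads (user_id, dl_date);
```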

The data I've inserted into the table:

SELECT * FROM downloads;
+----+----------+---------+---------------------+
| id | stuff_id | user_id | dl_date             |
+----+----------+---------+---------------------+
|  1 |        1 |       1 | 2011-01-24 09:00:00 |
|  2 |        1 |       1 | 2011-01-24 09:30:00 |
|  3 |        1 |       1 | 2011-01-24 09:35:00 |
|  4 |        1 |       1 | 2011-01-24 10:00:00 |
|  5 |        1 |       1 | 2011-01-24 11:00:00 |
|  6 |        1 |       1 | 2011-01-24 11:15:00 |
|  7 |        1 |       1 | 2011-01-25 09:15:00 |
|  8 |        1 |       1 | 2011-01-25 09:30:00 |
|  9 |        1 |       1 | 2011-01-25 09:45:00 |
| 10 |        1 |       2 | 2011-01-24 08:00:00 |
| 11 |        1 |       2 | 2011-01-24 12:00:00 |
| 12 |        1 |       2 | 2011-01-24 12:01:00 |
| 13 |        1 |       2 | 2011-01-24 12:02:00 |
| 14 |        1 |       2 | 2011-01-24 12:03:00 |
| 15 |        1 |       2 | 2011-01-24 12:00:00 |
| 16 |        1 |       2 | 2011-01-24 12:04:00 |
| 17 |        1 |       2 | 2011-01-24 12:05:00 |
| 18 |        1 |       2 | 2011-01-24 12:06:00 |
| 19 |        1 |       2 | 2011-01-24 12:07:00 |
| 20 |        1 |       2 | 2011-01-24 12:08:00 |
| 21 |        1 |       2 | 2011-01-24 12:09:00 |
| 22 |        1 |       2 | 2011-01-24 12:10:00 |
| 23 |        1 |       2 | 2011-01-25 14:00:00 |
| 24 |        1 |       2 | 2011-01-25 14:12:00 |
| 25 |        1 |       2 | 2011-01-25 14:25:00 |
+----+----------+---------+---------------------+
25 rows in set (0.00 sec)

As you can see, all downloads occurred either yesterday or today and were made by two different users.

Now, what we have to keep in mind is the following: there is (mathematically) an infinite number of 24-hour intervals (or intervals of any other duration) between '2011-01-24 0:00' and '2011-01-25 23:59:59'. But as the server's precision is one second, this boils down to 86,400 intervals:

First interval:   2011-01-24 0:00:00  -> 2011-01-25 0:00:00
Second interval:  2011-01-24 0:00:01  -> 2011-01-25 0:00:01
Third interval:   2011-01-24 0:00:02  -> 2011-01-25 0:00:02
   .
   .
   .
86400th interval: 2011-01-24 23:59:59 -> 2011-01-25 23:59:59

So we could use a loop to iterate over all these intervals and calculate the number of downloads per user and per interval. Of course, not all intervals are of equal interest to us, so we can skip most of them by using only the timestamps in the table as "beginning of interval".

This is what the following query does. It uses every download timestamp in the table as "start of interval", adds the interval duration, and then counts the downloads made by the same user during this interval (both endpoints included).

SET @duration = '24:00:00';
SET @limit = 5;
SELECT * FROM 
    (SELECT t1.user_id, 
            t1.dl_date startOfPeriod, 
            ADDTIME(t1.dl_date,@duration) endOfPeriod, 
           (SELECT COUNT(1) 
            FROM downloads t2 
            WHERE t1.user_id = t2.user_id 
            AND t1.dl_date <= t2.dl_date 
            AND ADDTIME(t1.dl_date,@duration) >= t2.dl_date) count
     FROM downloads t1) t3 
WHERE count > @limit;

Here's the result:

+---------+---------------------+---------------------+-------+
| user_id | startOfPeriod       | endOfPeriod         | count |
+---------+---------------------+---------------------+-------+
|       1 | 2011-01-24 09:00:00 | 2011-01-25 09:00:00 |     6 |
|       1 | 2011-01-24 09:30:00 | 2011-01-25 09:30:00 |     7 |
|       1 | 2011-01-24 09:35:00 | 2011-01-25 09:35:00 |     6 |
|       1 | 2011-01-24 10:00:00 | 2011-01-25 10:00:00 |     6 |
|       2 | 2011-01-24 08:00:00 | 2011-01-25 08:00:00 |    13 |
|       2 | 2011-01-24 12:00:00 | 2011-01-25 12:00:00 |    12 |
|       2 | 2011-01-24 12:01:00 | 2011-01-25 12:01:00 |    10 |
|       2 | 2011-01-24 12:02:00 | 2011-01-25 12:02:00 |     9 |
|       2 | 2011-01-24 12:03:00 | 2011-01-25 12:03:00 |     8 |
|       2 | 2011-01-24 12:00:00 | 2011-01-25 12:00:00 |    12 |
|       2 | 2011-01-24 12:04:00 | 2011-01-25 12:04:00 |     7 |
|       2 | 2011-01-24 12:05:00 | 2011-01-25 12:05:00 |     6 |
+---------+---------------------+---------------------+-------+
12 rows in set (0.00 sec)
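
On a large table, the correlated subquery is evaluated once per row. If MySQL 8.0 or later is available (an assumption; window functions did not exist in MySQL when this answer was written), the same per-download window count can be sketched with a RANGE frame. The alias names dl_count and per_window are made up for illustration. The frame runs from the current row's dl_date to dl_date plus one day, both ends inclusive, so it should flag the same windows as the query above, though possibly in a different row order.

```sql
-- Sketch only: requires MySQL 8.0+ (window functions).
-- The 24-hour duration is written as a literal interval inside the frame.
SET @limit = 5;
SELECT user_id, startOfPeriod, endOfPeriod, dl_count
FROM (
    SELECT user_id,
           dl_date AS startOfPeriod,
           dl_date + INTERVAL 1 DAY AS endOfPeriod,
           -- count this user's downloads with dl_date in [dl_date, dl_date + 1 day]
           COUNT(*) OVER (
               PARTITION BY user_id
               ORDER BY dl_date
               RANGE BETWEEN CURRENT ROW AND INTERVAL 1 DAY FOLLOWING
           ) AS dl_count
    FROM downloads
) AS per_window
WHERE dl_count > @limit;
```

The outer SELECT is needed because a window function cannot be referenced directly in a WHERE clause.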

This returns the user_ids that have made more than 100 downloads on any single calendar day:

SELECT user_id, count(user_id) as downloads_count, DATE(dl_date) 
FROM downloads
GROUP BY user_id, DATE(dl_date)
HAVING count(user_id) > 100;

If you have a period like this, which is less than or equal to 24 hours:

SET @period_start='2010-10-10 06:00:00';
SET @period_end='2010-10-11 05:59:59';

then:

SELECT user_id, COUNT(id) AS num
FROM downloads
WHERE dl_date >= @period_start AND dl_date <= @period_end
GROUP BY user_id
HAVING num > 100;

But if you have a period like this, which is greater than 24 hours:

SET @period_start='2010-10-10 06:00:00';
SET @period_end='2011-09-17 13:15:12';

how do you want to calculate the download count? In 24-hour chunks counted back from @period_end, or forward from @period_start? Or do you just want the most recent 24-hour chunk?
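
If the answer is "24-hour chunks counted forward from @period_start", one possible sketch is to number each chunk by the seconds elapsed since the start and group on that number. The chunk_no alias is made up for illustration, and the 100-download threshold is carried over from the query above.

```sql
-- Sketch only: chunk_no 0 is the first 24 hours after @period_start,
-- chunk_no 1 the next 24 hours, and so on.
SET @period_start = '2010-10-10 06:00:00';
SET @period_end   = '2011-09-17 13:15:12';
SELECT user_id,
       FLOOR(TIMESTAMPDIFF(SECOND, @period_start, dl_date) / 86400) AS chunk_no,
       COUNT(*) AS num
FROM downloads
WHERE dl_date >= @period_start AND dl_date <= @period_end
GROUP BY user_id, chunk_no
HAVING num > 100;
```

Like any fixed-chunk approach, this does not catch a burst of downloads that straddles a chunk boundary.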

You want to filter on the two date values using BETWEEN (which includes both endpoints), group on user_id, and then use HAVING to filter the grouped results.

Three parameters: @date1, @date2, and @threshold (the values set below are only examples):

SET @date1 = '2010-12-24 00:00:00';
SET @date2 = '2010-12-25 00:00:00';
SET @threshold = 100;
SELECT user_id
     , COUNT(*) AS num
  FROM downloads
 WHERE dl_date BETWEEN @date1 AND @date2
 GROUP BY user_id
HAVING COUNT(*) > @threshold;
