MySQL query based on time range, group users, and sum values over a sliding window

I want to create a new Table B from the information in an existing Table A. Can MySQL take a time range into account, group rows by the values in one column (the user id), and then sum the values in another column within each of those groups?

Table A stores logs of events, like a journal for users. There can be multiple events from a single user in a single day. Say, hypothetically, I'm keeping track of when my users eat fruit, and I want to know how many fruit they eat in a week (7 days) and how many of those are apples.

So in Table B I want, for each entry in Table A, the total # of fruit and of apples over the previous 7 days.

EDIT:
I'm sorry, I oversimplified the information I gave and didn't think my example through.

Initially I have only Table A. I'm trying to create Table B from a query.

Assume:

  • A user/id can log an entry multiple times in a day.
  • Sums should be computed per id, over the entries between date - 7 days and date.
  • The fruit column stands for the total # of fruit during the 7 day interval (apples and bananas are both fruit).
  • The data doesn't start only at 2013-9-5. It can date back to 2000, and I want to apply the 7 day sliding window over all the dates between 2000 and 2013.

The sums are taken over a sliding window of 7 days.

Here's an example:

Table A:                           

| id | date-time          | apples | banana |     
---------------------------------------------
|  1 | 2013-9-5 08:00:00  |   1    |   1    |  
|  2 | 2013-9-5 09:00:00  |   1    |   0    |   
|  1 | 2013-9-5 16:00:00  |   1    |   0    |  
|  1 | 2013-9-6 08:00:00  |   0    |   1    |    
|  2 | 2013-9-9 08:00:00  |   1    |   1    |  
|  1 | 2013-9-11 08:00:00 |   0    |   1    |   
|  1 | 2013-9-12 08:00:00 |   0    |   1    |   
|  2 | 2013-9-13 08:00:00 |   1    |   1    |  

note: user 1 logged 2 entries on 2013-9-5

The result after the query should be Table B.

Table B
| id | date-time          | apples | fruit  |
--------------------------------------------
|  1 | 2013-9-5 08:00:00  |   1    |   2    |
|  2 | 2013-9-5 09:00:00  |   1    |   1    |
|  1 | 2013-9-5 16:00:00  |   2    |   3    |
|  1 | 2013-9-6 08:00:00  |   2    |   4    |
|  2 | 2013-9-9 08:00:00  |   2    |   3    |
|  1 | 2013-9-11 08:00:00 |   2    |   5    |
|  1 | 2013-9-12 08:00:00 |   0    |   3    |
|  2 | 2013-9-13 08:00:00 |   2    |   4    |

At 2013-9-12 the sliding window moves and only includes 9-6 to 9-12. That's why id 1 goes from a sum of 2 apples to 0 apples.

Assumptions:

  • one row per id/date
  • the counts should be for id between date and date - 7 days
  • "fruit" = "banana"
  • the "date" column is actually a date (including year) and not just month/day

then this SQL should do the trick:

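-- Assumes B's columns are (id, date, apples, fruit), as in the question,
-- and that "fruit" = "banana" (see the assumptions above).
INSERT INTO B
SELECT a1.id, a1.date, SUM( a2.apples ), SUM( a2.banana )
  FROM (SELECT DISTINCT id, date
          FROM A
         WHERE date > NOW() - INTERVAL 7 DAY   -- only rows from the last 7 days
       ) a1
  JOIN A a2
    ON a2.id    = a1.id
   AND a2.date <= a1.date
   AND a2.date >= a1.date - INTERVAL 7 DAY
 GROUP BY a1.id, a1.date;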

Some questions:

  • Are the above assumptions correct?
  • Does table A contain more fruits than just Bananas and Apples? If so, what does the real structure look like?

You need years in your data to be able to use date arithmetic correctly. I added them.

There's an odd thing in your data: you seem to have multiple log entries for each person on the same day, and you're assuming an implicit order in which the later entries come somehow "after" the earlier ones. If MySQL returns them that way, it's only by accident: rows in a table have no implicit order. Also, if date/id combinations are duplicated, the self-join (read on) produces lots of duplicate rows and ruins the sums.

So we need to start by creating a daily summary table of your data, like so:

    select id, `date`, sum(apples) as apples, sum(banana) as banana
      from fruit
     group by id, `date`

This summary will contain at most one row per id per day.
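
If the log table stores a full timestamp per entry, as Table A in the question does, the day can be derived with DATE(). A minimal sketch, assuming a hypothetical DATETIME column named date_time (that name is not from the original post):

```sql
-- Daily summary when each log row carries a full timestamp.
-- `date_time` is an assumed column name; DATE() truncates it to the calendar day.
select id,
       date(date_time) as `date`,
       sum(apples)     as apples,
       sum(banana)     as banana
  from fruit
 group by id, date(date_time);
```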

Next we need to do a limited cross product self-join, so we get seven days' worth of fruit eating.

    select /* whatever */
      from (
            /* summary query */
           ) as a
      join (
            /* same summary query once again */
           ) as b
        on (     a.id = b.id
             and b.`date` between a.`date` - interval 6 day and a.`date` )

The BETWEEN condition in the ON clause gives us the seven days (today and the six days prior). Notice that the derived table aliased b supplies the seven days of data, while a supplies the current day.
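
The window bounds can be checked directly with MySQL date arithmetic. A small illustration, using dates taken from the question's example:

```sql
-- Bounds of the seven-day window ending on 2013-09-12.
select date('2013-09-12') - interval 6 day as window_start,  -- 2013-09-06
       date('2013-09-12')                  as window_end;    -- 2013-09-12

-- 2013-09-05 falls outside that window (0); 2013-09-06 falls inside it (1).
select date('2013-09-05') between date('2013-09-12') - interval 6 day
                              and date('2013-09-12') as day_5_in_window,
       date('2013-09-06') between date('2013-09-12') - interval 6 day
                              and date('2013-09-12') as day_6_in_window;
```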

Finally, we have to summarize that result according to your specification. The resulting query is this.

    select a.id, a.`date`,
           sum(b.apples) + sum(b.banana) as fruit_last_week,
           a.apples as apple_today
      from (
            select id, `date`, sum(apples) as apples, sum(banana) as banana
              from fruit
             group by id, `date`
           ) as a
      join (
            select id, `date`, sum(apples) as apples, sum(banana) as banana
              from fruit
             group by id, `date`
           ) as b
        on (     a.id = b.id
             and b.`date` between a.`date` - interval 6 day and a.`date` )
     group by a.id, a.`date`, a.apples
     order by a.`date`, a.id

Here's a fiddle: http://sqlfiddle.com/#!2/670b2/15/0
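
The query above returns one row per id per day and reports apples for the current day only. If one row per log entry is wanted, with both apples and fruit summed over the window as in the question's Table B, a variant along these lines may work. It is only a sketch, and it rests on assumptions that are not in the original post: the table is named A, the timestamp column is named date_time, and each (id, date_time) pair is unique.

```sql
-- Hypothetical schema and the sample rows from the question.
create table A (
    id        int      not null,
    date_time datetime not null,   -- assumed column name
    apples    int      not null,
    banana    int      not null
);

insert into A (id, date_time, apples, banana) values
    (1, '2013-09-05 08:00:00', 1, 1),
    (2, '2013-09-05 09:00:00', 1, 0),
    (1, '2013-09-05 16:00:00', 1, 0),
    (1, '2013-09-06 08:00:00', 0, 1),
    (2, '2013-09-09 08:00:00', 1, 1),
    (1, '2013-09-11 08:00:00', 0, 1),
    (1, '2013-09-12 08:00:00', 0, 1),
    (2, '2013-09-13 08:00:00', 1, 1);

-- For every log entry (cur), sum the same user's entries (win) that are
-- not later than cur and fall on cur's day or one of the six days before.
select cur.id,
       cur.date_time,
       sum(win.apples)              as apples,
       sum(win.apples + win.banana) as fruit
  from A as cur
  join A as win
    on win.id = cur.id
   and win.date_time <= cur.date_time
   and date(win.date_time) >= date(cur.date_time) - interval 6 day
 group by cur.id, cur.date_time
 order by cur.date_time, cur.id;
```

On the question's sample rows this gives the figures shown in Table B, for example 0 apples and 3 fruit for id 1 at 2013-09-12 08:00:00. If two entries for the same id could share an identical timestamp, the grouping would merge them and double-count, so a unique row key would be needed in that case.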
