
SQL for time periods

I have a statistics table for an internet radio station (MySQL) with the following columns:

  • ip_address
  • time_start (datetime of listening start)
  • time_end (datetime of listening finish)

I need to select the listener peak for each day, meaning the maximum number of simultaneous listeners with unique IP addresses.

It would also be great to have the start and finish times of that peak.

For example:

2011-30-01  |  4 listeners peak  |  from 10:30  |  till 11:25


IMHO it's simpler to load these 35,000 rows into memory, enumerate them, and maintain a count of the concurrent listeners at any given moment.
This is simpler if you load the rows in the following format:

IP, Time, flag_That_Indicate_StartOrStop_Listening_For_This_Given_IP

That way you can load the data ordered by time, and then simply enumerate all the rows while maintaining a list of listening IPs.

Anyway, how do you consider multiple connections from the same IP?
There can be 10 different listeners behind a NAT using the same IP address.

Update: You don't really need to change the DB structure; it's enough to use a different SQL query to load the data:

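SELECT ip_address, time_start AS MyTime, 1 AS StartStop
FROM MyTable

UNION ALL

SELECT ip_address, time_end AS MyTime, 0 AS StartStop
FROM MyTable

-- ORDER BY must follow the last SELECT so that it sorts the whole UNION.
-- Sorting by StartStop as well puts stops (0) before starts (1) at the same instant.
ORDER BY MyTime, StartStop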

Using this SQL you should be able to load all the data, and then enumerate all the rows.
It's important that the rows are sorted correctly.

If StartStop = 1, someone has started listening: add their IP to the list of listeners and increment the listener count by 1.
If StartStop = 0, someone has stopped listening: remove their IP from the list of listeners and decrement the listener count by 1.

In the enumeration loop, check when the count reaches the maximum number of concurrent listeners.
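
The enumeration described above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the function name peak_listeners and the tuple layout (ip, time, start_stop) are made up for the example, the rows are assumed to arrive already sorted by time, and the per-day grouping is left out so that the function reports the single overall peak. Several connections from the same IP are counted as one listener.

```python
from collections import Counter
from datetime import datetime

def peak_listeners(events):
    """events: iterable of (ip, time, start_stop) sorted by time,
    with start_stop = 1 for a start and 0 for a stop (hypothetical layout).
    Returns (peak, peak_start, peak_end) for the first time the peak is reached."""
    active = Counter()  # ip -> number of open connections from that ip
    peak, peak_start, peak_end = 0, None, None
    at_peak = False
    for ip, when, start_stop in events:
        if start_stop == 1:
            active[ip] += 1
        else:
            active[ip] -= 1
            if active[ip] <= 0:
                del active[ip]
        listeners = len(active)  # unique IPs currently listening
        if listeners > peak:
            # New maximum: remember when it started.
            peak, peak_start, peak_end = listeners, when, None
            at_peak = True
        elif at_peak and listeners < peak:
            # First drop below the maximum closes the peak period.
            peak_end = when
            at_peak = False
    return peak, peak_start, peak_end

# Made-up sample data for one morning.
day = lambda h, m: datetime(2011, 1, 30, h, m)
events = [
    ("1.1.1.1", day(10, 0), 1),
    ("2.2.2.2", day(10, 30), 1),
    ("1.1.1.1", day(11, 0), 0),
    ("3.3.3.3", day(11, 15), 1),
    ("2.2.2.2", day(11, 25), 0),
    ("3.3.3.3", day(12, 0), 0),
]
result = peak_listeners(events)
```

To get one peak per day, the same loop would have to start a fresh maximum at each date boundary while carrying the set of active listeners across midnight.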

Let's look for an algorithm that gets the results with the best performance.

  • Splitting time: Time is a continuous dimension, so we need checkpoints at which to recount the listeners. The question is how to choose those checkpoints, that is, when to check the total number of radio listeners. I think the best strategy is to use the distinct time_start and time_end values.

This is my approach to splitting time. I create a view to keep the post simple:

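create view time_split as
  -- Every distinct start or end time is a checkpoint (p_time).
  -- No derived table is used: older MySQL versions reject a subquery
  -- in the FROM clause of a view.
  select 
       time_start as p_time
  from 
       your_table
  union
  select 
       time_end as p_time
  from 
       your_table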

I suggest two database indexes:

your_table( time_start, time_end)  <--(1) explained below
your_table( time_end)

to avoid table scans.

  • Counting listeners: Join the previous view with your table to recount the listeners at each time checkpoint.

This is my approach to counting listeners at each checkpoint:

  create view peak_by_time as
  select p_time, count(*) as peak
  from
     your_table t
        inner join
     time_split
        on time_split.p_time between t.time_start and t.time_end
  group by
     p_time
  order by 
     p_time, peak

Remember to create the database index on your_table( time_start, time_end). This is index (1) mentioned above.
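
For comparison, the same two steps (collect the checkpoints, then count the sessions covering each one) can be sketched in application code. This is a rough Python illustration, not a replacement for the views: the function name and the (ip, time_start, time_end) tuple layout are assumptions, and integers stand in for datetimes.

```python
def peak_by_time(sessions):
    """sessions: list of (ip, time_start, time_end) tuples (hypothetical layout).
    Returns {checkpoint: number of sessions covering that checkpoint}."""
    # Same idea as the time_split view: every distinct start or end time.
    checkpoints = sorted({t for _, start, end in sessions for t in (start, end)})
    # Same idea as the join on BETWEEN: both interval ends are inclusive.
    return {
        p: sum(1 for _, start, end in sessions if start <= p <= end)
        for p in checkpoints
    }

# Made-up sessions; integers stand in for datetimes.
sessions = [("a", 1, 5), ("b", 3, 8), ("c", 5, 9)]
counts = peak_by_time(sessions)
```

Because BETWEEN is inclusive at both ends, a listener who leaves at the exact moment another arrives is counted together with the newcomer (checkpoint 5 in the sample). Like count(*) in the view, this counts sessions rather than unique IPs.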

  • Looking for the max peak: Unfortunately MySQL (before version 8.0) has no analytic functions, so OVER (PARTITION BY ...) is not available and there is no way to take the maximum peak per day inside the previous view. You therefore have to compute the max peak from the previous view in a separate query, which is a performance killer. I suggest doing this operation and the next one in application logic rather than in the database.

This is my approach to getting max_peak by day (performance killer):

  create view max_peak_by_day as
  select 
       cast(p_time as date) as p_day ,
       max(peak) as max_peak
  from peak_by_time
  group by cast(p_time as date)

  • Looking for slot times: At this point you have max_peak for each day; now you need to find consecutive checkpoints that share that max_peak. MySQL (before version 8.0) also lacks analytic functions and CTEs, so I suggest writing this code in the app layer. If you want a database-only solution, this is one way (warning: performance killer):

First, extend the peak_by_time view so that each p_time row also carries the previous p_time and the peak at that previous checkpoint:

create view time_split_extended as
select c.p_time, max( p.p_time) as previous_ptime
from 
  time_split c
    inner join 
  time_split p
    on p.p_time < c.p_time
group by c.p_time

create view peak_by_time_and_previous as
select 
   te.p_time,  
   te.previous_ptime, 
   pc.peak as peak, 
   pp.peak as previous_peak
from 
  time_split_extended te
    inner join 
  peak_by_time pc on te.p_time = pc.p_time
    inner join
  peak_by_time pp on te.previous_ptime = pp.p_time

Now check that the previous slot and the current one both have the day's max_peak:

select 
   cast(p.p_time as date) as p_day, 
   min( p.p_time ) as slot_from, 
   max( p.p_time ) as slot_to, 
   p.peak
from 
   peak_by_time_and_previous p
      inner join 
   max_peak_by_day m
      on cast(p.p_time as date) = m.p_day and
         p.peak = m.max_peak
where 
   p.peak = p.previous_peak
-- peak is selected, so it is also listed in GROUP BY (required under ONLY_FULL_GROUP_BY)
group by  cast(p.p_time as date), p.peak
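
If these last two steps are moved to the app layer as suggested, they might look roughly like the Python sketch below. It is an illustration under assumptions: the input is a dict mapping each checkpoint (a datetime) to its listener count, the function name is made up, and only the first run of consecutive checkpoints at the daily maximum is reported.

```python
from datetime import datetime
from itertools import groupby

def peak_slots_by_day(peak_by_time):
    """peak_by_time: dict mapping datetime checkpoint -> listener count.
    Returns {day: (max_peak, slot_from, slot_to)}, where the slot is the first
    run of consecutive checkpoints that hold the day's maximum."""
    result = {}
    ordered = sorted(peak_by_time.items())
    for day, rows in groupby(ordered, key=lambda row: row[0].date()):
        rows = list(rows)
        max_peak = max(count for _, count in rows)
        run = []
        for when, count in rows:
            if count == max_peak:
                run.append(when)
            elif run:
                break  # the first run at the maximum has ended
        result[day] = (max_peak, run[0], run[-1])
    return result

# Made-up checkpoint counts for two days.
sample = {
    datetime(2011, 1, 30, 10, 0): 1,
    datetime(2011, 1, 30, 10, 30): 4,
    datetime(2011, 1, 30, 11, 25): 4,
    datetime(2011, 1, 30, 12, 0): 2,
    datetime(2011, 1, 31, 9, 0): 1,
}
slots = peak_slots_by_day(sample)
```

Note that slot_to is the last checkpoint at which the maximum was observed, so whether the peak really lasted until then depends on how the counts at the checkpoints were taken.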

Disclaimer:

  • This is not tested. There are surely mistakes in table aliases or column names.
  • The last steps are performance killers. Perhaps someone can suggest a better approach for these steps.

I also suggest creating temporary tables to materialize each view in this answer. This will improve performance, and it also lets you see how much time each step takes.

This is essentially an implementation of the answer given by Max above. For simplicity I'll represent each listening episode as a start time and length as integer values (they could be changed to actual datetimes, and then the queries would need to be modified to use date arithmetic.)

> select * from episodes;
+--------+------+
| start  | len  |
+--------+------+
|  50621 |  480 |
|  24145 |  546 |
|  93943 |  361 |
|  67668 |  622 |
|  64681 |  328 |
| 110786 |  411 |
...

The following query combines the start and end times with a UNION , flagging end times to distinguish from start times, and keeping a running accumulator of the number of listeners:

SET @idx=0;
SET @n=0;
SELECT (@idx := @idx + 1) as idx,
       t,
       (@n := @n + delta) as n
  FROM
  (SELECT start AS t,
          1 AS delta
     FROM episodes
     UNION ALL
     SELECT start + len AS t,
            -1 AS delta FROM episodes
     ORDER BY t) stage

+------+--------+------+
| idx  | t      | n    |
+------+--------+------+
|    1 |      8 |    1 |
|    2 |    106 |    2 |
|    3 |    203 |    3 |
|    4 |    274 |    2 |
|    5 |    533 |    3 |
|    6 |    586 |    2 |
...

where 't' is the start of each interval (a new "interval" begins whenever the number of listeners, "n", changes). In a version where "t" is an actual datetime, you could easily group by day to obtain a peak episode for each day, or other such summaries. To get the end time of each interval, you could take the table above and join it to itself on right.idx = left.idx + 1 (i.e. join each row with the succeeding one).
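
The running accumulator and the "join each row with the succeeding one" step can also be sketched outside the database. The Python below is only an illustration: the function name intervals and the sample episodes are made up, and episodes are (start, length) integer pairs as in the table above.

```python
from itertools import accumulate

def intervals(episodes):
    """episodes: list of (start, length) integer pairs.
    Returns a list of (interval_start, interval_end, listeners) tuples;
    interval_end is None for the last row, which has no successor."""
    # +1 at every start, -1 at every end, sorted by time
    # (at equal times the -1 sorts first, so stops are applied before starts).
    deltas = sorted(
        [(start, 1) for start, _ in episodes]
        + [(start + length, -1) for start, length in episodes]
    )
    times = [t for t, _ in deltas]
    counts = list(accumulate(d for _, d in deltas))  # running listener count
    ends = times[1:] + [None]  # pair each row with the succeeding one
    return list(zip(times, ends, counts))

# Made-up episodes.
rows = intervals([(8, 266), (106, 480), (203, 400)])
peak = max(rows, key=lambda row: row[2])
```

Here peak is the interval with the most listeners; with real datetimes the rows could first be grouped by day before taking the maximum.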

SELECT
  COUNT(*)               AS listeners,
  current.time_start     AS peak_start,
  MIN(overlap.time_end)  AS peak_end
FROM
  yourTable    AS current
INNER JOIN
  yourTable    AS overlap
    ON  overlap.time_start <= current.time_start
    AND overlap.time_end   >  current.time_start
GROUP BY
  current.time_start,
  current.time_end
HAVING
  MIN(overlap.time_end) < COALESCE((SELECT MIN(time_start) FROM yourTable WHERE time_start > current.time_start), current.time_end+1)

For each record, join on everything that overlaps.

The MIN() of the overlapping records' time_end is when the first current listener stops listening.

If that time is earlier than the next occurrence of a time_start, it's a peak. (Peak = a start immediately followed by a stop.)
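
To make the logic of that query easier to follow, here is a rough Python equivalent. It is a sketch under assumptions: sessions are (time_start, time_end) pairs with time_end later than time_start, integers stand in for datetimes, and the function name is made up. It reports every local peak, not just the daily maximum.

```python
def local_peaks(sessions):
    """sessions: list of (time_start, time_end) pairs with time_end > time_start.
    Returns a list of (listeners, peak_start, peak_end) tuples."""
    starts = sorted({start for start, _ in sessions})
    peaks = []
    for cur_start, cur_end in sorted(set(sessions)):
        # Everything already running at cur_start (the self-join condition).
        overlapping = [(s, e) for s, e in sessions if s <= cur_start < e]
        # When the first of the current listeners stops listening.
        first_stop = min(e for _, e in overlapping)
        # Next start after this one, or cur_end + 1 if there is none.
        later = [s for s in starts if s > cur_start]
        next_start = later[0] if later else cur_end + 1
        if first_stop < next_start:
            # A start immediately followed by a stop: a peak.
            peaks.append((len(overlapping), cur_start, first_stop))
    return peaks

# Made-up sessions.
found = local_peaks([(1, 5), (3, 8), (5, 9)])
```

The daily maximum would then be the largest listeners value among the peaks whose peak_start falls on that day.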

