I have a statistics table for an internet radio station (MySQL) with the following columns:
I need to select the listener peak for each day, that is, the maximum number of simultaneous listeners with unique IP addresses.
It would also be great to have the start and finish time of that peak.
For example:
2011-30-01 | 4 listeners peak | from 10:30 | till 11:25
IMHO it's simpler to load these 35,000 rows into memory, enumerate them, and maintain a count of the concurrent listeners at any given moment.
This is simpler if you load the rows in the following format:
IP, Time, flag_That_Indicate_StartOrStop_Listening_For_This_Given_IP
That way you can load the data ordered by time and then simply enumerate all the rows while maintaining a list of listening IPs.
Anyway, how do you want to treat multiple connections from the same IP?
There can be 10 different listeners behind a NAT using the same IP address.
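One possible way to handle several overlapping sessions from the same IP is to keep a per-IP session counter instead of a plain list, so an IP only disappears when its last session stops. This is just a sketch in Python; `unique_listener_counts`, the tuple layout and the sample addresses are made up for illustration, and the times are plain integers.

```python
from collections import Counter

def unique_listener_counts(events):
    """events: (ip, time, start_stop) tuples sorted by time,
    start_stop is 1 for a start and 0 for a stop.
    Returns a list of (time, number_of_unique_ips) after each event."""
    open_sessions = Counter()  # ip -> number of sessions currently open
    counts = []
    for ip, when, start_stop in events:
        if start_stop == 1:
            open_sessions[ip] += 1
        else:
            open_sessions[ip] -= 1
            if open_sessions[ip] <= 0:
                del open_sessions[ip]  # last session of this IP ended
        counts.append((when, len(open_sessions)))
    return counts

# Hypothetical sample: 192.0.2.1 opens two overlapping sessions.
events = [
    ("192.0.2.1", 0, 1),
    ("192.0.2.1", 5, 1),
    ("192.0.2.2", 7, 1),
    ("192.0.2.1", 10, 0),
    ("192.0.2.1", 20, 0),
    ("192.0.2.2", 25, 0),
]
counts = unique_listener_counts(events)
```

Here the first IP still counts as one listener at time 10, because its second session is open until time 20.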
Update: You don't really need to change the DB structure; it's enough to use a different SQL query to load the data:
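SELECT ip_address, Time_Start AS MyTime, 1 AS StartStop
FROM MyTable
UNION ALL
SELECT ip_address, Time_Stop AS MyTime, 0 AS StartStop
FROM MyTable
-- ORDER BY has to come after the last SELECT so it sorts the combined result;
-- StartStop as a tie-breaker puts stops before starts at the same instant
ORDER BY MyTime, StartStop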
Using this SQL you should be able to load all the data, and then enumerate all the rows.
It's important that the rows are sorted correctly.
If StartStop = 1, someone starts listening --> add their IP to the list of listeners and increment the listener count by 1.
If StartStop = 0, someone stops listening --> remove their IP from the list of listeners and decrement the listener count by 1.
In the enumeration loop, check when you reach the maximum number of concurrent listeners.
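The enumeration loop described above could look roughly like this in Python. Treat it as a sketch, not a tested implementation: `daily_peaks` and the tuple layout are my own names, it assumes each IP has at most one open session at a time, and an interval is credited to the day it starts on.

```python
from datetime import datetime

def daily_peaks(events):
    """events: (ip, time, start_stop) tuples sorted by time,
    start_stop is 1 for a start and 0 for a stop.
    Returns {day: (peak, peak_from, peak_till)}."""
    events = list(events)
    listeners = set()
    peaks = {}
    for i, (ip, when, start_stop) in enumerate(events):
        if start_stop == 1:
            listeners.add(ip)
        else:
            listeners.discard(ip)
        if i + 1 == len(events):
            break  # nothing follows the last event
        until = events[i + 1][1]
        if until == when:
            continue  # more events at the same instant, apply them first
        day = when.date()
        count = len(listeners)
        best = peaks.get(day)
        if best is None or count > best[0]:
            peaks[day] = (count, when, until)       # new peak for this day
        elif count == best[0] and best[2] == when:
            peaks[day] = (count, best[1], until)    # same peak continues
    return peaks

# Hypothetical sample data.
d = lambda h, m: datetime(2011, 1, 30, h, m)
events = [
    ("a", d(10, 0), 1), ("b", d(10, 10), 1), ("c", d(10, 20), 1),
    ("d", d(10, 30), 1), ("a", d(11, 25), 0), ("b", d(11, 40), 0),
    ("c", d(12, 0), 0), ("d", d(12, 0), 0),
]
peaks = daily_peaks(events)
```

With this sample the only day in the result has a peak of 4 listeners from 10:30 till 11:25. If the same count is reached again later in the day, only the first interval is kept.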
Let's look for an algorithm that gets the results with the best performance.
The listener count can only change at a time_start or a time_end, so those are the only check points needed. This is my approach to split time. I create a view to simplify the post:
create view time_split as
-- the first select names the column p_time; union also removes duplicate times
select time_start as p_time
from your_table
union
select time_end
from your_table
I suggest two database indexes:
your_table( time_start, time_end) <--(1) explained below
your_table( time_end)
to avoid a table scan.
This is my approach for counting listeners at each check point time:
create view peak_by_time as
select p_time, count(*) as peak
from
your_table t
inner join
time_split
on time_split.p_time between t.time_start and t.time_end
group by
p_time
order by
p_time, peak
Remember to create a database index on your_table( time_start, time_end) <--(1) Here
MySQL versions before 8.0 have no over partition clause, so there is no way to take the max peak per day inside the previous view. You have to compute the max peak on top of the previous views instead, which is a performance killer. I suggest doing this operation and the next one in application logic and not in the database. This is my approach to get max_peak by day ( performance killer ):
create view max_peak_by_day as
select
cast(p_time as date) as p_day ,
max(peak) as max_peak
from peak_by_time
group by cast(p_time as date)
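If you follow the suggestion to do this step in application logic, a minimal Python sketch could look like the following. It assumes the rows of peak_by_time have already been fetched as (p_time, peak) tuples; the function name and the sample rows are hypothetical.

```python
from datetime import datetime

def max_peak_by_day(peak_by_time):
    """peak_by_time: iterable of (p_time, peak) rows, in any order.
    Returns {day: max_peak}."""
    result = {}
    for p_time, peak in peak_by_time:
        day = p_time.date()
        # keep the larger of the stored peak and this row's peak
        result[day] = max(peak, result.get(day, peak))
    return result

# Hypothetical rows as they might come back from peak_by_time.
rows = [
    (datetime(2011, 1, 30, 10, 0), 1),
    (datetime(2011, 1, 30, 10, 30), 4),
    (datetime(2011, 1, 30, 11, 40), 2),
    (datetime(2011, 1, 31, 9, 0), 2),
    (datetime(2011, 1, 31, 9, 15), 1),
]
daily = max_peak_by_day(rows)
```

This is a single pass over the rows, so it avoids the second grouping query on top of the view.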
You now have max_peak for each day; next you need to look for consecutive check times with the same max_peak. MySQL versions before 8.0 have neither analytic (window) functions nor CTEs, so I suggest writing this code in the app layer. But if you want a database solution, this is a way ( warning: performance killer ). First, extend the peak_by_time view to get the peak for p_time and for the previous p_time:
create view time_split_extended as
select c.p_time, max( p.p_time) as previous_ptime
from
time_split c
inner join
time_split p
on p.p_time < c.p_time
group by c.p_time;
create view peak_by_time_and_previous as
select
te.p_time,
te.previous_ptime,
pc.peak as peak,
pp.peak as previous_peak
from
time_split_extended te
inner join
peak_by_time pc on te.p_time = pc.p_time
inner join
peak_by_time pp on te.previous_ptime = pp.p_time
Now check that both the previous slot and the current one are at max_peak:
select
cast(p_time as date) as p_day,
min( p_time ) as slot_from,
max( p_time) as slot_to,
peak
from
peak_by_time_and_previous p
inner join
max_peak_by_day m
on cast(p.p_time as date) = m.p_day and
p.peak = m.max_peak
where
p.peak = p.previous_peak
group by cast(p_time as date), peak
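For comparison, here is how the slot search might be done in the app layer instead, as a rough Python sketch. It assumes the peak_by_time rows are available as (p_time, peak) tuples and it reports only the first run of consecutive check times at the daily maximum, which is a simplification of the query above. `peak_slots` and the sample rows are hypothetical.

```python
from datetime import datetime
from itertools import groupby

def peak_slots(peak_by_time):
    """peak_by_time: iterable of (p_time, peak) rows.
    Returns {day: (slot_from, slot_to, peak)} for the first run of
    consecutive check times that sit at the day's maximum peak."""
    rows = sorted(peak_by_time)
    slots = {}
    for day, day_rows in groupby(rows, key=lambda r: r[0].date()):
        day_rows = list(day_rows)
        top = max(peak for _, peak in day_rows)
        run = []
        for p_time, peak in day_rows:
            if peak == top:
                run.append(p_time)
            elif run:
                break  # the first run at max peak has ended
        slots[day] = (run[0], run[-1], top)
    return slots

# Hypothetical rows; the day reaches 4 listeners twice.
d = lambda h, m: datetime(2011, 1, 30, h, m)
rows = [
    (d(10, 0), 1), (d(10, 30), 4), (d(11, 0), 4),
    (d(11, 25), 4), (d(11, 40), 2), (d(12, 0), 4),
]
slots = peak_slots(rows)
```

With these rows the slot runs from 10:30 to 11:25; the later check time at 12:00 is ignored because it belongs to a second run.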
Disclaimer :
Also, I suggest creating temporary tables to materialize each view in this answer. This will improve performance, and it also lets you see how long each step takes.
This is essentially an implementation of the answer given by Max above. For simplicity I'll represent each listening episode as a start time and length as integer values (they could be changed to actual datetimes, and then the queries would need to be modified to use date arithmetic.)
> select * from episodes;
+--------+------+
| start  | len  |
+--------+------+
|  50621 |  480 |
|  24145 |  546 |
|  93943 |  361 |
|  67668 |  622 |
|  64681 |  328 |
| 110786 |  411 |
...
The following query combines the start and end times with a UNION ALL, flagging end times to distinguish them from start times, and keeps a running accumulator of the number of listeners:
SET @idx=0;
SET @n=0;
SELECT (@idx := @idx + 1) as idx,
t,
(@n := @n + delta) as n
FROM
(SELECT start AS t,
1 AS delta
FROM episodes
UNION ALL
SELECT start + len AS t,
-1 AS delta FROM episodes
ORDER BY t) stage
+------+--------+------+
| idx  | t      | n    |
+------+--------+------+
|    1 |      8 |    1 |
|    2 |    106 |    2 |
|    3 |    203 |    3 |
|    4 |    274 |    2 |
|    5 |    533 |    3 |
|    6 |    586 |    2 |
...
where 't' is the start of each interval (it's a new "interval" whenever the number of listeners, "n", changes). In a version where "t" is an actual datetime, you could easily group by day to obtain a peak episode for each day, or other such summaries. To get the end time of each interval - you could take the table above and join it to itself on right.idx = left.idx + 1 (ie join each row with the succeeding one).
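If you would rather do that last step outside the database, pairing each row with the succeeding one is a one-liner in Python. This sketch assumes the result set has been fetched as (idx, t, n) tuples in idx order; `intervals` is a name I made up, and the sample rows are the first six rows of the output above.

```python
def intervals(rows):
    """rows: (idx, t, n) tuples ordered by idx.
    Returns (start, end, n) for each interval; the last row has no
    successor, so it does not produce an interval."""
    return [
        (t, next_t, n)
        for (_, t, n), (_, next_t, _) in zip(rows, rows[1:])
    ]

rows = [
    (1, 8, 1), (2, 106, 2), (3, 203, 3),
    (4, 274, 2), (5, 533, 3), (6, 586, 2),
]
spans = intervals(rows)
# max() keeps the first interval among those with the highest n
peak = max(spans, key=lambda span: span[2])
```

For these six rows the first interval with the highest count is (203, 274, 3), i.e. 3 listeners from t=203 to t=274.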
SELECT
COUNT(*) AS listeners,
current.time_start AS peak_start,
MIN(overlap.time_end) AS peak_end
FROM
yourTable AS current
INNER JOIN
yourTable AS overlap
ON overlap.time_start <= current.time_start
AND overlap.time_end > current.time_start
GROUP BY
current.time_start,
current.time_end
HAVING
MIN(overlap.time_end) < COALESCE((SELECT MIN(time_start) FROM yourTable WHERE time_start > current.time_start), current.time_end+1)
For each record, join on everything that overlaps.
The MIN() of the overlapping records' time_end is when the first current listener stops listening.
If that time is earlier than the next occurrence of a time_start, it's a peak. (Peak = start immediately followed by a stop)
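To make the three steps easier to follow, here is the same logic written out in plain Python. It is only an illustration of the idea, with integer times and made-up sessions; it assumes every (time_start, time_end) pair is distinct, and `local_peaks` is a hypothetical name.

```python
def local_peaks(sessions):
    """sessions: list of distinct (time_start, time_end) pairs.
    Returns (listeners, peak_start, peak_end) for every start that is
    followed by a stop before any other start happens."""
    starts = sorted(start for start, _ in sessions)
    peaks = []
    for start, end in sessions:
        # end times of everything listening at this start (self included)
        overlapping = [e for s, e in sessions if s <= start < e]
        first_stop = min(overlapping)
        later_starts = [s for s in starts if s > start]
        # same fallback as the COALESCE in the query
        next_start = later_starts[0] if later_starts else end + 1
        if first_stop < next_start:
            peaks.append((len(overlapping), start, first_stop))
    return peaks

# Hypothetical sessions.
sessions = [(0, 10), (2, 6), (4, 12), (8, 9)]
found = local_peaks(sessions)
```

Note that this returns every local peak, not just the highest one per day, so you would still have to pick the maximum for each day afterwards.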