MySQL select time groupings where timestamps overlap from different rows with timezone difference

This question seems different to the others asked so I'll ask it here.

I have a MySQL table that stores from and to timestamps. I would like to select groupings from this table to work out when people are "online" at the same time. The idea behind this madness is to automatically group people together in time slots that intersect. Ideally I would also get the best time for each group (but this may not be possible).

I have two tables: a table called "times" that stores the times, and a table called "users" that stores user details. The users table also includes a time difference field (in hours) that should be applied to the times (all times are stored in UTC).

Here are my tables:

Users
userid | timediff
------------------
1      | 0
2      | 0
3      | 1
4      | 4
5      | -8
6      | 2
7      | 2

Times
userid | from                | to
--------------------------------------------------
1      | 2015-01-13 16:00:00 | 2015-01-13 23:00:00
2      | 2015-01-13 13:00:00 | 2015-01-13 21:00:00
3      | 2015-01-13 14:00:00 | 2015-01-13 22:00:00
4      | 2015-01-13 11:00:00 | 2015-01-13 12:00:00
5      | 2015-01-13 10:00:00 | 2015-01-13 12:00:00
6      | 2015-01-13 11:00:00 | 2015-01-13 12:00:00
7      | 2015-01-13 09:00:00 | 2015-01-13 10:00:00

In a perfect world this would group people like so:

1      | 2015-01-13 16:00:00 | 2015-01-13 23:00:00
2      | 2015-01-13 13:00:00 | 2015-01-13 21:00:00
3      | 2015-01-13 14:00:00 | 2015-01-13 22:00:00

These people are online together between 16:00 and 21:00.

4      | 2015-01-13 11:00:00 | 2015-01-13 12:00:00
5      | 2015-01-13 10:00:00 | 2015-01-13 12:00:00
6      | 2015-01-13 11:00:00 | 2015-01-13 12:00:00

These people are online together between 11:00 and 12:00.

(Please note that, for ease of understanding, this example ignores the time difference. I'm happy to figure that part out separately if needed.)

This may not be possible with SQL alone, and I may need to use PHP. I haven't posted any sample code because I'm not sure of the best direction to take. Any pointers would be great!

This is not a super simple project. It has lots of pieces to it, specifically timezone offsets, time-range comparisons, and coincidence searching.

But let's give it a try. For starters, let's create a view to handle the timezone offset, so that we don't have to repeat that computation in every query. This view adds each user's timediff (in hours) to the from and to columns and exposes the results as utcfrom and utcto.

CREATE VIEW `utctimes` 
    AS select `t`.`userid` AS `userid`,
              `t`.`from` AS `from`,
              `t`.`to` AS `to`,
              `t`.`from` + interval `u`.`timediff` hour AS `utcfrom`,
              `t`.`to` + interval `u`.`timediff` hour AS `utcto`
         from `times` `t` 
         join `users` `u` on `u`.`userid` = `t`.`userid`;
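For readers who end up doing this step in application code instead, here is a minimal Python sketch of the same offset arithmetic. The two rows are copied from the question's tables; the names `timediff_hours`, `times` and `shift_rows` are made up for illustration and are not part of the original answer.

```python
from datetime import datetime, timedelta

# Hypothetical in-memory copies of two rows from the question's tables.
timediff_hours = {3: 1, 5: -8}  # users.userid -> users.timediff
times = [
    (3, datetime(2015, 1, 13, 14, 0), datetime(2015, 1, 13, 22, 0)),
    (5, datetime(2015, 1, 13, 10, 0), datetime(2015, 1, 13, 12, 0)),
]

def shift_rows(rows, offsets):
    """Mirror the view: add each user's timediff (in hours) to from and to."""
    shifted = []
    for userid, start, end in rows:
        delta = timedelta(hours=offsets[userid])
        shifted.append((userid, start + delta, end + delta))
    return shifted

shifted_rows = shift_rows(times, timediff_hours)
for row in shifted_rows:
    print(row)
```

Whether the offset should be added or subtracted depends on what timediff means in your data, so check the sign against a row you know before relying on it.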

Next, let's self-join this view and do some time-range comparisons to find out when more than one person is online. To see if a pair of from/to ranges overlap, this logic does it.

    a.from <= b.to
and b.from <= a.to

You can convince yourself that the two ranges overlap if both those conditions are true.

We'll assume both are online even if one comes on exactly at noon and the other goes off exactly at noon, even though that might be a poor assumption.
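The overlap test is easy to try outside the database. The following Python sketch uses the same two comparisons, and also computes the shared range the same way the queries do with greatest() and least(). The function names `overlaps` and `intersection` are illustrative only.

```python
from datetime import datetime

def overlaps(a_from, a_to, b_from, b_to):
    """Closed-interval overlap: ranges that merely touch count as overlapping."""
    return a_from <= b_to and b_from <= a_to

def intersection(a_from, a_to, b_from, b_to):
    """Return the shared (from, to) range, or None if the ranges do not overlap."""
    if not overlaps(a_from, a_to, b_from, b_to):
        return None
    return max(a_from, b_from), min(a_to, b_to)

def at(hour):
    # Helper for the sample date used in the question.
    return datetime(2015, 1, 13, hour, 0)

# Users 1 and 2 from the question (offsets ignored): 16:00-23:00 and 13:00-21:00.
shared = intersection(at(16), at(23), at(13), at(21))
# Users 4 and 7: 11:00-12:00 and 09:00-10:00 never coincide.
disjoint = intersection(at(11), at(12), at(9), at(10))
# One range ends at noon exactly when the other begins.
touching = overlaps(at(10), at(12), at(12), at(14))

print(shared, disjoint, touching)
```

If the touching-at-noon case should not count as being online together, replacing both `<=` comparisons with `<` would exclude it.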

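This query will give us a list of pairwise overlap ranges and a popularity score for each. It does this with a promiscuous (and therefore somewhat expensive) self-join. Note that count(*) counts the ordered pairs of users whose overlap is exactly that range, so every pair is counted twice and users_on is better read as a ranking score than as an exact head count.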

-- One output row per distinct pairwise overlap range.
select count(*) as users_on, 
       greatest(a.utcfrom, b.utcfrom) utcfrom, 
       least(a.utcto, b.utcto) utcto
  from utctimes a
  join utctimes b on a.userid <> b.userid   -- every ordered pair of different users
 where a.utcfrom <= b.utcto                 -- keep only pairs whose ranges overlap
   and b.utcfrom <= a.utcto
 group by  greatest(a.utcfrom, b.utcfrom), least(a.utcto, b.utcto) 
 order by count(*) desc,                    -- most popular range first
          greatest(a.utcfrom, b.utcfrom),
          timestampdiff(minute, greatest(a.utcfrom, b.utcfrom), 
                       least(a.utcto, b.utcto)) desc;

This will give the most popular range first, then some other ranges in order of popularity. It does yield some overlapping ranges.
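To see what the self-join produces, here is a rough Python equivalent run over the question's sample rows. As in the question's own illustration, the time differences are ignored. This is a sketch of the same pairwise idea, not a replacement for the query, and the names `rows` and `pairwise_ranges` are assumptions.

```python
from collections import Counter
from datetime import datetime
from itertools import permutations

def at(hour):
    # Helper for the sample date used in the question.
    return datetime(2015, 1, 13, hour, 0)

# userid -> (from, to), copied from the question's Times table.
rows = {
    1: (at(16), at(23)), 2: (at(13), at(21)), 3: (at(14), at(22)),
    4: (at(11), at(12)), 5: (at(10), at(12)), 6: (at(11), at(12)),
    7: (at(9), at(10)),
}

def pairwise_ranges(rows):
    """Count ordered user pairs per shared range, like the grouped self-join."""
    counts = Counter()
    for a, b in permutations(rows, 2):  # ordered pairs with a != b
        a_from, a_to = rows[a]
        b_from, b_to = rows[b]
        if a_from <= b_to and b_from <= a_to:
            counts[(max(a_from, b_from), min(a_to, b_to))] += 1
    return counts

counts = pairwise_ranges(rows)
for (start, end), score in counts.most_common():
    print(score, start.time(), end.time())
```

With this data the 11:00 to 12:00 range scores 6 even though only three users (4, 5 and 6) share it, because each of the three pairs is counted in both orders. Users 5 and 7 also produce a zero-length range at 10:00, a side effect of treating touching ranges as overlapping.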

Once you have the most popular time ranges, you can find out which users are online during those ranges. This JOIN, for example, will do that.

select r.users_on, r.utcfrom online_session_start, 
       timediff(r.utcto, r.utcfrom) online_session_duration,
       q.userid, q.`from`, q.`to`
  from utctimes q
  join (
    -- the pairwise overlap ranges from the previous query
    select count(*) as users_on, 
           greatest(a.utcfrom, b.utcfrom) utcfrom, 
           least(a.utcto, b.utcto) utcto
      from utctimes a
      join utctimes b on a.userid <> b.userid
     where a.utcfrom <= b.utcto
       and b.utcfrom <= a.utcto
     group by  greatest(a.utcfrom, b.utcfrom), least(a.utcto, b.utcto) 
        ) r on q.utcfrom <= r.utcto   -- users whose own range overlaps range r
           and r.utcfrom <= q.utcto
 order by 2,3,4;                      -- by session start, duration, then userid
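The last step can be sketched in Python as well: given one candidate range, list the users whose own range overlaps it, then compute the window that all of them share. That shared window is one possible reading of the "best time" the question asks for. The sample rows again ignore the time differences, and `users_in_range` and `common_window` are hypothetical helper names.

```python
from datetime import datetime

def at(hour):
    # Helper for the sample date used in the question.
    return datetime(2015, 1, 13, hour, 0)

# userid -> (from, to), copied from the question's Times table.
rows = {
    1: (at(16), at(23)), 2: (at(13), at(21)), 3: (at(14), at(22)),
    4: (at(11), at(12)), 5: (at(10), at(12)), 6: (at(11), at(12)),
    7: (at(9), at(10)),
}

def users_in_range(rows, range_from, range_to):
    """Users whose own range overlaps the given range (same test as the join)."""
    return sorted(
        userid
        for userid, (u_from, u_to) in rows.items()
        if u_from <= range_to and range_from <= u_to
    )

def common_window(rows, userids):
    """Latest start and earliest end across the users, or None if they never all coincide."""
    start = max(rows[u][0] for u in userids)
    end = min(rows[u][1] for u in userids)
    return (start, end) if start <= end else None

group = users_in_range(rows, at(16), at(21))
window = common_window(rows, group)
print(group, window)
```

For the 16:00 to 21:00 range this selects users 1, 2 and 3, and their common window is 16:00 to 21:00, which matches the first grouping in the question.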
