
PHP/MYSQL datetime ranges overlapping for users

Please, I need help with this because I am completely stuck (for a better understanding, see the attached image).

http://img16.imageshack.us/img16/7196/overlapsen.jpg

As you can see, I have users who store their starting and ending datetimes in my DB as YYYY-mm-dd H:i:s. Now I need to find the time ranges that overlap for the most users. I would like to get the 3 datetime ranges shared by the most users. How can I do it?

I have no idea which MySQL query I should use, or whether it would be better to select all datetimes (start and end) from the database and process them in PHP (but how?). As shown in the image, an example result would be that the time 8.30 - 10.00 is the overlap for users A+B+C+D.

Table structure:
UserID | Start datetime      | End datetime
-------+---------------------+--------------------
A      | 2012-04-03 4:00:00  | 2012-04-03 10:00:00
A      | 2012-04-03 16:00:00 | 2012-04-03 20:00:00
B      | 2012-04-03 8:30:00  | 2012-04-03 14:00:00
B      | 2012-04-06 21:30:00 | 2012-04-06 23:00:00
C      | 2012-04-03 12:00:00 | 2012-04-03 13:00:00
D      | 2012-04-01 01:00:01 | 2012-04-05 12:00:59
E      | 2012-04-03 8:30:00  | 2012-04-03 11:00:00
E      | 2012-04-03 21:00:00 | 2012-04-03 23:00:00

What you effectively have is a collection of sets, and you want to determine which of them have non-empty intersections. This is the exact question one asks when trying to find all the ancestors of a node in a nested set.

We can prove that for every overlap, at least one time window will have a start time that falls within all other overlapping time windows. Using this tidbit, we don't need to actually construct artificial timeslots in the day. Simply take a start time and see if it intersects any of the other time windows and then just count up the number of intersections.

So what's the query?

/*SELECT*/
SELECT DISTINCT
    MAX(overlapping_windows.start_time) AS overlap_start_time,
    MIN(overlapping_windows.end_time) AS overlap_end_time ,
    (COUNT(overlapping_windows.id) - 1) AS num_overlaps
FROM user_times AS windows
INNER JOIN user_times AS overlapping_windows
ON windows.start_time BETWEEN overlapping_windows.start_time AND overlapping_windows.end_time
GROUP BY windows.id
ORDER BY num_overlaps DESC;
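To make the counting logic of the query above easier to follow, here is a minimal sketch of the same idea outside the database. It is written in Python for brevity (the loop ports directly to PHP); the names ROWS and overlaps_at_starts are made up for this illustration, and the rows are the sample data from the question with zero-padded hours.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Sample rows from the question: (user_id, start, end). Hypothetical in-memory copy.
ROWS = [
    ("A", "2012-04-03 04:00:00", "2012-04-03 10:00:00"),
    ("A", "2012-04-03 16:00:00", "2012-04-03 20:00:00"),
    ("B", "2012-04-03 08:30:00", "2012-04-03 14:00:00"),
    ("B", "2012-04-06 21:30:00", "2012-04-06 23:00:00"),
    ("C", "2012-04-03 12:00:00", "2012-04-03 13:00:00"),
    ("D", "2012-04-01 01:00:01", "2012-04-05 12:00:59"),
    ("E", "2012-04-03 08:30:00", "2012-04-03 11:00:00"),
    ("E", "2012-04-03 21:00:00", "2012-04-03 23:00:00"),
]


def overlaps_at_starts(rows):
    """For each window, look at every window that contains its start time."""
    windows = [
        (datetime.strptime(s, FMT), datetime.strptime(e, FMT)) for _, s, e in rows
    ]
    results = set()  # a set plays the role of SELECT DISTINCT
    for start, _ in windows:
        # Same test as the join: windows.start_time BETWEEN other.start AND other.end
        hits = [(s, e) for s, e in windows if s <= start <= e]
        overlap_start = max(s for s, _ in hits)
        overlap_end = min(e for _, e in hits)
        # The window always matches itself, hence the "- 1".
        results.add((overlap_start, overlap_end, len(hits) - 1))
    # Most overlaps first; ties are ordered by start time to keep the output stable.
    return sorted(results, key=lambda r: (-r[2], r[0]))


if __name__ == "__main__":
    for overlap_start, overlap_end, num_overlaps in overlaps_at_starts(ROWS)[:3]:
        print(overlap_start, overlap_end, num_overlaps)
```

On the sample rows the first result is 08:30 to 10:00 on 2012-04-03 with 3 overlaps, i.e. four windows share that range.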

Depending on your table size and how often you plan on running this query, it might be worthwhile to add a spatial index to the table (see below).

UPDATE

If you're running this query often, you'll need to use a spatial index. Because of the range-based traversal (i.e. does start_time fall within the start/end range), a BTREE index will not do anything for you. IT HAS TO BE SPATIAL.

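-- Geometry columns cannot take a literal DEFAULT such as 0, so add the column as nullable, fill it, then make it NOT NULL.
ALTER TABLE user_times ADD COLUMN time_window GEOMETRY;
-- UNIX_TIMESTAMP() turns each datetime into a number usable as a coordinate (GeomFromText is ST_GeomFromText in newer MySQL versions).
UPDATE user_times SET time_window = GeomFromText(CONCAT('LineString(-1 ', UNIX_TIMESTAMP(start_time), ', 1 ', UNIX_TIMESTAMP(end_time), ')'));
ALTER TABLE user_times MODIFY time_window GEOMETRY NOT NULL;
CREATE SPATIAL INDEX time_window ON user_times (time_window);

Then you can update the ON clause in the above query to read

ON MBRWithin( Point(0, UNIX_TIMESTAMP(windows.start_time)), overlapping_windows.time_window )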

This will get you an indexed traversal for the query. Again, only do this if you're planning on running the query often.

Credit for the spatial index to Quassnoi's blog.

I would not do much of this in SQL; it is so much simpler in a programming language, and SQL is not made for something like this.

Of course, it's just sensible to break the day down into "timeslots" - this is statistics. But as soon as you start handling dates over the 00:00 border, things start to get icky when you use joins and inner selects, especially with MySQL, which does not quite like inner selects.

Here's a possible SQL query

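-- $SLOT_LOW and $SLOT_HIGH are the bounds of the slot being counted
SELECT COUNT(*) FROM `times`
WHERE
  -- entry starts and ends on the same day
  ( DATEDIFF(`End`,`Start`) = 0 AND
    TIME(`Start`) < TIME('$SLOT_HIGH') AND
    TIME(`End`) > TIME('$SLOT_LOW') )
  OR
  -- entry crosses midnight: DATEDIFF(`End`,`Start`) is End minus Start in days
  ( DATEDIFF(`End`,`Start`) > 0 AND
    ( TIME(`Start`) < TIME('$SLOT_HIGH') OR
      TIME(`End`) > TIME('$SLOT_LOW') ) )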

Here's some pseudo code

granularity = 30*60; // 30 minutes
numslots = 24*60*60 / granularity;
stats = CreateArray(numslots);
for i=0, i < numslots, i++ do
  stats[i] = GetCountFromSQL(i*granularity, (i+1)*granularity); // low, high
end

Yes, that makes numslots queries, but no joins no nothing, hence it should be quite fast. Also you can easily change the resolution.
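If the rows are already loaded, the pseudo code above can also be tried without any queries at all. The following is only a sketch in Python (easy to port to PHP) that applies the same two conditions as the SQL: entries on a single day must overlap the slot, entries crossing midnight count if they start before the slot ends or end after it begins. ROWS, seconds_into_day and slot_stats are hypothetical names, and the rows are the question's sample data.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (start, end) pairs copied from the question's table; hypothetical in-memory data.
ROWS = [
    ("2012-04-03 04:00:00", "2012-04-03 10:00:00"),
    ("2012-04-03 16:00:00", "2012-04-03 20:00:00"),
    ("2012-04-03 08:30:00", "2012-04-03 14:00:00"),
    ("2012-04-06 21:30:00", "2012-04-06 23:00:00"),
    ("2012-04-03 12:00:00", "2012-04-03 13:00:00"),
    ("2012-04-01 01:00:01", "2012-04-05 12:00:59"),
    ("2012-04-03 08:30:00", "2012-04-03 11:00:00"),
    ("2012-04-03 21:00:00", "2012-04-03 23:00:00"),
]


def seconds_into_day(moment):
    """Time of day as seconds since midnight."""
    return moment.hour * 3600 + moment.minute * 60 + moment.second


def slot_stats(rows, granularity=30 * 60):
    """Count, for every time-of-day slot, the entries that touch the slot."""
    numslots = 24 * 60 * 60 // granularity
    stats = [0] * numslots
    for start_text, end_text in rows:
        start = datetime.strptime(start_text, FMT)
        end = datetime.strptime(end_text, FMT)
        same_day = start.date() == end.date()
        start_sec, end_sec = seconds_into_day(start), seconds_into_day(end)
        for i in range(numslots):
            low, high = i * granularity, (i + 1) * granularity
            if same_day:
                hit = start_sec < high and end_sec > low
            else:
                # Crosses midnight: first-day part or last-day part touches the slot.
                hit = start_sec < high or end_sec > low
            if hit:
                stats[i] += 1
    return stats


if __name__ == "__main__":
    stats = slot_stats(ROWS)
    best = max(stats)
    print(best, [i for i, count in enumerate(stats) if count == best])
```

With 30-minute slots the busiest slots on the sample rows are 17 to 19 (08:30 to 10:00), each with a count of 4. Note that this counts by time of day only, exactly like the query, so entries from different dates land in the same slot.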

Another positive thing: if you ask yourself "I have two possible timeslots and I need the one where more people are here, which one should I use?", you can just run the query twice with the respective ranges, so you are not stuck with predefined time slots.

To only find full overlaps (an entry only counts if it covers the full slot) you have to switch low and high ranges in the query.

You might have noticed that I do not count the full days in between for entries that span multiple days. Adding a whole day would just increase all slots by one, which makes it quite useless. You could, however, add them by selecting sum(DAY(End) - DAY(Start)) and adding the return value to all slots.

Table seems pretty simple. I would keep your SQL query pretty simple:

SELECT * FROM tablename

Then, once you have the info saved in your PHP object, do the processing in PHP using loops and comparisons.

In simplest form:

/*Note: the mysql_* functions were removed in PHP 7; use mysqli or PDO in current code*/
for ($x = 0, $numrows = mysql_num_rows($query); $x < $numrows; $x++) {

     /*Grab a row*/
     $row = mysql_fetch_assoc($query);

     /*store userID, START, END*/
     $userID = $row['userID'];
     $start = $row['START'];
     $end = $row['END'];

     /*Have an array for each user in which you store start and end times*/

     if (!strcmp($userID, "A"))
     {
        /*Store info in array_a*/
     }
     else if (!strcmp($userID, "B"))
     {
        /*etc......*/
     }
}
 /*Now you have an array for each user with their start/stop times*/

 /*Do your loops and comparisons to find common time slots. */

 /*Also, use strtotime() to switch date/time entries into comparable values*/

Of course this is in very basic form. You'll probably want to do one loop through the array to first get all of the userIDs before you compare them in the loop shown above.
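One possible way to fill in the "loops and comparisons" step is a sweep over the sorted start and end times, keeping track of which users are currently active. The sketch below is in Python to keep it short (the same loop works in PHP with strtotime() values); ROWS and top_overlaps are hypothetical names, the rows are the question's sample data, and it assumes every end is later than its start.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Sample rows from the question: (user_id, start, end). Hypothetical in-memory copy.
ROWS = [
    ("A", "2012-04-03 04:00:00", "2012-04-03 10:00:00"),
    ("A", "2012-04-03 16:00:00", "2012-04-03 20:00:00"),
    ("B", "2012-04-03 08:30:00", "2012-04-03 14:00:00"),
    ("B", "2012-04-06 21:30:00", "2012-04-06 23:00:00"),
    ("C", "2012-04-03 12:00:00", "2012-04-03 13:00:00"),
    ("D", "2012-04-01 01:00:01", "2012-04-05 12:00:59"),
    ("E", "2012-04-03 08:30:00", "2012-04-03 11:00:00"),
    ("E", "2012-04-03 21:00:00", "2012-04-03 23:00:00"),
]


def top_overlaps(rows, limit=3):
    """Return up to `limit` time ranges shared by the most distinct users."""
    events = []
    for user, start_text, end_text in rows:
        events.append((datetime.strptime(start_text, FMT), 1, user))   # window opens
        events.append((datetime.strptime(end_text, FMT), -1, user))    # window closes
    events.sort()

    active = {}    # user -> number of that user's windows currently open
    segments = []  # (segment_start, segment_end, frozenset of active users)
    prev = None
    for when, delta, user in events:
        # Close the stretch between the previous event time and this one.
        if prev is not None and when > prev and active:
            users = frozenset(active)
            if segments and segments[-1][1] == prev and segments[-1][2] == users:
                # Same users as the adjacent segment: extend it.
                segments[-1] = (segments[-1][0], when, users)
            else:
                segments.append((prev, when, users))
        count = active.get(user, 0) + delta
        if count:
            active[user] = count
        else:
            active.pop(user, None)
        prev = when

    # Most users first; earlier segments win ties.
    segments.sort(key=lambda seg: (-len(seg[2]), seg[0]))
    return segments[:limit]


if __name__ == "__main__":
    for seg_start, seg_end, users in top_overlaps(ROWS):
        print(seg_start, seg_end, "+".join(sorted(users)))
```

On the sample rows the top range is 08:30 to 10:00 on 2012-04-03, shared by A, B, D and E. Keying the active users in one dictionary also avoids having a separate array per user.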

Something like this should get you started -

SELECT slots.time_slot, COUNT(user_bookings.user_id) AS num_users, GROUP_CONCAT(DISTINCT user_bookings.user_id ORDER BY user_bookings.user_id) AS user_list
FROM (
    SELECT CURRENT_DATE + INTERVAL ((id-1)*30) MINUTE AS time_slot
    FROM dummy
    WHERE id BETWEEN 1 AND 48
) AS slots
LEFT JOIN user_bookings
    ON slots.time_slot BETWEEN `user_bookings`.`start` AND `user_bookings`.`end`
GROUP BY slots.time_slot
ORDER BY num_users DESC

The idea is to create a derived table that consists of time slots for the day. In this example I have used dummy (which can be any table with an auto-increment id that is contiguous for the required range) to create a list of timeslots by adding 30 minutes incrementally. The result is then joined to the bookings to count the number of bookings for each time slot.
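For anyone who wants to check the join condition on a small data set first, the derived slot list can be imitated in a few lines. This is just an illustration in Python, not a replacement for the query; ROWS and users_per_slot are hypothetical names, the rows are the question's sample data, and the day is passed in instead of using CURRENT_DATE.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Sample rows from the question: (user_id, start, end). Hypothetical in-memory copy.
ROWS = [
    ("A", "2012-04-03 04:00:00", "2012-04-03 10:00:00"),
    ("A", "2012-04-03 16:00:00", "2012-04-03 20:00:00"),
    ("B", "2012-04-03 08:30:00", "2012-04-03 14:00:00"),
    ("B", "2012-04-06 21:30:00", "2012-04-06 23:00:00"),
    ("C", "2012-04-03 12:00:00", "2012-04-03 13:00:00"),
    ("D", "2012-04-01 01:00:01", "2012-04-05 12:00:59"),
    ("E", "2012-04-03 08:30:00", "2012-04-03 11:00:00"),
    ("E", "2012-04-03 21:00:00", "2012-04-03 23:00:00"),
]


def users_per_slot(rows, day, step_minutes=30):
    """Map each slot start on `day` to the sorted users whose booking contains it."""
    bookings = [
        (user, datetime.strptime(s, FMT), datetime.strptime(e, FMT))
        for user, s, e in rows
    ]
    midnight = datetime.strptime(day, "%Y-%m-%d")
    slots = {}
    for i in range(24 * 60 // step_minutes):
        # Equivalent of: day + INTERVAL (i * 30) MINUTE
        slot = midnight + timedelta(minutes=i * step_minutes)
        # Same test as the join: slot BETWEEN start AND end (both ends inclusive).
        slots[slot] = sorted({u for u, s, e in bookings if s <= slot <= e})
    return slots


if __name__ == "__main__":
    slots = users_per_slot(ROWS, "2012-04-03")
    busiest = sorted(slots.items(), key=lambda item: (-len(item[1]), item[0]))[:3]
    for slot, users in busiest:
        print(slot, len(users), ",".join(users))
```

Because BETWEEN is inclusive, a booking that ends exactly at 10:00 is still counted for the 10:00 slot, which is worth keeping in mind when reading the results.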

UPDATE: For the entire date/time range, you could use a query like this to get the other data required -

SELECT MIN(`start`) AS `min_start`, MAX(`end`) AS `max_end`, DATEDIFF(MAX(`end`), MIN(`start`)) + 1 AS `num_days`
FROM user_bookings

These values can then be substituted into the original query or the two can be combined -

SELECT slots.time_slot, COUNT(user_bookings.user_id) AS num_users, GROUP_CONCAT(DISTINCT user_bookings.user_id ORDER BY user_bookings.user_id) AS user_list
FROM (
    SELECT DATE(tmp.min_start) + INTERVAL ((id-1)*30) MINUTE AS time_slot
    FROM dummy
    INNER JOIN (
        SELECT MIN(`start`) AS `min_start`, MAX(`end`) AS `max_end`, DATEDIFF(MAX(`end`), MIN(`start`)) + 1 AS `num_days`
        FROM user_bookings
    ) AS tmp
    WHERE dummy.id BETWEEN 1 AND (48 * tmp.num_days)
) AS slots
LEFT JOIN user_bookings
    ON slots.time_slot BETWEEN `user_bookings`.`start` AND `user_bookings`.`end`
GROUP BY slots.time_slot
ORDER BY num_users DESC

EDIT I have added DISTINCT and ORDER BY clauses in the GROUP_CONCAT() in response to your last query.

Please note that you will need a much greater range of ids in the dummy table. I have not tested this query, so it may have syntax errors.
