Can the performance of this overlapping bookings query be improved?

Question

I maintain an online bookings system that occasionally contains overlapping bookings as a result of one or more bugs we are trying to locate. While we do so, I've been given a query that lists the overlapping bookings from the past two months so that we can fix them manually.

My problem is that this query takes forever (5+ minutes) to run, and the bookings system grinds to a halt while it does, to the detriment of our users. So I'd like to improve its performance.

The relevant schema is pseudo-coded below. There are two key tables and their respective columns.

Bookings                        Accounts
ID : int                        ID : int
Status : bool                   Status : bool
StartTime : datetime            Name : varchar
EndTime : datetime
RoomID : int
MemberID : int
AccountID : int

PK: ID                          PK: ID
Index: StartTime, EndTime, 
       MemberID, AccountID,
       RoomID, Status

The keys are all simple keys (i.e. no compound keys), so each listed column has its own single-column index. Bookings.AccountID is a foreign key into Accounts.ID.

The query is roughly:

SELECT b1.AccountID, a.Name, b1.ID, b2.ID, b1.StartTime, b1.EndTime, b1.RoomID
FROM Bookings b1
LEFT JOIN Bookings b2
ON b1.MemberID = b2.MemberID
   AND b1.RoomID = b2.RoomID
   AND b2.StartTime > SUBDATE(NOW(), INTERVAL 2 MONTH)
LEFT JOIN Accounts a
ON b1.AccountId = a.ID 
WHERE b1.ID != b2.ID
AND b1.Status = 1
AND b2.Status = 1
AND b1.StartTime > SUBDATE(NOW(), INTERVAL 2 MONTH)
AND (
  (b1.StartTime >= b2.StartTime AND b2.EndTime <= b1.EndTime AND b1.StartTime < b2.EndTime) OR
  (b1.StartTime <= b2.StartTime AND b2.EndTime >= b1.EndTime AND b2.StartTime < b1.EndTime) OR
  (b2.StartTime <= b1.StartTime AND b2.EndTime >= b1.EndTime)
)

As far as I can tell, the query essentially joins the bookings table to itself (for the past two months) and attempts to eliminate distinct bookings. That is, it looks for valid (status=1) bookings belonging to the same member for the same room where the duration of the bookings overlap.

The last three clauses look for (a) a booking that starts during the other and finishes after it; (b) a booking that starts before the other and finishes during it; and (c) a booking wholly contained within the other. As far as I can see, this omits the case of a booking that wholly surrounds the other (although I'm not sure why).
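One way to see whether the "wholly around" case really matters is to brute-force the three clauses over small integer time slots. The plain-Python sketch below (no database involved; `three_clause` and `overlaps` are hypothetical helper names) copies the clauses exactly as written and compares them with the compact test start1 < end2 AND end1 > start2. It assumes every booking has StartTime < EndTime. Because the self-join produces every pair in both orders, a booking that surrounds another should be caught when b1 and b2 swap roles.

```python
from itertools import product

def three_clause(b1, b2):
    """The three OR-ed clauses from the query; b1 and b2 are (start, end) tuples."""
    s1, e1 = b1
    s2, e2 = b2
    return ((s1 >= s2 and e2 <= e1 and s1 < e2) or
            (s1 <= s2 and e2 >= e1 and s2 < e1) or
            (s2 <= s1 and e2 >= e1))

def overlaps(b1, b2):
    """Compact overlap test: each booking starts before the other one ends."""
    return b1[0] < b2[1] and b1[1] > b2[0]

# Every interval with 0 <= start < end <= 5 (positive length, like real bookings).
intervals = [(s, e) for s in range(6) for e in range(6) if s < e]

# With b1 surrounding b2 the clauses are false in this orientation...
outer, inner = (0, 5), (1, 3)
print(three_clause(outer, inner))  # False
# ...but true once the self-join swaps the roles.
print(three_clause(inner, outer))  # True

# Taking both orientations, the clauses agree with the compact test on every pair.
mismatches = [(x, y) for x, y in product(intervals, repeat=2)
              if (three_clause(x, y) or three_clause(y, x)) != overlaps(x, y)]
print(mismatches)  # []
```

So the surrounding case is not lost. It is reported once, from the side of the contained booking, while partial overlaps show up twice (once per orientation).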

The bookings table is very large (~2m rows) as it has years of bookings data in it. Can the performance of this query be improved (or replaced with a better one)? Any suggestions welcome.

Answer 1

I would rewrite the query like this:

SELECT sub.*, a.Name, a.id
from (

    SELECT b1.AccountId, b1.ID AS b1_id, b2.ID AS b2_id, b1.StartTime, b1.EndTime, b1.RoomID
    FROM (SELECT SUBDATE(NOW(), INTERVAL 2 MONTH) AS subDate) const
    CROSS JOIN Bookings b1
    LEFT JOIN Bookings b2
    ON b1.MemberID = b2.MemberID
       AND b1.RoomID = b2.RoomID
       AND b2.StartTime > const.subDate
       AND b1.ID != b2.ID 
       AND b2.Status = 1
    WHERE 
    b1.Status = 1
    AND b1.StartTime > const.subDate  
    AND (
      (b1.StartTime >= b2.StartTime AND b2.EndTime <= b1.EndTime AND b1.StartTime < b2.EndTime) OR
      (b1.StartTime <= b2.StartTime AND b2.EndTime >= b1.EndTime AND b2.StartTime < b1.EndTime) OR
      (b2.StartTime <= b1.StartTime AND b2.EndTime >= b1.EndTime)
    )

) sub
LEFT JOIN Accounts a ON 
  sub.AccountId = a.ID

UPDATE: Also check whether there are indexes on the MemberID, RoomID and StartTime columns. If there are no such indexes, introduce them.
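To try the shape of this rewrite without a MySQL server, here is a sketch that uses Python's built-in sqlite3 module as a stand-in. Treating SQLite as close enough for illustration is an assumption. In the sketch, `datetime('now', '-2 months')` replaces `SUBDATE(NOW(), INTERVAL 2 MONTH)`, the derived table is aliased `cutoff` instead of `const`, and the sample rows and the `ts` helper are made up. It keeps the cutoff in a one-row derived table, moves the b2 filters into the ON clause, and uses a plain JOIN. The plain JOIN is safe because the overlap conditions in WHERE already discard any unmatched (NULL) b2 row.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Bookings (
    ID INTEGER PRIMARY KEY, Status INTEGER, StartTime TEXT, EndTime TEXT,
    RoomID INTEGER, MemberID INTEGER, AccountID INTEGER)""")

# SQLite's datetime('now') is UTC, so the sample times are built in UTC as well.
now = datetime.now(timezone.utc).replace(tzinfo=None)

def ts(days_ago, hour):
    """Hypothetical helper: 'YYYY-MM-DD HH:MM:SS' at the given hour, days_ago days back."""
    t = (now - timedelta(days=days_ago)).replace(hour=hour, minute=0, second=0, microsecond=0)
    return t.strftime("%Y-%m-%d %H:%M:%S")

conn.executemany("INSERT INTO Bookings VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, 1, ts(10, 9), ts(10, 11), 7, 42, 1),
    (2, 1, ts(10, 10), ts(10, 12), 7, 42, 1),    # overlaps booking 1
    (3, 1, ts(200, 9), ts(200, 11), 7, 42, 1),   # older than two months
    (4, 1, ts(200, 10), ts(200, 12), 7, 42, 1),  # overlaps 3, but also too old
])

pairs = conn.execute("""
    SELECT b1.ID AS b1_id, b2.ID AS b2_id
    FROM (SELECT datetime('now', '-2 months') AS subDate) AS cutoff  -- one-row derived table
    CROSS JOIN Bookings b1
    JOIN Bookings b2
      ON b1.MemberID = b2.MemberID
     AND b1.RoomID = b2.RoomID
     AND b2.StartTime > cutoff.subDate
     AND b1.ID != b2.ID
     AND b2.Status = 1
    WHERE b1.Status = 1
      AND b1.StartTime > cutoff.subDate
      AND ((b1.StartTime >= b2.StartTime AND b2.EndTime <= b1.EndTime AND b1.StartTime < b2.EndTime) OR
           (b1.StartTime <= b2.StartTime AND b2.EndTime >= b1.EndTime AND b2.StartTime < b1.EndTime) OR
           (b2.StartTime <= b1.StartTime AND b2.EndTime >= b1.EndTime))
    ORDER BY b1_id, b2_id""").fetchall()
print(pairs)  # [(1, 2), (2, 1)]
```

The CROSS JOIN matters in MySQL as well. With a comma join, `const` is not visible inside the ON clause of the following LEFT JOIN, because the comma binds more loosely than JOIN.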

Answer 2

You didn't say whether this is something like an e-commerce site for hotel/rental booking, or an intranet site for booking conference rooms, lecture halls, etc. within an organization. I'm going to assume it's the former, since 5 minutes of downtime would be significant there, whereas for the latter it's probably not as big a deal.

So here's a heuristic you can use: it's unlikely (but not impossible) that a member would book the same room more than once within a two-month period. If you select all the room IDs and member IDs within that timeframe, duplicate (RoomID, MemberID) rows in the results could be a double booking, or maybe just someone who goes on vacation a lot.

This is one way duplicate row detection could be done:

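SELECT b.ID, b.StartTime, b.EndTime, b.RoomID, b.MemberID
FROM Bookings b
JOIN (
    SELECT RoomID, MemberID
    FROM Bookings
    WHERE StartTime > SUBDATE(NOW(), INTERVAL 2 MONTH)
    GROUP BY RoomID, MemberID
    HAVING COUNT(*) > 1
) AS t ON t.RoomID = b.RoomID AND t.MemberID = b.MemberID
WHERE b.StartTime > SUBDATE(NOW(), INTERVAL 2 MONTH)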
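Here is a small sqlite3 sketch of that grouping query. SQLite stands in for MySQL, a fixed cutoff string replaces SUBDATE(NOW(), INTERVAL 2 MONTH), and the rows are invented for illustration. It returns every booking whose (RoomID, MemberID) combination occurs more than once inside the window. These are only candidates, so the StartTime/EndTime comparison is still needed afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Bookings (
    ID INTEGER PRIMARY KEY, StartTime TEXT, EndTime TEXT, RoomID INTEGER, MemberID INTEGER)""")
conn.executemany("INSERT INTO Bookings VALUES (?, ?, ?, ?, ?)", [
    (1, "2015-03-01 09:00:00", "2015-03-01 11:00:00", 7, 42),
    (2, "2015-03-01 10:00:00", "2015-03-01 12:00:00", 7, 42),  # same room and member as 1
    (3, "2015-03-05 09:00:00", "2015-03-05 10:00:00", 8, 42),  # only in-window booking for room 8
    (4, "2014-06-01 09:00:00", "2014-06-01 10:00:00", 8, 42),  # room 8 again, outside the window
])

# Hypothetical fixed cutoff standing in for SUBDATE(NOW(), INTERVAL 2 MONTH).
cutoff = "2015-02-10 00:00:00"
candidates = conn.execute("""
    SELECT b.ID
    FROM Bookings b
    JOIN (SELECT RoomID, MemberID
          FROM Bookings
          WHERE StartTime > ?
          GROUP BY RoomID, MemberID
          HAVING COUNT(*) > 1) AS t
      ON t.RoomID = b.RoomID AND t.MemberID = b.MemberID
    WHERE b.StartTime > ?
    ORDER BY b.ID""", (cutoff, cutoff)).fetchall()
print(candidates)  # [(1,), (2,)]
```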

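You could also use a stored procedure along these lines (MySQL syntax; the procedure name is illustrative, and temp_table is assumed to exist already with a single INT column):

DELIMITER //
CREATE PROCEDURE find_repeat_bookings()
BEGIN
  -- local names must differ from column names, or MySQL reads them as variables
  DECLARE done INT DEFAULT 0;
  DECLARE v_id, v_rid, v_mid INT;
  DECLARE old_rid, old_mid INT DEFAULT 0;
  DECLARE cur CURSOR FOR
    SELECT ID, RoomID, MemberID FROM Bookings ORDER BY RoomID, MemberID;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
  OPEN cur;
  read_loop: LOOP
    FETCH cur INTO v_id, v_rid, v_mid;
    IF done THEN LEAVE read_loop; END IF;
    -- same room and member as the previous row: keep this booking as a candidate
    IF v_rid = old_rid AND v_mid = old_mid THEN
      INSERT INTO temp_table VALUES (v_id);
    END IF;
    SET old_rid = v_rid;
    SET old_mid = v_mid;
  END LOOP;
  CLOSE cur;
END //
DELIMITER ;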

Then you'd run a query like your original one with StartTime/EndTime comparison on the result.
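The same two-step idea can be sketched in plain Python. It sorts by (RoomID, MemberID) as the cursor does, then compares start and end times only within each run of equal keys. This is an in-memory illustration, not the stored procedure itself. The tuple layout, the hour-based times, and the `overlapping_pairs` name are assumptions chosen to keep it short.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: (ID, RoomID, MemberID, start_hour, end_hour).
bookings = [
    (1, 7, 42, 9, 11),
    (2, 7, 42, 10, 12),
    (3, 8, 42, 9, 10),
    (5, 7, 42, 13, 14),
    (6, 9, 50, 9, 10),
]

def overlapping_pairs(rows):
    """Group rows by (RoomID, MemberID), then report overlapping (ID, ID) pairs within each group."""
    rows = sorted(rows, key=itemgetter(1, 2))        # same ordering as the cursor
    pairs = []
    for _, group in groupby(rows, key=itemgetter(1, 2)):
        group = sorted(group, key=itemgetter(3))     # order each run by start time
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                if b[3] >= a[4]:
                    break                            # this and later rows start at or after a ends
                pairs.append((a[0], b[0]))
    return pairs

print(overlapping_pairs(bookings))  # [(1, 2)]
```

Booking 5 shares room and member with 1 and 2 but starts after both have ended. It is a repeat booking, not an overlap, so it is not reported.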

Answer 3

Essentially you were searching for all unique bookings. It is way faster to search for all the duplicates since that list should be shorter:

DROP TABLE IF EXISTS duplicate_bookings;

CREATE TEMPORARY TABLE duplicate_bookings AS SELECT MAX(b1.ID) as last_bookings_id, b1.AccountID, b1.StartTime, b1.EndTime, b1.RoomID
FROM Bookings b1 
GROUP BY b1.AccountID, b1.StartTime, b1.EndTime, b1.RoomID
HAVING COUNT(*)>1;

This query selects all bookings that are exact duplicates (same AccountID, StartTime, EndTime and RoomID). My assumption is that you want to delete the last booking of each group (MAX(b1.ID)).

Delete those bookings with:

DELETE FROM Bookings WHERE ID IN (SELECT last_bookings_id FROM duplicate_bookings);

Benefit: if you have triplicates, quadruplicates, etc., you can repeat this in a loop (executing all the SQL, including dropping the duplicate_bookings table, in a single database session).
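Here is a sketch of that loop. It uses Python's sqlite3 module in place of a MySQL session, cuts the table down to the grouped columns, and uses invented rows. Each pass deletes the highest ID from every group of identical bookings, so a triplicate needs two passes. The loop stops when a pass deletes nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Bookings (
    ID INTEGER PRIMARY KEY, AccountID INTEGER, StartTime TEXT, EndTime TEXT, RoomID INTEGER)""")
conn.executemany("INSERT INTO Bookings VALUES (?, ?, ?, ?, ?)", [
    (1, 5, "2015-03-01 09:00", "2015-03-01 11:00", 7),
    (2, 5, "2015-03-01 09:00", "2015-03-01 11:00", 7),  # duplicate of 1
    (3, 5, "2015-03-01 09:00", "2015-03-01 11:00", 7),  # triplicate
    (4, 6, "2015-03-02 09:00", "2015-03-02 10:00", 8),  # unique
])

rounds = 0
while True:
    conn.execute("DROP TABLE IF EXISTS duplicate_bookings")
    conn.execute("""CREATE TEMP TABLE duplicate_bookings AS
        SELECT MAX(ID) AS last_bookings_id
        FROM Bookings
        GROUP BY AccountID, StartTime, EndTime, RoomID
        HAVING COUNT(*) > 1""")
    deleted = conn.execute(
        "DELETE FROM Bookings WHERE ID IN (SELECT last_bookings_id FROM duplicate_bookings)"
    ).rowcount
    if deleted == 0:
        break  # no duplicate groups left
    rounds += 1

print(rounds, [r[0] for r in conn.execute("SELECT ID FROM Bookings ORDER BY ID")])  # 2 [1, 4]
```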

To prevent new duplicates and find your bug really quickly, and assuming you are using InnoDB, add a unique index:

CREATE UNIQUE INDEX idx_nn_1 ON Bookings(AccountID, StartTime, EndTime,RoomID);

You can only add this index after removing your duplicates. From then on, new duplicate inserts will fail.
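Here is a quick sqlite3 check of what the unique index does and doesn't catch. SQLite stands in for InnoDB, and the sample rows are invented. An exact duplicate insert fails with an integrity error. A booking that merely overlaps an existing one is still accepted, so the index guards against exact duplicates, not against the overlaps the question is about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Bookings (
    ID INTEGER PRIMARY KEY, AccountID INTEGER, StartTime TEXT, EndTime TEXT, RoomID INTEGER)""")
conn.execute("CREATE UNIQUE INDEX idx_nn_1 ON Bookings(AccountID, StartTime, EndTime, RoomID)")

conn.execute("INSERT INTO Bookings VALUES (1, 5, '2015-03-01 09:00', '2015-03-01 11:00', 7)")
try:
    # Identical AccountID/StartTime/EndTime/RoomID: rejected by the unique index.
    conn.execute("INSERT INTO Bookings VALUES (2, 5, '2015-03-01 09:00', '2015-03-01 11:00', 7)")
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)

# Overlapping but not identical: the unique index lets this one through.
conn.execute("INSERT INTO Bookings VALUES (3, 5, '2015-03-01 10:00', '2015-03-01 12:00', 7)")
count = conn.execute("SELECT COUNT(*) FROM Bookings").fetchone()[0]
print(rejected, count)  # True 2
```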

Also, a temporary non-unique index on the same columns might help with the deletion:

CREATE INDEX idx_nn_2 ON Bookings(AccountID, StartTime, EndTime,RoomID);

Answer 4

This compound index

INDEX(MemberID, RoomID, StartTime)

should speed up the first JOIN.

This should speed up the SELECT:

INDEX(Status, StartTime)

(No, it is not the same to have individual INDEXes on the fields.)
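One way to confirm that the composite index is actually used is to look at the query plan. In MySQL you would run EXPLAIN and check its key column. The sketch below uses SQLite's EXPLAIN QUERY PLAN as a stand-in, with a hypothetical index name. The exact plan text differs between SQLite versions, so no expected output is shown. The index name should appear in at least one of the plan steps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Bookings (
    ID INTEGER PRIMARY KEY, Status INTEGER, StartTime TEXT, EndTime TEXT,
    RoomID INTEGER, MemberID INTEGER, AccountID INTEGER)""")
# Hypothetical name for INDEX(MemberID, RoomID, StartTime).
conn.execute("CREATE INDEX idx_member_room_start ON Bookings(MemberID, RoomID, StartTime)")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT b1.ID, b2.ID
    FROM Bookings b1
    JOIN Bookings b2
      ON b1.MemberID = b2.MemberID AND b1.RoomID = b2.RoomID
     AND b2.StartTime > ?
    WHERE b1.StartTime > ?""", ("2015-02-10", "2015-02-10")).fetchall()
for row in plan:
    print(row[-1])  # the last column holds the human-readable plan step
```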

For overlapping time ranges, consider this compact form:

WHERE a.start < b.end AND a.end > b.start
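Here is a sketch of the self-join rewritten with that compact test, run against an in-memory SQLite table. The sample rows are invented, and fixed cutoff strings replace SUBDATE(NOW(), INTERVAL 2 MONTH). The sketch also swaps b1.ID != b2.ID for b1.ID < b2.ID, a hypothetical tweak so that each overlapping pair is reported once instead of twice. Bookings that merely touch (one ends exactly when the next starts) are not flagged, which matches the strict < in the original clauses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Bookings (
    ID INTEGER PRIMARY KEY, Status INTEGER, StartTime TEXT, EndTime TEXT,
    RoomID INTEGER, MemberID INTEGER)""")
conn.executemany("INSERT INTO Bookings VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, "2015-03-01 09:00", "2015-03-01 17:00", 7, 42),  # surrounds booking 2
    (2, 1, "2015-03-01 10:00", "2015-03-01 11:00", 7, 42),
    (3, 1, "2015-03-01 17:00", "2015-03-01 18:00", 7, 42),  # starts exactly when 1 ends
    (4, 0, "2015-03-01 09:30", "2015-03-01 10:30", 7, 42),  # Status 0: ignored
])

pairs = conn.execute("""
    SELECT b1.ID, b2.ID
    FROM Bookings b1
    JOIN Bookings b2
      ON b1.MemberID = b2.MemberID
     AND b1.RoomID = b2.RoomID
     AND b1.ID < b2.ID                -- each pair once instead of twice
     AND b1.StartTime < b2.EndTime    -- compact overlap test
     AND b1.EndTime > b2.StartTime
    WHERE b1.Status = 1 AND b2.Status = 1
      AND b1.StartTime > ? AND b2.StartTime > ?
    ORDER BY b1.ID, b2.ID""", ("2015-02-10", "2015-02-10")).fetchall()
print(pairs)  # [(1, 2)]
```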

What is the meaning of Status = 1? What percentage of the table has Status = 1?

4 answers

solution1 0 2015-04-10 07:21:30
solution2 0 2015-04-10 18:32:19
solution3 0 2015-04-10 18:49:26
solution4 0 2015-04-10 23:46:48
