
What's the approach to joining these 2 tables?

Say I have 2 tables, A and B, which contain start times and end times respectively. In each table the primary key is the combination of id and timestamp, so no 2 records can have the same id and timestamp.

A

id | start time
1 | 2016-02-06 17:03
1 | 2016-03-09 18:09
2 | 2017-02-07 23:34
3 | 2016-02-07 19:12
3 | 2016-02-07 23:52
...

B

id | end time
1 | 2016-02-06 18:32
1 | 2016-03-09 21:11
2 | 2017-02-08 01:22
3 | 2016-02-07 21:32
3 | 2016-02-08 02:11
...

My end result should be something like

id | start time | end time
1 | 2016-02-06 17:03 | 2016-02-06 18:32
1 | 2016-03-09 18:09 | 2016-03-09 21:11
2 | 2017-02-07 23:34 | 2017-02-08 01:22
3 | 2016-02-07 19:12 | 2016-02-07 21:32
3 | 2016-02-07 23:52 | 2016-02-08 02:11
...

Obviously I can't join on just the id, as ids 1 and 3 each appear twice. I can't join on the day either, as the 3rd and 5th records span 2 different days. So is there a way to join these 2 tables? Any help would be much appreciated! Thanks!

I agree with Barmar and encourage you to revisit your data model. I would expect start time and end time to be in the same table.

And while the existing ID may be for something like user_id, if that ID is duplicated in this table then there should be some other unique identifier, maybe transaction_id, that uniquely identifies each record.
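As a rough illustration of that suggestion, here is a minimal sketch using Python's built-in sqlite3 module. The table name `sessions` and the columns `transaction_id` and `user_id` are hypothetical, not taken from the question; the idea is simply that each start/end pair lives in one row with its own unique key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical combined table: one row per start/end pair.
    CREATE TABLE sessions (
        transaction_id INTEGER PRIMARY KEY,  -- unique per record, assigned by SQLite
        user_id        INTEGER NOT NULL,     -- the id from the question (assumed to be a user id)
        start_time     TEXT NOT NULL,
        end_time       TEXT,                 -- NULL until the end is recorded
        UNIQUE (user_id, start_time)
    );
""")

# Record the start, then fill in the end on the same row later.
conn.execute(
    "INSERT INTO sessions (user_id, start_time) VALUES (?, ?)",
    (3, "2016-02-07 23:52"),
)
conn.execute(
    "UPDATE sessions SET end_time = ? WHERE user_id = ? AND end_time IS NULL",
    ("2016-02-08 02:11", 3),
)

session = conn.execute("SELECT * FROM sessions").fetchone()
print(session)
```

With this layout no join is needed to pair a start with its end.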

Since the ids match and each end time is later than its start time, you can join on id and take the earliest end time that comes after each start time.

If those times are stored as strings, convert them with STR_TO_DATE:

SELECT a.id, a.`start time`, MIN(b.`end time`) AS `end time`
FROM A a
LEFT JOIN B b 
  ON b.id = a.id
 AND STR_TO_DATE(b.`end time`, '%Y-%m-%d %H:%i') > STR_TO_DATE(a.`start time`, '%Y-%m-%d %H:%i')
GROUP BY a.id, a.`start time`
ORDER BY a.id, a.`start time`;

If they are already stored as timestamps, compare them directly:

SELECT a.id, a.`start time`, MIN(b.`end time`) AS `end time`
FROM A a
LEFT JOIN B b
  ON b.id = a.id
 AND b.`end time` > a.`start time`
GROUP BY a.id, a.`start time`
ORDER BY a.id, a.`start time`;
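To try the same idea without a MySQL server, here is a small sketch using Python's built-in sqlite3 module and the sample rows from the question. The table names `table_a` and `table_b` and the underscore column names are assumptions made for this sketch. The times are stored as text in `YYYY-MM-DD HH:MM` form, which sorts correctly as plain strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, start_time TEXT, PRIMARY KEY (id, start_time));
    CREATE TABLE table_b (id INTEGER, end_time TEXT, PRIMARY KEY (id, end_time));
    INSERT INTO table_a VALUES
        (1, '2016-02-06 17:03'), (1, '2016-03-09 18:09'),
        (2, '2017-02-07 23:34'),
        (3, '2016-02-07 19:12'), (3, '2016-02-07 23:52');
    INSERT INTO table_b VALUES
        (1, '2016-02-06 18:32'), (1, '2016-03-09 21:11'),
        (2, '2017-02-08 01:22'),
        (3, '2016-02-07 21:32'), (3, '2016-02-08 02:11');
""")

# For each start, keep the earliest end time of the same id that comes after it.
rows = conn.execute("""
    SELECT a.id, a.start_time, MIN(b.end_time) AS end_time
    FROM table_a AS a
    LEFT JOIN table_b AS b
      ON b.id = a.id
     AND b.end_time > a.start_time
    GROUP BY a.id, a.start_time
    ORDER BY a.id, a.start_time
""").fetchall()

for row in rows:
    print(row)
```

Because of the LEFT JOIN, a start time with no later end time would still appear, with a NULL end time.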


If there are many timestamps per id in B, it might be more performant to limit the join range to a day or less. This only gives the right result if no start/end pair is more than 24 hours apart:

SELECT a.id, a.`start time`, MIN(b.`end time`) AS `end time`
FROM A a
LEFT JOIN B b
  ON b.id = a.id
 AND b.`end time` > a.`start time` 
 AND b.`end time` < TIMESTAMPADD(HOUR,24,a.`start time`)
GROUP BY a.id, a.`start time`
ORDER BY a.id, a.`start time`;
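The trade-off of the 24-hour bound can be seen in a small sketch using Python's built-in sqlite3 module. SQLite has no TIMESTAMPADD, so `strftime` with a `'+24 hours'` modifier stands in for it here. The table names and the row for id 4 are hypothetical, added only to show what happens when an interval is longer than the bound.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, start_time TEXT);
    CREATE TABLE table_b (id INTEGER, end_time TEXT);
    INSERT INTO table_a VALUES
        (3, '2016-02-07 19:12'), (3, '2016-02-07 23:52'),
        (4, '2016-05-01 08:00');   -- hypothetical row, not in the question
    INSERT INTO table_b VALUES
        (3, '2016-02-07 21:32'), (3, '2016-02-08 02:11'),
        (4, '2016-05-03 09:00');   -- hypothetical end, more than 24 hours later
""")

# Same join as before, but only end times within 24 hours of the start qualify.
bounded = conn.execute("""
    SELECT a.id, a.start_time, MIN(b.end_time) AS end_time
    FROM table_a AS a
    LEFT JOIN table_b AS b
      ON b.id = a.id
     AND b.end_time > a.start_time
     AND b.end_time < strftime('%Y-%m-%d %H:%M', a.start_time, '+24 hours')
    GROUP BY a.id, a.start_time
    ORDER BY a.id, a.start_time
""").fetchall()

for row in bounded:
    print(row)
```

The two rows for id 3 pair up as before, while the row for id 4 comes back with no end time because its end falls outside the 24-hour window.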

Assuming that there are no overlaps between the start/end intervals of the same id, you could join the tables with a condition based on a correlated subquery, which picks the tableb record with the earliest end_time after the current tablea start_time:

select
    a.*,
    b.end_time
from
    tablea a
    inner join tableb b
        on  b.id = a.id
        and b.end_time = (
            select min(b1.end_time)
            from tableb b1 
            where b1.id = a.id and b1.end_time > a.start_time
        )

Demo on DB Fiddle:

id | start_time       | end_time        
-: | :--------------- | :---------------
 1 | 2016-02-06 17:03 | 2016-02-06 18:32
 1 | 2016-03-09 18:09 | 2016-03-09 21:11
 2 | 2017-02-07 23:34 | 2017-02-08 01:22
 3 | 2016-02-07 19:12 | 2016-02-07 21:32
 3 | 2016-02-07 23:52 | 2016-02-08 02:11
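The correlated-subquery version can also be tried locally. This is a minimal sketch using Python's built-in sqlite3 module with the sample rows from the question. The ORDER BY and the extra start row for id 5 are additions for this sketch only, the latter to show how the inner join treats a start time that has no matching end time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablea (id INTEGER, start_time TEXT);
    CREATE TABLE tableb (id INTEGER, end_time TEXT);
    INSERT INTO tablea VALUES
        (1, '2016-02-06 17:03'), (1, '2016-03-09 18:09'),
        (2, '2017-02-07 23:34'),
        (3, '2016-02-07 19:12'), (3, '2016-02-07 23:52'),
        (5, '2018-01-01 10:00');   -- hypothetical start with no end time
    INSERT INTO tableb VALUES
        (1, '2016-02-06 18:32'), (1, '2016-03-09 21:11'),
        (2, '2017-02-08 01:22'),
        (3, '2016-02-07 21:32'), (3, '2016-02-08 02:11');
""")

# Join each start to the one end row whose end_time is the earliest after it.
paired = conn.execute("""
    SELECT a.id, a.start_time, b.end_time
    FROM tablea AS a
    INNER JOIN tableb AS b
        ON  b.id = a.id
        AND b.end_time = (
            SELECT MIN(b1.end_time)
            FROM tableb AS b1
            WHERE b1.id = a.id AND b1.end_time > a.start_time
        )
    ORDER BY a.id, a.start_time
""").fetchall()

for row in paired:
    print(row)
```

The start row for id 5 does not appear in the output, since an inner join keeps only starts that have a later end time. Switching to a left join would keep it with a NULL end_time.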
