I am trying to solve a problem using two dataframes: (1) a grid with TV data, which has the start and end time of each show and the channel name; (2) viewer data, which has the start and end time of each tune-in, the channel that was tuned to, and the user ID.
How can I join both tables and add new rows when the time ranges overlap for different users? Something like the example below:
Dataframe 1:
Channel | In_Hour | Out_Hour |
---|---|---|
Channel_1 | 8:00 | 22:00 |
Channel_2 | 22:00 | 22:01 |
Channel_3 | 22:01 | 22:40 |
Dataframe 2:
Channel | Program | Start | End |
---|---|---|---|
Channel_1 | a | 07:00 | 09:00 |
Channel_1 | b | 09:00 | 12:40 |
Channel_1 | c | 12:00 | 23:00 |
Channel_1 | d | 23:00 | 23:30 |
Channel_1 | e | 23:30 | 23:45 |
Channel_2 | f | 21:00 | 23:40 |
Channel_3 | g | 21:40 | 23:00 |
Objective Dataframe:
Channel | Program | Start | End |
---|---|---|---|
Channel_1 | a | 08:00 | 09:00 |
Channel_1 | b | 09:00 | 12:00 |
Channel_1 | c | 12:00 | 22:00 |
Channel_2 | f | 22:00 | 22:01 |
Channel_3 | g | 22:01 | 22:40 |
Setup:
```python
import pandas as pd

df1 = pd.DataFrame({
    'Channel': {0: 'Channel_1', 1: 'Channel_2', 2: 'Channel_3'},
    'In_Hour': {0: '8:00', 1: '22:00', 2: '22:01'},
    'Out_Hour': {0: '22:00', 1: '22:01', 2: '22:40'}
})
df1['In_Hour'] = pd.to_datetime(df1['In_Hour'])
df1['Out_Hour'] = pd.to_datetime(df1['Out_Hour'])

df2 = pd.DataFrame({
    'Channel': {0: 'Channel_1', 1: 'Channel_1', 2: 'Channel_1', 3: 'Channel_1',
                4: 'Channel_1', 5: 'Channel_2', 6: 'Channel_3'},
    'Program': {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e', 5: 'f', 6: 'g'},
    'Start': {0: '07:00', 1: '09:00', 2: '12:00', 3: '23:00', 4: '23:30',
              5: '21:00', 6: '21:40'},
    'End': {0: '09:00', 1: '12:40', 2: '23:00', 3: '23:30', 4: '23:45',
            5: '23:40', 6: '23:00'}
})
df2['Start'] = pd.to_datetime(df2['Start'])
df2['End'] = pd.to_datetime(df2['End'])
```
Try merging the frames together, use a mask to filter out rows that don't meet the criteria, then use `apply` + `clip` to ensure that every row falls within the start and end times specified in `In_Hour` and `Out_Hour`.
```python
# Merge frames together on Channel
df3 = df2.merge(df1, on='Channel')

# Overlap test: Start is before Out_Hour and End is after In_Hour
m1 = df3['Start'].lt(df3['Out_Hour']) & df3['End'].gt(df3['In_Hour'])

# Keep only the rows that overlap the In_Hour/Out_Hour window
df3 = df3[m1].reset_index(drop=True)

df3 = df3[['Channel', 'Program']].join(
    # Process each row (axis=1)
    df3.apply(
        # Clip Start and End to the bounds given by In_Hour and Out_Hour
        lambda r: r[['Start', 'End']].clip(
            lower=r['In_Hour'], upper=r['Out_Hour']
        ),
        axis=1
    )
)

# Fix hour formatting
df3['Start'] = df3['Start'].dt.strftime('%H:%M')
df3['End'] = df3['End'].dt.strftime('%H:%M')
```
`df3`:

```
     Channel Program  Start    End
0  Channel_1       a  08:00  09:00
1  Channel_1       b  09:00  12:40
2  Channel_1       c  12:00  22:00
3  Channel_2       f  22:00  22:01
4  Channel_3       g  22:01  22:40
```
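For larger frames, the row-wise `apply` can be slow. Below is a minimal sketch of the same merge, filter and clip idea done column-wise with `Series.where`. It also adds an optional last step that is not part of the answer above: the objective dataframe in the question shows program `b` ending at 12:00 rather than 12:40, so the sketch assumes that each program should be cut off where the next program on the same channel starts. The names `merged`, `overlaps`, `clipped`, `next_start` and `trimmed` are only illustrative, and the times are written zero-padded so they can be parsed with an explicit `%H:%M` format.

```python
import pandas as pd

# Same sample data as in the setup, with zero-padded times
df1 = pd.DataFrame({
    'Channel': ['Channel_1', 'Channel_2', 'Channel_3'],
    'In_Hour': ['08:00', '22:00', '22:01'],
    'Out_Hour': ['22:00', '22:01', '22:40'],
})
df2 = pd.DataFrame({
    'Channel': ['Channel_1'] * 5 + ['Channel_2', 'Channel_3'],
    'Program': list('abcdefg'),
    'Start': ['07:00', '09:00', '12:00', '23:00', '23:30', '21:00', '21:40'],
    'End': ['09:00', '12:40', '23:00', '23:30', '23:45', '23:40', '23:00'],
})
for col in ('In_Hour', 'Out_Hour'):
    df1[col] = pd.to_datetime(df1[col], format='%H:%M')
for col in ('Start', 'End'):
    df2[col] = pd.to_datetime(df2[col], format='%H:%M')

# Merge on Channel and keep only programs that overlap the viewing window
merged = df2.merge(df1, on='Channel')
overlaps = merged['Start'].lt(merged['Out_Hour']) & merged['End'].gt(merged['In_Hour'])
merged = merged[overlaps].reset_index(drop=True)

# Column-wise clipping: Start = max(Start, In_Hour), End = min(End, Out_Hour)
clipped = merged[['Channel', 'Program']].copy()
clipped['Start'] = merged['Start'].where(merged['Start'] >= merged['In_Hour'],
                                         merged['In_Hour'])
clipped['End'] = merged['End'].where(merged['End'] <= merged['Out_Hour'],
                                     merged['Out_Hour'])

# Optional (assumption about the desired output): cut a program off where the
# next program on the same channel starts, so that b ends at 12:00
next_start = clipped.groupby('Channel')['Start'].shift(-1)
trimmed = clipped.copy()
trimmed['End'] = clipped['End'].where(
    next_start.isna() | (clipped['End'] <= next_start), next_start
)

# Format times as HH:MM for display
for frame in (clipped, trimmed):
    frame['Start'] = frame['Start'].dt.strftime('%H:%M')
    frame['End'] = frame['End'].dt.strftime('%H:%M')
```

With the sample data, `clipped` should match the `df3` output above, while `trimmed` differs only in program `b`, which ends at 12:00 as in the objective dataframe. If overlapping programs are meant to be kept as they are, skip the trimming step.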