I have two dataframes that both refer to the same events (identified by id). One df is discrete and shows the course of each event at a certain resolution over a few months (df below shows only an excerpt); the other one, df_event, summarizes the parameters of each event.
Simplified data for df (the original df has many more rows!):
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 2],
                   'date': ['2020-01-01 12:00:00', '2020-01-01 12:00:00', '2020-01-01 12:00:00',
                            '2020-01-05 15:00:00', '2020-01-05 15:00:00',
                            '2020-01-05 15:00:00', '2020-01-05 15:00:00'],
                   'numb': [1, 5, 8, 0, 4, 11, 25]},
                  index=pd.date_range(start="2020-01-01 12:00", periods=7, freq='1h'))
df['date'] = pd.to_datetime(df['date'])
Output:
                     id                 date  numb
2020-01-01 12:00:00   1  2020-01-01 12:00:00     1
2020-01-01 13:00:00   1  2020-01-01 12:00:00     5
2020-01-01 14:00:00   1  2020-01-01 12:00:00     8
2020-01-01 15:00:00   2  2020-01-05 15:00:00     0
2020-01-01 16:00:00   2  2020-01-05 15:00:00     4
2020-01-01 17:00:00   2  2020-01-05 15:00:00    11
2020-01-01 18:00:00   2  2020-01-05 15:00:00    25
df_event:
df_event = pd.DataFrame({'id':[1,2,3,4,5],
'date':['2020-01-01 12:00:00','2020-01-01 15:00:00','2020-01-08 07:00:00','2020-01-15 13:00:00','2020-01-22 12:00:00'],
'numb_total':[8,25,11,14,8],
'timedelta': [55,60,45,15,30]})
df_event = df_event.set_index('id')
df_event['date'] = pd.to_datetime(df_event['date'])
df_event['timedelta'] = pd.to_timedelta(df_event['timedelta'], unit='min')
Output:
date numb_total timedelta
id
1 2020-01-01 12:00:00 8 00:55:00
2 2020-01-01 15:00:00 25 01:00:00
3 2020-01-08 07:00:00 11 00:45:00
4 2020-01-15 13:00:00 14 00:15:00
5 2020-01-22 12:00:00 8 00:30:00
Now I want to link the two dfs together so that I get a day/week profile, sorted by day and hour. The week profile should show the average numb and timedelta (from df_event) for each moment, where moment = day + time. The minimum and maximum values at each moment would also be interesting.
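A minimal sketch of the per-moment statistics, assuming the moment (weekday name plus clock time) is taken from the date column of df_event and using pandas named aggregation (pandas 0.25 or later). The output column names such as numb_mean and td_max are made up for illustration.

```python
import pandas as pd

# Event summary as in the question (timedelta converted from minutes)
df_event = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-01 12:00:00', '2020-01-01 15:00:00',
                            '2020-01-08 07:00:00', '2020-01-15 13:00:00',
                            '2020-01-22 12:00:00']),
    'numb_total': [8, 25, 11, 14, 8],
    'timedelta': pd.to_timedelta([55, 60, 45, 15, 30], unit='min'),
}, index=pd.Index([1, 2, 3, 4, 5], name='id'))

# The "moment" of an event: weekday name plus clock time
df_event['day'] = df_event['date'].dt.day_name()
df_event['time'] = df_event['date'].dt.time

# Mean, minimum and maximum per (day, time) slot via named aggregation
profile = df_event.groupby(['day', 'time']).agg(
    numb_mean=('numb_total', 'mean'),
    numb_min=('numb_total', 'min'),
    numb_max=('numb_total', 'max'),
    td_mean=('timedelta', 'mean'),
    td_min=('timedelta', 'min'),
    td_max=('timedelta', 'max'),
)
print(profile)
```

Slots with a single event get identical mean, min and max; the Wednesday 12:00 slot combines ids 1 and 5.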
For example, create a new df_week like this:
df['day'] = df['date'].dt.day_name()
df['time'] = df['date'].dt.time
df_week = df.groupby(['day', 'time'])...
and then add the data of df_event to get something like this:
timedelta numb_total
day time
Monday 00:00:00 00:00:00 0
Monday 01:00:00 00:00:00 0
...
Wednesday 11:00:00 00:00:00 0
Wednesday 12:00:00 00:55:00 8
...
Sunday 14:00:00 00:00:00 0
Sunday 15:00:00 01:00:00 25
Sunday 16:00:00 00:00:00 0
...
Sunday 23:00:00 00:00:00 0
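The expected table lists every hour of every weekday, Monday first, with zeros where no event occurs. A minimal sketch of that layout, assuming hourly slots keyed on the date column of df_event (so, with the sample data, every event lands on a Wednesday, unlike the Sunday row above that comes from df's date column). The names means, week_days, hours and full_index are illustrative.

```python
import datetime

import pandas as pd

df_event = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-01 12:00:00', '2020-01-01 15:00:00',
                            '2020-01-08 07:00:00', '2020-01-15 13:00:00',
                            '2020-01-22 12:00:00']),
    'numb_total': [8, 25, 11, 14, 8],
    'timedelta': pd.to_timedelta([55, 60, 45, 15, 30], unit='min'),
}, index=pd.Index([1, 2, 3, 4, 5], name='id'))
df_event['day'] = df_event['date'].dt.day_name()
df_event['time'] = df_event['date'].dt.time

# Average per (day, time) slot that actually occurs in the data
means = df_event.groupby(['day', 'time'])[['timedelta', 'numb_total']].mean()

# Every hour of every weekday, Monday first: 7 * 24 = 168 slots
week_days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']
hours = [datetime.time(h) for h in range(24)]
full_index = pd.MultiIndex.from_product([week_days, hours], names=['day', 'time'])

# Reindex to the full grid, then fill empty slots with a zero of each column's type
df_week = means.reindex(full_index)
df_week['timedelta'] = df_week['timedelta'].fillna(pd.Timedelta(0))
df_week['numb_total'] = df_week['numb_total'].fillna(0)
print(df_week.loc[('Wednesday', datetime.time(12))])
```

Filling per column avoids mixing a numeric 0 into the timedelta64 column.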
#What is the relationship between the index and the date column in df? Both contain dates. Which of them is related to the date in df_event?
Happy to review after you clarify.
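The question in the comment can be checked on the sample data by lining up, per id, the earliest index timestamp of df, the date column of df and the date in df_event. A minimal sketch; the frame check and its column names are made up for illustration, and df_event is trimmed to the first two events.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 2],
                   'date': pd.to_datetime(['2020-01-01 12:00:00'] * 3
                                          + ['2020-01-05 15:00:00'] * 4),
                   'numb': [1, 5, 8, 0, 4, 11, 25]},
                  index=pd.date_range(start='2020-01-01 12:00', periods=7, freq='h'))
df_event = pd.DataFrame({'date': pd.to_datetime(['2020-01-01 12:00:00',
                                                 '2020-01-01 15:00:00'])},
                        index=pd.Index([1, 2], name='id'))

# Per id: earliest timestamp in df's index and the date stored in df's column
check = pd.DataFrame({
    'first_index': df.index.to_series().groupby(df['id']).min(),
    'date_in_df': df.groupby('id')['date'].first(),
})
# Event date from the summary table, aligned on id
check['date_in_event'] = df_event['date']
check['df_matches_event'] = check['date_in_df'] == check['date_in_event']
print(check)
```

With the sample data, id 1 agrees everywhere, while for id 2 the index of df matches df_event but the date column (2020-01-05) does not.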
#Generate a key column in each dataframe by extracting the hour. Merge the two dataframes on key and date, then keep only the required columns
df2 = (pd.merge(df.assign(key=df.index.hour),
                df_event.assign(key=df_event['date'].dt.hour),
                on=['key', 'date'], how='right')
       .dropna()
       .drop_duplicates(keep='last')[['date', 'numb_total', 'timedelta']])
#Extract time and day_name
df2['time']=df2.date.dt.strftime('%H:%M:%S')
df2['day']=df2.date.dt.day_name()
                  date  numb_total  timedelta      time        day
0  2020-01-01 12:00:00           8   00:55:00  12:00:00  Wednesday
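The key-based merge above silently discards events without a matching row in df via dropna(). A minimal sketch of the same merge with pandas' indicator=True, which labels each row 'both' or 'right_only' so the unmatched events stay visible; the sample frames are trimmed to the columns the merge needs.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 2],
                   'date': pd.to_datetime(['2020-01-01 12:00:00'] * 3
                                          + ['2020-01-05 15:00:00'] * 4),
                   'numb': [1, 5, 8, 0, 4, 11, 25]},
                  index=pd.date_range(start='2020-01-01 12:00', periods=7, freq='h'))
df_event = pd.DataFrame({'date': pd.to_datetime(['2020-01-01 12:00:00', '2020-01-01 15:00:00',
                                                 '2020-01-08 07:00:00', '2020-01-15 13:00:00',
                                                 '2020-01-22 12:00:00']),
                         'numb_total': [8, 25, 11, 14, 8]},
                        index=pd.Index([1, 2, 3, 4, 5], name='id'))

# Same keys as the answer: hour of df's index / df_event's date, plus the date itself
merged = pd.merge(df.assign(key=df.index.hour),
                  df_event.assign(key=df_event['date'].dt.hour),
                  on=['key', 'date'], how='right', indicator=True)

# '_merge' is 'both' for matched rows and 'right_only' for events without a match
print(merged[['date', 'numb', 'numb_total', '_merge']])
```

Only id 1 finds a partner; the other four events come back as 'right_only' with NaN in numb.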
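IIUC, first aggregate both DataFrames and then merge them. This starts from df_event as built from the dictionary, i.e. before set_index('id') and the pd.to_timedelta conversion, so timedelta stays in plain minutes until the end: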
df_event = df_event.set_index('id')
df_event['date'] = pd.to_datetime(df_event['date'])
df_event['day'] = df_event['date'].dt.day_name()
df_event['time'] = df_event['date'].dt.time
df_event1 = df_event.groupby(['day', 'time'])[['timedelta', 'numb_total']].mean()
print (df_event1)
timedelta numb_total
day time
Wednesday 07:00:00 45.0 11.0
12:00:00 42.5 8.0
13:00:00 15.0 14.0
15:00:00 60.0 25.0
df['day'] = df['date'].dt.day_name()
df['time'] = df['date'].dt.time
df_event2 = df.groupby(['day', 'time'])['numb'].mean()
print (df_event2)
day time
Sunday 15:00:00 10.000000
Wednesday 12:00:00 4.666667
Name: numb, dtype: float64
df = df_event1.join(df_event2, how='inner')
df['timedelta'] = pd.to_timedelta(df['timedelta'], unit='min')
print (df)
timedelta numb_total numb
day time
Wednesday 12:00:00 0 days 00:42:30 8.0 4.666667
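With how='inner', only the (day, time) slots present in both aggregates survive, which is why Sunday 15:00 from df drops out of the result. A minimal sketch of the same join with how='outer' instead, rebuilding df_event1 and df_event2 by hand from the values printed above; it keeps every slot and leaves NaN (or NaT) where one side has no data.

```python
import datetime

import pandas as pd

# df_event1 as printed above (timedelta still in minutes)
idx1 = pd.MultiIndex.from_tuples(
    [('Wednesday', datetime.time(7)), ('Wednesday', datetime.time(12)),
     ('Wednesday', datetime.time(13)), ('Wednesday', datetime.time(15))],
    names=['day', 'time'])
df_event1 = pd.DataFrame({'timedelta': [45.0, 42.5, 15.0, 60.0],
                          'numb_total': [11.0, 8.0, 14.0, 25.0]}, index=idx1)

# df_event2 as printed above
idx2 = pd.MultiIndex.from_tuples(
    [('Sunday', datetime.time(15)), ('Wednesday', datetime.time(12))],
    names=['day', 'time'])
df_event2 = pd.Series([10.0, 14 / 3], index=idx2, name='numb')

# Outer join keeps every (day, time) slot from either side
out = df_event1.join(df_event2, how='outer')
out['timedelta'] = pd.to_timedelta(out['timedelta'], unit='min')
print(out)
```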