I have got two dataframes (ie df1 and df2).
df1 contains date and time columns. Time columns contains 30 minutes interval of time series:
df1:
date time
0 2015-04-01 00:00:00
1 2015-04-01 00:30:00
2 2015-04-01 01:00:00
3 2015-04-01 01:30:00
4 2015-04-01 02:00:00
df2 contains date, start-time, end-time, value:
df2
INCIDENT_DATE INTERRUPTION_TIME RESTORE_TIME WASTED_MINUTES
0 2015-04-01 00:32 01:15 1056.0
1 2015-04-01 01:20 02:30 3234.0
2 2015-04-01 01:22 03:30 3712.0
3 2015-04-01 01:30 03:15 3045.0
Now I want to copy the wasted_minutes column from df2 to df1 when date columns of both data frames are the same and Interruption_time of the column of df2 lies in the time column of df1. So the output should look like:
df1:
date time Wasted_columns
0 2015-04-01 00:00:00 NaN
1 2015-04-01 00:30:00 1056.0
2 2015-04-01 01:00:00 6946.0
3 2015-04-01 01:30:00 3045.0
4 2015-04-01 02:00:00 NaN
I tried merge command (on the basis of date column), but didn't produce the desired result, because I am not sure how to check whether time falls in 30 minutes intervals or not? Could anyone guide how to fix the issue?
You can do this
df1['time']=pd.to_datetime(df1['time'])
df1['Wasted_columns']=df1.apply(lambda x: df2.loc[(pd.to_datetime(df2['INTERRUPTION_TIME'])>= x['time']) & (pd.to_datetime(df2['INTERRUPTION_TIME'])< x['time']+pd.Timedelta(minutes=30)),'WASTED_MINUTES'].sum(), axis=1)
df1['time']=df1['time'].dt.time
If you convert the 'time' column in the lambda function itself, then it is just one line of code as below
df1['Wasted_columns']=df1.apply(lambda x: df2.loc[(pd.to_datetime(df2['INTERRUPTION_TIME'])>= pd.to_datetime(x['time'])) & (pd.to_datetime(df2['INTERRUPTION_TIME'])< pd.to_datetime(x['time'])+pd.Timedelta(minutes=30)),'WASTED_MINUTES'].sum(), axis=1)
Output
date time Wasted_columns
0 2015-04-01 00:00:00 0.0
1 2015-04-01 00:30:00 1056.0
2 2015-04-01 01:00:00 6946.0
3 2015-04-01 01:30:00 3045.0
4 2015-04-01 02:00:00 0.0
Convert time
to timedelta and assign back to df1
. Convert INTERRUPTION_TIME
to timedelta and floor
it to 30-minute interval and assign to s
. Groupby df2
by INCIDENT_DATE
, s
and call sum
of WASTED_MINUTES
. Finally, join
the result of groupby
back to df1
df1['time'] = pd.to_timedelta(df1['time'].astype(str)) #cast to str before calling `to_timedelta`
s = pd.to_timedelta(df2.INTERRUPTION_TIME+':00').dt.floor('30Min')
df_final = df1.join(df2.groupby(['INCIDENT_DATE', s]).WASTED_MINUTES.sum(),
on=['date', 'time'])
Out[631]:
date time WASTED_MINUTES
0 2015-04-01 00:00:00 NaN
1 2015-04-01 00:30:00 1056.0
2 2015-04-01 01:00:00 6946.0
3 2015-04-01 01:30:00 3045.0
4 2015-04-01 02:00:00 NaN
The idea: + Convert to datetime + Round to nearest 30 mins + Merge
from datetime import datetime, timedelta
def ceil_dt(dt, delta):
return dt + (datetime.min - dt) % delta
# Convert
df1['dt'] = (df1['date'] + ' ' + df1['time']).apply(datetime.strptime, args=['%Y-%m-%d %H:%M:%S'])
df2['dt'] = (df2['INCIDENT_DATE '] + ' ' + df2['INTERRUPTION_TIME']).apply(datetime.strptime, args=['%Y-%m-%d %H:%M'])
# Round
def ceil_dt(dt, delta):
return dt + (datetime.min - dt) % delta
df2['dt'] = df2['dt'].apply(ceil_dt, args=[timedelta(minutes=30)])
# Merge
final = df1.merge(df2.loc[:, ['dt', 'wasted_column'], on='dt', how='left'])
Also if multiple incidents happens in 30 mins timeframe, you would want to group by on df2 with rounded dt col first to sum up wasted then merge
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.