
Copying column from one data frame to another based on matching of combination of two columns

I have two dataframes, df1 and df2.

df1 contains date and time columns. The time column holds a time series at 30-minute intervals:

df1:
         date      time
0       2015-04-01  00:00:00
1       2015-04-01  00:30:00
2       2015-04-01  01:00:00
3       2015-04-01  01:30:00
4       2015-04-01  02:00:00

df2 contains date, start time, end time, and value columns:

df2:
       INCIDENT_DATE INTERRUPTION_TIME RESTORE_TIME  WASTED_MINUTES
0        2015-04-01             00:32        01:15          1056.0
1        2015-04-01             01:20        02:30          3234.0
2        2015-04-01             01:22        03:30          3712.0
3        2015-04-01             01:30        03:15          3045.0

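Now I want to copy the WASTED_MINUTES column from df2 to df1 where the date columns of both data frames match and INTERRUPTION_TIME in df2 falls within the 30-minute interval that starts at the time in df1. When several incidents fall in the same interval, their values are summed (3234.0 + 3712.0 = 6946.0 for the 01:00 interval). So the output should look like: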

df1:
                date      time      Wasted_columns
    0       2015-04-01  00:00:00       NaN
    1       2015-04-01  00:30:00       1056.0
    2       2015-04-01  01:00:00       6946.0
    3       2015-04-01  01:30:00       3045.0
    4       2015-04-01  02:00:00       NaN

I tried merge (on the date column), but it didn't produce the desired result, because I am not sure how to check whether a time falls within a 30-minute interval. Could anyone suggest how to fix this?

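You can do this with apply, summing the df2 rows whose INTERRUPTION_TIME falls in each 30-minute interval:

import pandas as pd

# parse the time strings so they can be compared
df1['time']=pd.to_datetime(df1['time'])
df1['Wasted_columns']=df1.apply(lambda x: df2.loc[(pd.to_datetime(df2['INTERRUPTION_TIME'])>= x['time']) & (pd.to_datetime(df2['INTERRUPTION_TIME'])< x['time']+pd.Timedelta(minutes=30)),'WASTED_MINUTES'].sum(), axis=1)
# restore plain time-of-day values
df1['time']=df1['time'].dt.time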

If you convert the 'time' column inside the lambda function itself, it takes just one line of code:

df1['Wasted_columns']=df1.apply(lambda x: df2.loc[(pd.to_datetime(df2['INTERRUPTION_TIME'])>= pd.to_datetime(x['time'])) & (pd.to_datetime(df2['INTERRUPTION_TIME'])< pd.to_datetime(x['time'])+pd.Timedelta(minutes=30)),'WASTED_MINUTES'].sum(), axis=1)

Output

          date     time     Wasted_columns
0   2015-04-01  00:00:00    0.0
1   2015-04-01  00:30:00    1056.0
2   2015-04-01  01:00:00    6946.0
3   2015-04-01  01:30:00    3045.0
4   2015-04-01  02:00:00    0.0
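
The apply-based approach can be tried end to end with the sample data from the question. The following is only a sketch under a few assumptions: all date and time columns are strings, the helper names slot_start, incident, half_hour, and wasted_in_slot are made up for illustration, and full timestamps are compared so that the date is part of the match (the time-only comparison above does not check it). Passing min_count=1 to sum() gives NaN for empty intervals, as in the question's expected output, instead of 0.0.

```python
import pandas as pd

# Sample data from the question (assumed to be stored as strings)
df1 = pd.DataFrame({
    "date": ["2015-04-01"] * 5,
    "time": ["00:00:00", "00:30:00", "01:00:00", "01:30:00", "02:00:00"],
})
df2 = pd.DataFrame({
    "INCIDENT_DATE": ["2015-04-01"] * 4,
    "INTERRUPTION_TIME": ["00:32", "01:20", "01:22", "01:30"],
    "RESTORE_TIME": ["01:15", "02:30", "03:30", "03:15"],
    "WASTED_MINUTES": [1056.0, 3234.0, 3712.0, 3045.0],
})

# Build full timestamps once, so both the date and the time are compared
slot_start = pd.to_datetime(df1["date"] + " " + df1["time"])
incident = pd.to_datetime(df2["INCIDENT_DATE"] + " " + df2["INTERRUPTION_TIME"])
half_hour = pd.Timedelta(minutes=30)

def wasted_in_slot(start):
    """Sum WASTED_MINUTES for incidents in [start, start + 30 minutes)."""
    in_slot = (incident >= start) & (incident < start + half_hour)
    # min_count=1 makes an empty selection sum to NaN rather than 0.0
    return df2.loc[in_slot, "WASTED_MINUTES"].sum(min_count=1)

df1["Wasted_columns"] = slot_start.apply(wasted_in_slot)
print(df1)
```

This still scans df2 once per row of df1, so it is best suited to small frames.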

Convert time to timedelta and assign it back to df1. Convert INTERRUPTION_TIME to timedelta, floor it to the 30-minute interval, and assign the result to s. Group df2 by INCIDENT_DATE and s, and sum WASTED_MINUTES. Finally, join the grouped result back to df1 on date and time.

df1['time'] = pd.to_timedelta(df1['time'].astype(str)) #cast to str before calling `to_timedelta`
s = pd.to_timedelta(df2.INTERRUPTION_TIME+':00').dt.floor('30min')
df_final = df1.join(df2.groupby(['INCIDENT_DATE', s]).WASTED_MINUTES.sum(), 
                    on=['date', 'time'])

Output
         date     time  WASTED_MINUTES
0  2015-04-01 00:00:00             NaN
1  2015-04-01 00:30:00          1056.0
2  2015-04-01 01:00:00          6946.0
3  2015-04-01 01:30:00          3045.0
4  2015-04-01 02:00:00             NaN
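
The floor-and-group steps can also be written with full timestamps instead of timedeltas. This is a variant sketch rather than the code above: it assumes the date and time columns are strings, and the column name slot and the variable names totals and result are hypothetical helpers that do not appear in the original frames.

```python
import pandas as pd

# Sample data from the question (assumed to be stored as strings)
df1 = pd.DataFrame({
    "date": ["2015-04-01"] * 5,
    "time": ["00:00:00", "00:30:00", "01:00:00", "01:30:00", "02:00:00"],
})
df2 = pd.DataFrame({
    "INCIDENT_DATE": ["2015-04-01"] * 4,
    "INTERRUPTION_TIME": ["00:32", "01:20", "01:22", "01:30"],
    "RESTORE_TIME": ["01:15", "02:30", "03:30", "03:15"],
    "WASTED_MINUTES": [1056.0, 3234.0, 3712.0, 3045.0],
})

# Start of each 30-minute interval in df1, as a full timestamp
df1["slot"] = pd.to_datetime(df1["date"] + " " + df1["time"])

# Floor each incident to the start of the interval it falls in
slot = pd.to_datetime(
    df2["INCIDENT_DATE"] + " " + df2["INTERRUPTION_TIME"]
).dt.floor("30min")

# One total per interval, then a left merge keeps every row of df1
totals = df2.groupby(slot.rename("slot"))["WASTED_MINUTES"].sum().reset_index()
result = df1.merge(totals, on="slot", how="left").drop(columns="slot")
print(result)
```

Because the date is folded into the timestamp, a single merge key covers both the date match and the interval match.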

The idea:
+ Convert to datetime
+ Round down to the start of the 30-minute interval
+ Merge

import pandas as pd

# Convert (assumes the date and time columns hold strings)
df1['dt'] = pd.to_datetime(df1['date'] + ' ' + df1['time'], format='%Y-%m-%d %H:%M:%S')
df2['dt'] = pd.to_datetime(df2['INCIDENT_DATE'] + ' ' + df2['INTERRUPTION_TIME'], format='%Y-%m-%d %H:%M')

# Round down, so that 00:32 falls in the interval starting at 00:30
df2['dt'] = df2['dt'].dt.floor('30min')

# Merge
final = df1.merge(df2.loc[:, ['dt', 'WASTED_MINUTES']], on='dt', how='left')

Also, if multiple incidents happen within the same 30-minute interval, group df2 by the rounded dt column and sum WASTED_MINUTES first, then merge. Otherwise the merge returns one row per incident rather than one row per interval.
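
To see why grouping before the merge matters, here is a small sketch with made-up frames named slots and incidents, whose dt values are assumed to be already rounded down to the interval start. The values are the 01:00 and 01:30 rows from the question.

```python
import pandas as pd

# Two intervals; the 01:00 interval contains two incidents
slots = pd.DataFrame({
    "dt": pd.to_datetime(["2015-04-01 01:00", "2015-04-01 01:30"]),
})
incidents = pd.DataFrame({
    "dt": pd.to_datetime(
        ["2015-04-01 01:00", "2015-04-01 01:00", "2015-04-01 01:30"]
    ),
    "WASTED_MINUTES": [3234.0, 3712.0, 3045.0],
})

# Merging the raw incidents repeats the 01:00 interval once per incident
raw = slots.merge(incidents, on="dt", how="left")

# Summing per interval first keeps exactly one row per interval
per_slot = incidents.groupby("dt", as_index=False)["WASTED_MINUTES"].sum()
grouped = slots.merge(per_slot, on="dt", how="left")

print(len(raw), len(grouped))
print(grouped)
```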


 