简体   繁体   中英

Pandas: Connecting information between two dataframes

I have two dataframes. The dataframe A contains the information about a trip:

Id  Name        StartTime           EndTime
0   201 Car1    2016-01-01 00:00:00 2016-01-01 00:43:05
1   205 Car2    2016-01-01 00:10:00 2016-01-01 00:45:05
2   345 Car3    2016-01-01 00:01:00 2016-01-01 00:47:05
3   456 Car2    2016-01-02 00:00:00 2016-01-02 02:45:05
4   432 Car1    2016-01-02 00:00:00 2016-01-02 02:47:05

The dataframe B contains timestamps during the trip (like a gps).

    Name    Timestamp
0   Car1    2016-01-01 00:05:00
1   Car1    2016-01-01 00:05:24
2   Car2    2016-01-01 00:10:04
3   Car3    2016-01-01 00:01:04
4   Car2    2016-01-01 00:10:34
5   Car1    2016-01-01 00:05:54

I need to add a column to Dataframe B called Id which picks up the Id from dataframe A based on the name and the start and end-times in frame A . Both these dataframes are really large, so I need an efficient way of doing this.

I think you need merge with outer join on column Name , then filter by boolean indexing and last remove columns by drop :

df = pd.merge(df1, df2, on='Name', how='outer')
df = df[(df.StartTime <= df.Timestamp) & (df.EndTime >= df.Timestamp)]
df = df.drop(['StartTime','EndTime'], axis=1)
print (df)
     Id  Name           Timestamp
0   201  Car1 2016-01-01 00:05:00
1   201  Car1 2016-01-01 00:05:24
2   201  Car1 2016-01-01 00:05:54
6   205  Car2 2016-01-01 00:10:04
7   205  Car2 2016-01-01 00:10:34
10  345  Car3 2016-01-01 00:01:04

This appears to be a situation for the recently added merge_asof . That takes the left dataframe (dataframe B here) and does the following:

For each row in the left DataFrame, we select the last row in the right DataFrame whose 'on' key is less than or equal to the left's key. Both DataFrames must be sorted by the key.

make sure we have timestamp data

dfa['StartTime'] = pd.to_datetime(dfa.StartTime)
dfa['EndTime'] = pd.to_datetime(dfa.EndTime)
dfb['Timestamp'] = pd.to_datetime(dfb.Timestamp)

Then sort joining columns

dfb = dfb.sort_values('Timestamp')
dfa = dfa.sort_values('StartTime')

Perfom asof merge by 'Name'

pd.merge_asof(dfb, dfa, left_on='Timestamp', right_on='StartTime', by='Name')

Output

   Name           Timestamp   Id           StartTime             EndTime
0  Car3 2016-01-01 00:01:04  345 2016-01-01 00:01:00 2016-01-01 00:47:05
1  Car1 2016-01-01 00:05:00  201 2016-01-01 00:00:00 2016-01-01 00:43:05
2  Car1 2016-01-01 00:05:24  201 2016-01-01 00:00:00 2016-01-01 00:43:05
3  Car1 2016-01-01 00:05:54  201 2016-01-01 00:00:00 2016-01-01 00:43:05
4  Car2 2016-01-01 00:10:04  205 2016-01-01 00:10:00 2016-01-01 00:45:05
5  Car2 2016-01-01 00:10:34  205 2016-01-01 00:10:00 2016-01-01 00:45:05

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM