简体   繁体   中英

Filling dataframe column with calculated value from a different dataframe

I have two dataframes. The first contains a series of ID numbers with two dates, and the second contains a series of ID numbers with a date for an event.

import pandas as pd
df1 = pd.DataFrame({'id': ['19403', '59208', '45290'], 'date_1': ['2021-01-04', '2021-01-05', '2021-05-08'], 'date_2': ['2021-03-07', '2021-01-12', '2021-06-15']})
df2 = pd.DataFrame({'id': ['19403', '19403', '19403', '59208', '59208', '45290', '45290', '45290', '45290', '45290', '45290', '45290'], 'date': ['2021-01-05', '2021-02-12', '2021-03-17', '2021-01-06', '2021-01-08', '2021-03-14', '2021-03-17', '2021-05-09', '2021-05-12', '2021-05-23', '2021-06-01', '2021-08-14' ]})

I'm trying to merge the two dataframes so that for every row in the first dataframe, it counts the number of events in the second where the date is between the two dates in the first and the ID number matches.

output = pd.DataFrame({'id': ['19403', '59208', '45290'], 'date_1': ['2021-01-04', '2021-01-05', '2021-05-08'], 'date_2': ['2021-03-07', '2021-01-12', '2021-06-15'], 'count': [2, 2, 4]})

I've been able to mock it up one row at a time (below), but I'm struggling to apply it to the whole dataframe.

count = df2[(df2['id'] == '19403') & (df2['date'].between('2021-01-04', '2021-03-07'))]['date'].count()

Any suggestions? I'm probably missing something obvious here.

(df1.merge(df2, on='id', how='outer')
    .query("(date_1 <= date <= date_2)")
    .groupby(['id', 'date_1', 'date_2'], as_index=False)
    .size())

Output:

    id      date_1      date_2    size
0   19403   2021-01-04  2021-03-07  2
1   45290   2021-05-08  2021-06-15  4
2   59208   2021-01-05  2021-01-12  2

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM