Update column if date between 2 dates in another Python pandas dataframe
Here are my 2 dataframes:
df1
eid start_dt end_dt flag
1 2020-12-01 2020-12-07 0
1 2020-12-08 2020-12-15 0
1 2020-12-16 2020-12-23 1
2 2020-12-01 2020-12-07 0
df2
eid event_dt col1 col2
1 2020-12-01 . .
1 2020-12-09 . .
1 2020-12-17 . .
2 2020-12-02 . .
Output df:
- If the eids in df1 and df2 match AND event_dt is between start_dt and end_dt (inclusive)
-- add a new column, flag
-- set it from the flag of the matching df1 row
The output dataframe df looks like this:
eid event_dt col1 col2 flag
1 2020-12-01 . . 0
1 2020-12-09 . . 0
1 2020-12-17 . . 1
2 2020-12-02 . . 0
How would I go about doing this?
Try merge and query:
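df2['flag'] = (df2.assign(idx=df2.index)   # carry df2's row labels so the result aligns with df2
    .merge(df1, on='eid', how='left')      # pair each event with every interval of the same eid
    .query('start_dt <= event_dt <= end_dt')
    .set_index('idx')
    ['flag']
)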
Output:
eid event_dt col1 col2 flag
0 1 2020-12-01 . . 0
1 1 2020-12-09 . . 0
2 1 2020-12-17 . . 1
3 2 2020-12-02 . . 0
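For reference, here is a minimal self-contained sketch of the merge-then-filter idea that can be run as is. The sample frames are rebuilt by hand, with df2's rows shuffled and one extra event (2021-01-05) added as an assumption to show what happens when no interval matches. Series.between stands in for query here, and the names matched and result are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({
    "eid": [1, 1, 1, 2],
    "start_dt": pd.to_datetime(["2020-12-01", "2020-12-08", "2020-12-16", "2020-12-01"]),
    "end_dt": pd.to_datetime(["2020-12-07", "2020-12-15", "2020-12-23", "2020-12-07"]),
    "flag": [0, 0, 1, 0],
})

# Assumption for illustration: events are listed in a different order than
# df1's intervals, and the last event falls inside no interval at all.
df2 = pd.DataFrame({
    "eid": [2, 1, 1, 1, 1],
    "event_dt": pd.to_datetime(
        ["2020-12-02", "2020-12-17", "2020-12-09", "2020-12-01", "2021-01-05"]
    ),
})

# Pair every event with every interval of the same eid, remembering which
# df2 row each pair came from.
pairs = df2.assign(idx=df2.index).merge(df1, on="eid", how="left")

# Keep only the pairs where the event date lies inside the interval (inclusive).
matched = pairs.loc[
    pairs["event_dt"].between(pairs["start_dt"], pairs["end_dt"])
].set_index("idx")["flag"]

result = df2.copy()
result["flag"] = matched  # aligns on df2's index; unmatched events get NaN
print(result)
```

Events without a matching interval end up with NaN, which also turns the flag column into floats. If a default flag makes sense for those rows, fillna followed by astype(int) would restore integers.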
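Update: for larger datasets, the approach above may raise a MemoryError, because merging on eid alone pairs every event with every interval of the same eid. Use pd.merge_asof instead (it needs real datetime columns and assumes the intervals of one eid do not overlap):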
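df2['flag'] = (pd.merge_asof(df2.assign(idx=df2.index).sort_values('event_dt'),
                             df1.sort_values('start_dt'),   # right frame must be sorted by its right_on key
                             by='eid', left_on='event_dt',
                             right_on='start_dt')
    .query('event_dt<=end_dt')   # the as-of match only guarantees start_dt <= event_dt
    .set_index('idx')            # restore df2's row labels before assigning
    ['flag']
)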
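The merge_asof variant can be sketched end to end as well. This assumes the date columns are real datetimes (merge_asof rejects plain strings), that both frames are sorted by their join key, and that the intervals of one eid do not overlap. The sample data, the extra unmatched event, and the names asof, inside, and result are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "eid": [1, 1, 1, 2],
    "start_dt": pd.to_datetime(["2020-12-01", "2020-12-08", "2020-12-16", "2020-12-01"]),
    "end_dt": pd.to_datetime(["2020-12-07", "2020-12-15", "2020-12-23", "2020-12-07"]),
    "flag": [0, 0, 1, 0],
})

# Assumption for illustration: shuffled events plus one that matches nothing.
df2 = pd.DataFrame({
    "eid": [2, 1, 1, 1, 1],
    "event_dt": pd.to_datetime(
        ["2020-12-02", "2020-12-17", "2020-12-09", "2020-12-01", "2021-01-05"]
    ),
})

# For each event, find the interval of the same eid with the latest
# start_dt that is <= event_dt (the default direction is "backward").
asof = pd.merge_asof(
    df2.assign(idx=df2.index).sort_values("event_dt"),
    df1.sort_values("start_dt"),
    by="eid",
    left_on="event_dt",
    right_on="start_dt",
)

# The as-of match says nothing about end_dt, so check the upper bound here.
inside = asof["event_dt"] <= asof["end_dt"]

result = df2.copy()
result["flag"] = asof.loc[inside].set_index("idx")["flag"]  # NaN where no match
print(result)
```

Unlike the plain merge, this produces at most one candidate interval per event, so the intermediate frame has the same number of rows as df2.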