Pandas vectorization to assign column value based on date, given another dataframe with value and start date
In Pandas, I have a dataframe df1 with stock investments and a start date:
Stock,StartDate,Investment
A,2022-01-01,100
A,2022-02-01,150
B,2022-01-01,90
B,2022-01-15,100
...
Then I have a df2:
Stock,Date
A,2022-01-01
A,2022-01-02
A,2022-01-05
...
B,2022-01-01
...
I want to add a column Investment to df2, filled with the investment taken from df1: given a date d and a stock S in df2, I want to assign the investment from df1 for stock S such that d >= StartDate and d < next StartDate.
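One way to express this rule directly is to give each start date an explicit (exclusive) end: the next StartDate of the same stock. The sketch below is a merge-and-filter approach, not the merge_asof technique; it assumes Date and StartDate are already parsed as datetimes, and EndDate is a hypothetical helper column. Because it pairs every df2 row with every start date of its stock, memory grows with that product, so it suits a modest number of start dates per stock.

```python
import pandas as pd

# Sample data from the question (a few rows only).
df1 = pd.DataFrame({
    "Stock": ["A", "A", "B", "B"],
    "StartDate": pd.to_datetime(["2022-01-01", "2022-02-01", "2022-01-01", "2022-01-15"]),
    "Investment": [100, 150, 90, 100],
})
df2 = pd.DataFrame({
    "Stock": ["A", "A", "A", "B", "B"],
    "Date": pd.to_datetime(["2022-01-01", "2022-01-31", "2022-02-01", "2022-01-14", "2022-01-15"]),
})

# Hypothetical helper column: the next StartDate of the same stock is the
# exclusive end of each interval; the last interval per stock stays open (NaT).
intervals = df1.sort_values(["Stock", "StartDate"]).copy()
intervals["EndDate"] = intervals.groupby("Stock")["StartDate"].shift(-1)

# Pair every df2 row with every interval of its stock, keeping df2's row labels.
pairs = df2.reset_index().merge(intervals, on="Stock")

# Keep the pair satisfying StartDate <= Date < EndDate (open-ended if EndDate is NaT).
mask = (pairs["Date"] >= pairs["StartDate"]) & (
    pairs["EndDate"].isna() | (pairs["Date"] < pairs["EndDate"])
)
matched = pairs.loc[mask].set_index("index")["Investment"]

# Assignment aligns on df2's index; dates before the first StartDate would stay NaN.
df2["Investment"] = matched
print(df2)
```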
Expected output (df2) in this case is:
Stock,Date,Investment
A,2022-01-01,100
A,2022-01-02,100
A,2022-01-05,100
...
A,2022-01-31,100
A,2022-02-01,150
A,2022-02-02,150
...
B,2022-01-01,90
B,2022-01-02,90
...
B,2022-01-14,90
B,2022-01-15,100
B,2022-01-16,100
...
This can clearly be done with a loop, but I am looking for a more efficient approach, possibly using vectorization.
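For reference, a row-by-row baseline is handy for checking a vectorized result on small data. This is a minimal sketch; investment_for is a hypothetical helper, and it assumes each stock's start dates are distinct, in which case "the latest StartDate <= d" is the same as "d >= StartDate and d < next StartDate".

```python
import pandas as pd

# Sample data from the question (a few rows only).
df1 = pd.DataFrame({
    "Stock": ["A", "A", "B", "B"],
    "StartDate": pd.to_datetime(["2022-01-01", "2022-02-01", "2022-01-01", "2022-01-15"]),
    "Investment": [100, 150, 90, 100],
})
df2 = pd.DataFrame({
    "Stock": ["A", "A", "A", "B", "B"],
    "Date": pd.to_datetime(["2022-01-01", "2022-01-31", "2022-02-01", "2022-01-14", "2022-01-15"]),
})

def investment_for(stock, date, starts):
    """Hypothetical helper: Investment of the latest StartDate <= date for this stock, or None."""
    rows = starts[(starts["Stock"] == stock) & (starts["StartDate"] <= date)]
    if rows.empty:
        return None
    return rows.loc[rows["StartDate"].idxmax(), "Investment"]

# One Python-level lookup per df2 row: simple, but O(len(df2) * len(df1)).
df2["Investment"] = [investment_for(s, d, df1) for s, d in zip(df2["Stock"], df2["Date"])]
print(df2)
```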
What is the most efficient way to do this?
If I understand correctly (IIUC), use merge_asof:
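import pandas as pd

df1 = pd.DataFrame({'a': ['A', 'A', 'B', 'B'],
                    'b': pd.to_datetime(['2022-01-01', '2022-02-01', '2022-01-01', '2022-01-15']),
                    'c': [100, 150, 90, 100]})
df2 = pd.DataFrame({'a': ['A', 'A', 'A', 'B'],
                    'b': pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-05', '2022-01-01'])})

print(df1)
   a          b    c
0  A 2022-01-01  100
1  A 2022-02-01  150
2  B 2022-01-01   90
3  B 2022-01-15  100
print(df2)
   a          b
0  A 2022-01-01
1  A 2022-01-02
2  A 2022-01-05
3  B 2022-01-01
# 'b' must be a datetime (or numeric) column, sorted in both frames;
# by='a' restricts each match to rows of the same stock
df = pd.merge_asof(df2.sort_values('b'), df1.sort_values('b'), on='b', by='a')
print(df)
   a          b    c
0  A 2022-01-01  100
1  B 2022-01-01   90
2  A 2022-01-02  100
3  A 2022-01-05  100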
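The same merge_asof call can be adapted to the question's own column names. This sketch assumes the dates are (or are converted to) datetimes; it uses left_on/right_on because the key is named Date in df2 and StartDate in df1. merge_asof returns rows in the sorted key order with a fresh index, so df2's original row labels are kept in a temporary column and restored afterwards. With the default direction="backward" (and exact matches allowed), each row gets the last StartDate <= Date for its stock, which is the "d >= StartDate and d < next StartDate" rule; dates before a stock's first StartDate get NaN.

```python
import pandas as pd

# Sample data from the question (a few rows only).
df1 = pd.DataFrame({
    "Stock": ["A", "A", "B", "B"],
    "StartDate": pd.to_datetime(["2022-01-01", "2022-02-01", "2022-01-01", "2022-01-15"]),
    "Investment": [100, 150, 90, 100],
})
df2 = pd.DataFrame({
    "Stock": ["A", "A", "A", "B", "B", "B"],
    "Date": pd.to_datetime(["2022-01-01", "2022-01-31", "2022-02-01",
                            "2022-01-01", "2022-01-14", "2022-01-15"]),
})

# merge_asof needs both join keys sorted; keep df2's row labels in an "index" column.
left = df2.reset_index().sort_values("Date")
right = df1.sort_values("StartDate")

# Backward as-of join per Stock: last StartDate <= Date within the same stock.
merged = pd.merge_asof(left, right, left_on="Date", right_on="StartDate", by="Stock")

# Restore df2's original order and index, and drop the helper key column.
result = merged.set_index("index").sort_index().drop(columns="StartDate")
result.index.name = None
print(result)
```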