Pandas vectorization: assign column values by date, given another dataframe with values and start dates

Question

In Pandas, I have a dataframe df1 with stock investments and a start date for each:

Stock,StartDate,Investment
A,2022-01-01,100
A,2022-02-01,150
B,2022-01-01,90
B,2022-01-15,100
...

Then I have a df2:

Stock,Date
A,2022-01-01
A,2022-01-02
A,2022-01-05
...
B,2022-01-01
...

I want to add a column Investment to df2, filled with the investment taken from df1: given a date d and a stock S in df2, I want to assign the investment from the df1 row for S such that d >= StartDate and d < the next start date.

The expected output (df2) in this case is:

Stock,Date,Investment
A,2022-01-01,100
A,2022-01-02,100
A,2022-01-05,100
...
A,2022-01-31,100
A,2022-02-01,150
A,2022-02-02,150
...
B,2022-01-01,90
B,2022-01-02,90
...
B,2022-01-14,90
B,2022-01-15,100
B,2022-01-16,100
...

This can clearly be done with a loop, but I am looking for a more efficient approach, possibly using vectorization.

What is the most efficient way to do this?

Answer 1

IIUC, use merge_asof (in the sample below the columns are shortened: a is the stock, b the date, c the investment):

print (df1)
    a          b    c
0  A 2022-01-01  100
1  A 2022-02-01  150
2  B 2022-01-01   90
3  B 2022-01-15  100

    
print (df2)
    a          b
0  A 2022-01-01
1  A 2022-01-02
2  A 2022-01-05
3  B 2022-01-01

df = pd.merge_asof(df2.sort_values('b'), df1.sort_values('b'), on='b', by='a')
print (df)
   a          b    c
0  A 2022-01-01  100
1  B 2022-01-01   90
2  A 2022-01-02  100
3  A 2022-01-05  100
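
For reference, here is a minimal sketch of the same idea using the column names from the question (Stock, StartDate, Investment, Date). The sample rows and the names merged and result are only illustrative assumptions. It relies on the default direction='backward' of merge_asof, which picks the last df1 row whose StartDate is <= Date within each Stock, and it restores the original row order of df2 afterwards, since merge_asof needs both frames sorted by the date key.

```python
import pandas as pd

# Sample data shaped like the question (values are illustrative).
df1 = pd.DataFrame({
    "Stock": ["A", "A", "B", "B"],
    "StartDate": pd.to_datetime(
        ["2022-01-01", "2022-02-01", "2022-01-01", "2022-01-15"]
    ),
    "Investment": [100, 150, 90, 100],
})
df2 = pd.DataFrame({
    "Stock": ["A", "A", "A", "A", "B", "B", "B"],
    "Date": pd.to_datetime(
        ["2022-01-01", "2022-01-05", "2022-01-31", "2022-02-01",
         "2022-01-01", "2022-01-14", "2022-01-15"]
    ),
})

# merge_asof requires both frames to be sorted by the date key.
# reset_index() keeps the original row position of df2 in an "index" column.
merged = pd.merge_asof(
    df2.reset_index().sort_values("Date"),
    df1.sort_values("StartDate"),
    left_on="Date",
    right_on="StartDate",
    by="Stock",  # match only rows of the same stock
)

# Restore the original df2 order and drop the helper column.
result = (
    merged.set_index("index")
    .sort_index()
    .rename_axis(None)
    .drop(columns="StartDate")
)
print(result)
```

Dates in df2 that fall before the first StartDate of a stock would get NaN in Investment, because there is no earlier row to match.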

Answer 1 timestamp: 2022-07-19 09:10:44
