Pandas vectorization to assign column value based on date, given another dataframe with value and start date

In Pandas, I have a dataframe df1 with stock investments and a start date:

Stock,StartDate,Investment
A,2022-01-01,100
A,2022-02-01,150
B,2022-01-01,90
B,2022-01-15,100
...

Then I have a df2:

Stock,Date
A,2022-01-01
A,2022-01-02
A,2022-01-05
...
B,2022-01-01
...

I want to add a column Investment to df2, filled with the investment taken from df1: given a date d and a stock S in df2, I want to assign the investment from the df1 row for S such that d >= StartDate and d < next start date.

Expected output (df2) in this case is:

Stock,Date,Investment
A,2022-01-01,100
A,2022-01-02,100
A,2022-01-05,100
...
A,2022-01-31,100
A,2022-02-01,150
A,2022-02-02,150
...
B,2022-01-01,90
B,2022-01-02,90
...
B,2022-01-14,90
B,2022-01-15,100
B,2022-01-16,100
...

This can clearly be done with a loop, but I am looking for a more efficient approach, possibly using vectorization.

What is the most efficient way to do this?

IIUC, use merge_asof:

print (df1)
   a          b    c
0  A 2022-01-01  100
1  A 2022-02-01  150
2  B 2022-01-01   90
3  B 2022-01-15  100

print (df2)
   a          b
0  A 2022-01-01
1  A 2022-01-02
2  A 2022-01-05
3  B 2022-01-01

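# a = Stock, b = Date / StartDate, c = Investment
# both frames must be sorted by the 'on' key; the default direction='backward'
# takes, within each group 'a', the last df1 row whose b is <= the df2 date
df = pd.merge_asof(df2.sort_values('b'), df1.sort_values('b'), on='b', by='a')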
print (df)
   a          b    c
0  A 2022-01-01  100
1  B 2022-01-01   90
2  A 2022-01-02  100
3  A 2022-01-05  100
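
The same idea with the column names from the question can be sketched as follows. This is a minimal sketch, not a definitive implementation: the function name assign_investment and the helper column _row are hypothetical, and it assumes both date columns are already datetime64 and that df2 has no column called _row. Because merge_asof returns rows in sorted-by-date order, the sketch restores the original row order of df2 afterwards.

```python
import pandas as pd

# Sample data modelled on the question (a few extra dates around the boundaries).
df1 = pd.DataFrame({
    "Stock": ["A", "A", "B", "B"],
    "StartDate": pd.to_datetime(
        ["2022-01-01", "2022-02-01", "2022-01-01", "2022-01-15"]),
    "Investment": [100, 150, 90, 100],
})
df2 = pd.DataFrame({
    "Stock": ["A", "A", "A", "A", "B", "B", "B"],
    "Date": pd.to_datetime(
        ["2022-01-01", "2022-01-05", "2022-01-31", "2022-02-01",
         "2022-01-01", "2022-01-14", "2022-01-15"]),
})


def assign_investment(df2, df1):
    """Hypothetical helper: add Investment to df2 via a backward as-of join."""
    # Remember the original row positions, then sort by the join key,
    # because merge_asof requires both inputs to be sorted on it.
    left = df2.assign(_row=range(len(df2))).sort_values("Date")
    right = df1.sort_values("StartDate")

    # For each (Stock, Date), take the last df1 row with StartDate <= Date.
    merged = pd.merge_asof(
        left, right,
        left_on="Date", right_on="StartDate",
        by="Stock", direction="backward",
    )

    # Restore df2's row order and index, and drop the helper columns.
    out = merged.sort_values("_row").drop(columns=["_row", "StartDate"])
    out.index = df2.index
    return out


result = assign_investment(df2, df1)
print(result)
```

Dates in df2 that fall before the first StartDate of their stock have no match and get NaN in Investment.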


