
Python Pandas Dataframe: based on DateTime criteria, I would like to populate a dataframe with data from another dataframe

I have created a simple dataframe, "F_test". I would now like to populate another dataframe, "P", with data from "F_test": a cell in "P" should receive the price from the same row of "F_test" whenever that cell's date lies between the start date and end date of that row.

However, when I execute a simple for loop to do this, no data in the "P" matrix is updated after the first row.

In the code on my PC, I actually extract the "F_test" data from an Excel file, but to give a complete dataset on this forum, I have manually created the simple dataframe named "F_test".

As you may be able to tell from the code, I am a recent convert from the Matlab/VBA Excel world...

I would really appreciate your wisdom on this topic.

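import numpy as np
import pandas as pd

F0 = ('08/02/2018','08/02/2018',50)
F1 = ('08/02/2018','09/02/2018',52)
F2 = ('10/02/2018','11/02/2018',46)
F3 = ('12/02/2018','16/02/2018',55)
F4 = ('09/02/2018','28/02/2018',48)
#build the dataframe straight from the tuples so that 'price' stays numeric
#(going through np.array would turn every value, prices included, into a string)
F_test = pd.DataFrame([F0,F1,F2,F3,F4],columns=['startdate','enddate','price'])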

#convert string dates (dd/mm/yyyy) into DateTime data type
F_test['startdate'] = pd.to_datetime(F_test['startdate'], dayfirst=True)
F_test['enddate'] = pd.to_datetime(F_test['enddate'], dayfirst=True)

#the rest of the code refers to this dataframe as F
F = F_test.copy()

#create contract duration column
F['duration'] = (F['enddate'] - F['startdate']).dt.days + 1

#re-order the F matrix by column 'duration', ensure that the bootstrapping 
#prioritises the shorter term contracts 
#sort_values returns a new dataframe, so the result has to be assigned back
F = F.sort_values(by='duration', ascending=True).reset_index(drop=True)

#create D matrix, dataframe containing each day from start to end date
tempDateRange = pd.date_range(start=F['startdate'].min(), end=F['enddate'].max(), freq='D')
D = pd.DataFrame(tempDateRange)

#define Nb of Calendar Days in a variable to be used later
intNbCalendarDays = (F['enddate'].max() - F['startdate'].min()).days + 1

#define Nb of Contracts in a variable to be used later
intNbContracts = len(F)

#define a zero filled matrix, P, which will house the contract prices 
P = pd.DataFrame(np.zeros((intNbContracts, intNbCalendarDays)))

#rename columns of P to be the dates contained in matrix array D
P.columns = tempDateRange 

#create prices in correct rows in P
for i in list(range(0, intNbContracts)):
    for j in list(range(0, intNbCalendarDays)):
        if ((F.iloc[i,0] >= P.columns[j]) & (F.iloc[i,1] <= P.columns[j] )):
            P.iloc[i,j] = F.iloc[i,2]
P

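I think your date comparisons at the end are the wrong way round, and you should use 'and' rather than '&' (which is the bitwise operator). As written, the condition needs the start date to be on or after the column date and the end date to be on or before it, which can only be true for a contract that starts and ends on that same day. That is why only your first row, the one-day contract, gets filled. Try this: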

# create prices in correct rows in P
for i in list(range(0, intNbContracts)):
    for j in list(range(0, intNbCalendarDays)):
        if (F.iloc[i, 0] <= P.columns[j]) and (F.iloc[i, 1] >= P.columns[j]):
            P.iloc[i, j] = F.iloc[i, 2]
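
To see the difference between the two conditions in isolation, here is a small sketch using one of your contracts (12/02/2018 to 16/02/2018). The helper names `covers_original` and `covers_fixed` are made up for this illustration and are not part of your code.

```python
import pandas as pd

def covers_original(start, end, day):
    # the condition as written in the question
    return (start >= day) and (end <= day)

def covers_fixed(start, end, day):
    # the corrected condition: start <= day <= end
    return (start <= day) and (end >= day)

start = pd.Timestamp('2018-02-12')
end = pd.Timestamp('2018-02-16')
days = pd.date_range(start, end, freq='D')

# collect the days each version of the condition accepts
original_hits = [d for d in days if covers_original(start, end, d)]
fixed_hits = [d for d in days if covers_fixed(start, end, d)]

print(len(original_hits))  # 0
print(len(fixed_hits))     # 5
```

The original condition accepts none of the five days of this contract, while the corrected one accepts all of them.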

The following is probably still not as efficient as it could be, but I think it is better. It replaces everything from your "#create D matrix, dataframe containing..." comment onwards:

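# create prices P: one row per contract, one column per calendar day
# (rows are collected in a list because DataFrame.append was removed in pandas 2.0)
rows = []
for index, row in F.iterrows():
    dates = pd.date_range(row['startdate'], row['enddate'])
    rows.append(pd.Series(row['price'], index=dates))
P = pd.DataFrame(rows).sort_index(axis=1).reset_index(drop=True)

# days outside a contract's date range get a price of 0
P = P.fillna(0)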
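
If you want to drop the Python loops altogether, one option is to compare all contracts against all days at once with NumPy broadcasting. This is only a sketch under the assumption that your dates are day-first strings as in your sample data; the intermediate names (`days`, `starts`, `ends`, `in_range`) are my own. Note that `&` is the right operator here, because the comparison is element-wise on whole arrays rather than on single values.

```python
import numpy as np
import pandas as pd

# same sample contracts as in the question, with dates parsed day-first
F = pd.DataFrame(
    [('08/02/2018', '08/02/2018', 50),
     ('08/02/2018', '09/02/2018', 52),
     ('10/02/2018', '11/02/2018', 46),
     ('12/02/2018', '16/02/2018', 55),
     ('09/02/2018', '28/02/2018', 48)],
    columns=['startdate', 'enddate', 'price'])
F['startdate'] = pd.to_datetime(F['startdate'], dayfirst=True)
F['enddate'] = pd.to_datetime(F['enddate'], dayfirst=True)

# every calendar day from the earliest start to the latest end
days = pd.date_range(F['startdate'].min(), F['enddate'].max(), freq='D')

# compare every contract (rows) with every day (columns) in one go
starts = F['startdate'].to_numpy()[:, None]   # shape (n_contracts, 1)
ends = F['enddate'].to_numpy()[:, None]       # shape (n_contracts, 1)
day_values = days.to_numpy()[None, :]         # shape (1, n_days)
in_range = (starts <= day_values) & (ends >= day_values)

# price where the day falls inside the contract, 0 elsewhere
prices = F['price'].to_numpy()[:, None]
P = pd.DataFrame(np.where(in_range, prices, 0), index=F.index, columns=days)
```

For the five sample contracts this gives a 5 x 21 frame, with each row holding the contract price on the days it covers and 0 everywhere else.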
