Filtering DataFrame in pandas based on criteria from another DataFrame
Python Pandas Dataframe: based on DateTime criteria, I would like to populate a dataframe with data from another dataframe
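I have created a simple dataframe, "F_test". I now want to populate another dataframe, "P", with data from "F_test", based on whether a cell in "P" is on the same row as in "F_test" and falls between that row's startdate/enddate.
However, when I run a simple For loop to do this, no data in the "P" matrix is updated after the first row.
In the actual code on my PC I pull the "F_test" data from an Excel file, but to give this forum a complete dataset I have manually created a simple dataframe called "F_test".
As you can probably tell from the code, I am a recent convert from the Matlab / VBA Excel world...
I would greatly appreciate your wisdom on this topic.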
import numpy as np
import pandas as pd

F0 = ('08/02/2018','08/02/2018',50)
F1 = ('08/02/2018','09/02/2018',52)
F2 = ('10/02/2018','11/02/2018',46)
F3 = ('12/02/2018','16/02/2018',55)
F4 = ('09/02/2018','28/02/2018',48)
#build the frame straight from the tuples so that 'price' stays numeric
F_test = pd.DataFrame([F0, F1, F2, F3, F4], columns=['startdate', 'enddate', 'price'])
#convert string dates (day/month/year) into DateTime data type
F_test['startdate'] = pd.to_datetime(F_test['startdate'], format='%d/%m/%Y')
F_test['enddate'] = pd.to_datetime(F_test['enddate'], format='%d/%m/%Y')
#the rest of the code refers to the contract data as F
F = F_test.copy()
#create contract duration column
F['duration'] = (F['enddate'] - F['startdate']).dt.days + 1
#re-order the F matrix by column 'duration', ensure that the bootstrapping
#prioritises the shorter term contracts
#sort_values returns a new frame, so assign the result back to F
F = F.sort_values(by='duration').reset_index(drop=True)
#create D matrix, dataframe containing each day from start to end date
tempDateRange = pd.date_range(start=F['startdate'].min(), end=F['enddate'].max(), freq='D')
D = pd.DataFrame(tempDateRange)
#define Nb of Calendar Days in a variable to be used later
intNbCalendarDays = (F['enddate'].max() - F['startdate'].min()).days + 1
#define Nb of Contracts in a variable to be used later
intNbContracts = len(F)
#define a zero filled matrix, P, which will house the contract prices
P = pd.DataFrame(np.zeros((intNbContracts, intNbCalendarDays)))
#rename columns of P to be the dates contained in matrix array D
P.columns = tempDateRange
#create prices in correct rows in P
for i in list(range(0, intNbContracts)):
    for j in list(range(0, intNbCalendarDays)):
        if ((F.iloc[i,0] >= P.columns[j]) & (F.iloc[i,1] <= P.columns[j] )):
            P.iloc[i,j] = F.iloc[i,2]
P
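A side note on the sample data: the dates look day-first (16/02/2018 cannot be month-first), so it may be worth checking how pd.to_datetime reads them. The sketch below is only an illustration; the names raw, month_first and day_first are made up for it, and it assumes the strings are meant as day/month/year.

```python
import pandas as pd

raw = '08/02/2018'

# Default parsing treats an ambiguous string as month/day/year.
month_first = pd.to_datetime(raw)

# An explicit format removes the ambiguity: day/month/year.
day_first = pd.to_datetime(raw, format='%d/%m/%Y')

print(month_first.date())  # 2018-08-02
print(day_first.date())    # 2018-02-08
```

Passing dayfirst=True is another option; an explicit format is stricter because a string that does not match raises an error instead of being guessed.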
I think the date comparisons at the end are the wrong way round, and you should use "and" rather than "&" (which is the bitwise operator). Try this:
# create prices in correct rows in P
for i in list(range(0, intNbContracts)):
    for j in list(range(0, intNbCalendarDays)):
        if (F.iloc[i, 0] <= P.columns[j]) and (F.iloc[i, 1] >= P.columns[j]):
            P.iloc[i, j] = F.iloc[i, 2]
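To make the two points above concrete, here is a small sketch with made-up values (start, end, day and the other names are illustrative only, not taken from the question's data). It suggests that with parenthesised scalar comparisons "and" and "&" give the same answer, so the direction of the comparison is what decides whether a cell is filled; "&" becomes necessary once whole columns are compared element-wise.

```python
import pandas as pd

start = pd.Timestamp('2018-02-09')
end = pd.Timestamp('2018-02-28')
day = pd.Timestamp('2018-02-15')

# Scalar comparisons return plain booleans, so both spellings agree here.
with_and = (start <= day) and (end >= day)
with_amp = (start <= day) & (end >= day)

# The reversed comparison can never be True for a multi-day contract.
reversed_test = (start >= day) and (end <= day)

# On whole columns, & is required because it works element-wise.
days = pd.Series(pd.date_range('2018-02-08', periods=3))
mask = (days >= start) & (days <= end)

print(with_and, with_amp, reversed_test, mask.tolist())
```

For scalars, "and" is still the more idiomatic choice because it short-circuits and does not depend on the parentheses.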
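The version below is probably still not as efficient as it could be, but I think it is better. It would replace everything from the comment "#create D matrix, dataframe containing..." onwards: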
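# create prices P: one row per contract, one column per calendar day
rows = []
for index, row in F.iterrows():
    # a Series holding the contract price for every day of the contract
    days = pd.date_range(row['startdate'], row['enddate'])
    rows.append(pd.Series(row['price'], index=days))
# DataFrame.append was removed in pandas 2.0, so build P from the list in one go
P = pd.DataFrame(rows).sort_index(axis=1).fillna(0)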
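If the loops become a bottleneck, one possible alternative is to compare all contracts against all days at once with NumPy broadcasting. This is only a sketch under a few assumptions: the dates are day/month/year, the helper names days, active and prices are made up here, and the sort by duration is left out for brevity.

```python
import numpy as np
import pandas as pd

# Same sample contracts as in the question (dates are day/month/year).
F = pd.DataFrame(
    [('08/02/2018', '08/02/2018', 50),
     ('08/02/2018', '09/02/2018', 52),
     ('10/02/2018', '11/02/2018', 46),
     ('12/02/2018', '16/02/2018', 55),
     ('09/02/2018', '28/02/2018', 48)],
    columns=['startdate', 'enddate', 'price'])
for col in ('startdate', 'enddate'):
    F[col] = pd.to_datetime(F[col], format='%d/%m/%Y')

days = pd.date_range(F['startdate'].min(), F['enddate'].max(), freq='D')

# Compare every contract (rows) against every day (columns) in one step.
start = F['startdate'].to_numpy()[:, None]   # shape (n_contracts, 1)
end = F['enddate'].to_numpy()[:, None]
active = (start <= days.to_numpy()) & (days.to_numpy() <= end)

# Keep the price where the contract is active, 0 elsewhere.
prices = F['price'].to_numpy()[:, None]
P = pd.DataFrame(np.where(active, prices, 0), index=F.index, columns=days)
print(P.iloc[:, :4])
```

The boolean matrix active has one row per contract and one column per day, so it can also be reused for later steps that need to know which contracts cover a given date.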