How to assign a value to a column for a subset of a DataFrame based on a condition in pandas?
Assign a value to a column based on multiple conditions in a pandas DataFrame
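I am trying to select rows that fall between two dates and assign a value to a column based on the maximum price within that date range. It would also help if someone could point out the fastest way to do this.
I have tried the code below, but it creates a new row and does not change the existing rows.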
def updateRecord(dfIn, startDate, endDate):
    # Select the rows whose date lies inside [startDate, endDate]
    mask = (dfIn['date'] <= endDate) & (dfIn['date'] >= startDate)
    new_df = dfIn.loc[mask]
    if len(new_df) == 0:
        return dfIn
    # These two assignments are where the unwanted new row appears
    dfIn.loc[dfIn.loc[mask].price.max(), 'highest'] = 1
    dfIn.loc[dfIn.loc[mask].price.min(), 'lowest'] = 1
    return dfIn
date                 price     highest  lowest
2000-05-01 04:00:00  4.439730  0        0
2000-05-02 04:00:00  4.209830  0        0
2000-05-03 04:00:00  4.109380  0        0
2000-05-04 04:00:00  3.953130  0        0
2000-05-05 04:00:00  4.040180  0        0
2000-05-08 04:00:00  3.933040  0        0
2000-05-09 04:00:00  3.765630  0        0
2000-05-10 04:00:00  3.546880  0        0
2000-05-11 04:00:00  3.671880  0        0
2000-05-12 04:00:00  3.843750  0        0
2000-05-15 04:00:00  3.607150  0        0
2000-05-16 04:00:00  3.774560  0        0
2000-05-17 04:00:00  3.620540  0        0
2000-05-18 04:00:00  3.598220  0        0
2000-05-19 04:00:00  3.357150  0        0
2000-05-22 04:00:00  3.212060  0        0
2000-05-23 04:00:00  3.064740  0        0
2000-05-24 04:00:00  3.131700  0        0
2000-05-25 04:00:00  3.116630  0        0
2000-05-26 04:00:00  3.084830  0        0
2000-05-30 04:00:00  3.127230  0        0
2000-05-31 04:00:00  3.000000  0        0
2000-06-01 04:00:00  3.183040  0        0
2000-06-02 04:00:00  3.305810  0        0
.....
2000-06-30 04:00:00  3.261160  0        0
The expected result is that the existing rows are updated as follows:
df = updateRecord(df, '2000-05-01 04:00:00', '2000-05-31 04:00:00')
df output should be:
2000-05-01 04:00:00 4.439730 1 0
2000-05-31 04:00:00 3.000000 0 1
My current code creates a new row instead of updating the existing rows.
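A minimal sketch of why a row may get appended here, assuming the DataFrame has a default integer index: `price.max()` returns a price value, not a row label, and assigning through `.loc` with a label that is not in the index enlarges the frame instead of raising an error. The three-row frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical three-row frame, not the full data from the question.
df = pd.DataFrame({
    "price": [4.43973, 4.20983, 3.0],
    "highest": [0, 0, 0],
})

label = df["price"].max()      # 4.43973: a price value, not an index label
df.loc[label, "highest"] = 1   # the label is not in the index, so a row is appended

rows_after = len(df)                                     # one more row than before
new_row_price_is_nan = pd.isna(df.loc[label, "price"])   # the appended row has no price
```

The appended row is indexed by the price itself and carries the flag, while the row that actually holds the maximum price stays unflagged.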
I'm sure this is not the best way to do it.
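def updateRecord(dfIn, startDate, endDate):
    # Copy the selection so the assignments below do not target a view of dfIn
    df_o = dfIn.loc[(dfIn['date'] <= endDate) & (dfIn['date'] >= startDate)].copy()
    if len(df_o) == 0:
        return dfIn
    # Otherwise flag the rows holding the highest and lowest price in the range.
    # idxmax()/idxmin() return index labels, which is what .at expects
    idx = df_o['price'].idxmax()
    df_o.at[idx, 'highest'] = 1
    idx_l = df_o['price'].idxmin()
    df_o.at[idx_l, 'lowest'] = 1
    return df_o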
Hope this works.
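A note on `argmax()` versus `idxmax()`, since the two are easy to confuse when indexing a filtered slice: in current pandas versions `Series.argmax()` returns an integer position, while `Series.idxmax()` returns the index label, and label-based accessors such as `.at` and `.loc` need the label. The sketch below uses a made-up Series whose labels deliberately differ from its positions.

```python
import pandas as pd

# Hypothetical slice whose index labels (10, 11, 12) differ from positions (0, 1, 2).
prices = pd.Series([3.2, 4.4, 3.0], index=[10, 11, 12], name="price")

position = prices.argmax()   # integer position of the maximum
label = prices.idxmax()      # index label of the maximum

value_by_label = prices.loc[label]          # label-based lookup
value_by_position = prices.iloc[position]   # position-based lookup
```

On a frame with a default integer index the two only coincide for a selection that starts at the first row, which is why code using `argmax()` with `.at` can appear to work on some date ranges and fail on others.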
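This works, but it returns only the selected part of the DataFrame. If you want the same thing but with the whole DataFrame returned, I can do that as well.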
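import numpy as np

def updateRecord(dfIn, startDate, endDate):
    mask = (dfIn['date'] <= endDate) & (dfIn['date'] >= startDate)
    # Work on a copy of the selected rows to avoid SettingWithCopyWarning
    new_df = dfIn.loc[mask].copy()
    if len(new_df) == 0:
        return dfIn
    # 1 where the price equals the max/min of the selection, 0 elsewhere
    new_df['highest'] = np.where(new_df.price == new_df.price.max(), 1, 0)
    new_df['lowest'] = np.where(new_df.price == new_df.price.min(), 1, 0)
    return new_df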
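For the whole-DataFrame variant mentioned in this answer, one possible sketch (a guess at what the answerer had in mind, not their code) writes the flags through the mask so that rows outside the range are left untouched. The name `update_record_full` is made up here, and the `date` column is assumed to be datetime64 already.

```python
import numpy as np
import pandas as pd

def update_record_full(df_in, start_date, end_date):
    """Flag the max/min price inside [start_date, end_date]; return the full frame."""
    mask = (df_in["date"] >= start_date) & (df_in["date"] <= end_date)
    if not mask.any():
        return df_in
    in_range = df_in.loc[mask, "price"]
    # np.where yields 1 where the price equals the extreme of the range, else 0.
    df_in.loc[mask, "highest"] = np.where(in_range == in_range.max(), 1, 0)
    df_in.loc[mask, "lowest"] = np.where(in_range == in_range.min(), 1, 0)
    return df_in

# A few rows taken from the sample data in the question.
df = pd.DataFrame({
    "date": pd.to_datetime(["2000-05-01 04:00:00", "2000-05-02 04:00:00",
                            "2000-05-31 04:00:00", "2000-06-01 04:00:00"]),
    "price": [4.43973, 4.20983, 3.0, 3.18304],
    "highest": 0,
    "lowest": 0,
})
df = update_record_full(df, "2000-05-01 04:00:00", "2000-05-31 04:00:00")
```

Two behaviours follow from the `np.where` comparison: if several rows tie for the maximum or minimum, all of them are flagged, and any flags previously set inside the range are reset to 0.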
I think this is what you are looking for.
def updateRecord(dfIn, startDate, endDate):
    mask = (dfIn['date'] <= endDate) & (dfIn['date'] >= startDate)
    if not mask.any():
        return dfIn
    # You want the idxmax/idxmin for the given mask, not the entire DF, as you stated.
    # idxmax()/idxmin() return index labels, which .loc needs; argmax()/argmin() return positions.
    dfIn.loc[dfIn.loc[mask, 'price'].idxmax(), 'highest'] = 1
    dfIn.loc[dfIn.loc[mask, 'price'].idxmin(), 'lowest'] = 1
    return dfIn
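To check this kind of in-place update against the expected output in the question, here is a small self-contained sketch. It looks up the rows with `idxmax()`/`idxmin()`, uses only six rows from the sample data, and converts `date` with `pd.to_datetime`, which is an assumption because the question does not show the column's dtype. The snake_case names are illustrative.

```python
import pandas as pd

def update_record(df_in, start_date, end_date):
    """Set highest/lowest to 1 on the rows with the max/min price in the date range."""
    mask = (df_in["date"] >= start_date) & (df_in["date"] <= end_date)
    if not mask.any():
        return df_in
    in_range = df_in.loc[mask, "price"]
    # idxmax()/idxmin() give labels of existing rows, so no row is appended.
    df_in.loc[in_range.idxmax(), "highest"] = 1
    df_in.loc[in_range.idxmin(), "lowest"] = 1
    return df_in

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2000-05-01 04:00:00", "2000-05-02 04:00:00", "2000-05-30 04:00:00",
        "2000-05-31 04:00:00", "2000-06-01 04:00:00", "2000-06-02 04:00:00",
    ]),
    "price": [4.439730, 4.209830, 3.127230, 3.000000, 3.183040, 3.305810],
})
df["highest"] = 0
df["lowest"] = 0

df = update_record(df, "2000-05-01 04:00:00", "2000-05-31 04:00:00")
# Keep only the rows that received a flag, for easy inspection.
flagged = df[(df["highest"] == 1) | (df["lowest"] == 1)]
```

With these rows, `flagged` holds the 2000-05-01 row (highest) and the 2000-05-31 row (lowest), and the frame keeps its original six rows.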