
Assign value to column based on multiple conditions in pandas dataframe

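I am trying to select the rows between two dates and assign a value to a column based on the maximum (and minimum) price between those two dates. It would also help if someone could point out the fastest way to do this.

I have tried this code, but it creates a new row and does not change the existing rows.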

def updateRecord(dfIn, startDate, endDate):
    mask = (dfIn['date'] <= endDate) & (dfIn['date'] >= startDate)
    new_df = dfIn.loc[mask]
    if len(new_df) == 0:
        return dfIn

    # Intended: flag the rows with the highest and lowest price in the range.
    dfIn.loc[dfIn.loc[mask].price.max(), 'highest'] = 1
    dfIn.loc[dfIn.loc[mask].price.min(), 'lowest'] = 1
    return dfIn

date                      price       highest       lowest
2000-05-01 04:00:00    4.439730             0            0
2000-05-02 04:00:00    4.209830             0            0
2000-05-03 04:00:00    4.109380             0            0
2000-05-04 04:00:00    3.953130             0            0
2000-05-05 04:00:00    4.040180             0            0
2000-05-08 04:00:00    3.933040             0            0
2000-05-09 04:00:00    3.765630             0            0
2000-05-10 04:00:00    3.546880             0            0
2000-05-11 04:00:00    3.671880             0            0
2000-05-12 04:00:00    3.843750             0            0
2000-05-15 04:00:00    3.607150             0            0
2000-05-16 04:00:00    3.774560             0            0
2000-05-17 04:00:00    3.620540             0            0
2000-05-18 04:00:00    3.598220             0            0
2000-05-19 04:00:00    3.357150             0            0
2000-05-22 04:00:00    3.212060             0            0
2000-05-23 04:00:00    3.064740             0            0
2000-05-24 04:00:00    3.131700             0            0
2000-05-25 04:00:00    3.116630             0            0
2000-05-26 04:00:00    3.084830             0            0
2000-05-30 04:00:00    3.127230             0            0
2000-05-31 04:00:00    3.000000             0            0
2000-06-01 04:00:00    3.183040             0            0
2000-06-02 04:00:00    3.305810             0            0
.....
2000-06-30 04:00:00    3.261160             0            0

The desired result is that the rows are updated as follows:

df = updateRecord(df, '2000-05-01 04:00:00', '2000-05-31 04:00:00')

df output should be:

2000-05-01 04:00:00    4.439730             1            0
2000-05-31 04:00:00    3.000000             0            1

My current code creates a new row instead of updating the existing rows.

I am sure this is not the best way to do it.

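def updateRecord(dfIn, startDate, endDate):
    # Copy the selected rows so the assignments below do not write to a view of dfIn.
    df_o = dfIn.loc[(dfIn['date'] <= endDate) & (dfIn['date'] >= startDate)].copy()
    if len(df_o) == 0:
        return dfIn

    # idxmax/idxmin return index labels, which is what .at expects.
    idx = df_o['price'].idxmax()
    df_o.at[idx, 'highest'] = 1

    idx_l = df_o['price'].idxmin()
    df_o.at[idx_l, 'lowest'] = 1

    # Note: only the selected rows are returned, not the whole DataFrame.
    return df_o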

Hope it works.

This works, but it returns only the selected rows. If you want the same thing applied to the whole DataFrame, I can do that as well.

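import numpy as np

def updateRecord(dfIn, startDate, endDate):
    mask = (dfIn['date'] <= endDate) & (dfIn['date'] >= startDate)
    # Copy the selection so the column assignments below do not trigger SettingWithCopyWarning.
    new_df = dfIn.loc[mask].copy()
    if len(new_df) == 0:
        return dfIn
    # Every row that ties for the max/min gets a 1; all other selected rows get 0.
    new_df['highest'] = np.where(new_df.price == new_df.price.max(), 1, 0)
    new_df['lowest'] = np.where(new_df.price == new_df.price.min(), 1, 0)
    return new_df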
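
For the whole-DataFrame variant mentioned above, one possible sketch follows. The function name `update_record_full` and the four-row sample frame are hypothetical, and the sketch assumes the `date` column has dtype datetime64. It works on a copy, leaves flags outside the date range untouched, and flags every row that ties for the maximum or minimum.

```python
import numpy as np
import pandas as pd

def update_record_full(df, start_date, end_date):
    """Return a copy of df with the max/min price rows in the date range flagged."""
    out = df.copy()
    start, end = pd.Timestamp(start_date), pd.Timestamp(end_date)
    mask = (out['date'] >= start) & (out['date'] <= end)
    if not mask.any():
        return out
    window = out.loc[mask, 'price']
    # Rows outside the range keep their existing flag values.
    out['highest'] = np.where(mask & (out['price'] == window.max()), 1, out['highest'])
    out['lowest'] = np.where(mask & (out['price'] == window.min()), 1, out['lowest'])
    return out

# Made-up sample data in the same shape as the question's table.
df = pd.DataFrame({
    'date': pd.to_datetime(['2000-05-01 04:00:00', '2000-05-02 04:00:00',
                            '2000-05-31 04:00:00', '2000-06-01 04:00:00']),
    'price': [4.43973, 4.20983, 3.0, 3.18304],
    'highest': 0,
    'lowest': 0,
})
result = update_record_full(df, '2000-05-01 04:00:00', '2000-05-31 04:00:00')
```

Because the comparison is done on the full column, the result has the same number of rows as the input, which may be closer to what the question asks for.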

I think this is what you are looking for.

def updateRecord(dfIn, startDate, endDate):
    mask = (dfIn['date'] <= endDate) & (dfIn['date'] >= startDate)
    if not mask.any():
        return dfIn

    # You want the idxmax[idxmin] for the given mask, not the entire DF, as you stated.
    # idxmax/idxmin return index labels, so .loc updates the existing rows.
    dfIn.loc[dfIn.loc[mask, 'price'].idxmax(), 'highest'] = 1
    dfIn.loc[dfIn.loc[mask, 'price'].idxmin(), 'lowest'] = 1

    return dfIn
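
As for why the attempt in the question adds a row: most likely `.loc` is being handed the price value itself (for example 4.43973) as a row label. No row carries that label, so pandas enlarges the frame instead of updating a row. The following is a small sketch with made-up data that contrasts the two patterns; the names `broken` and `fixed` are only for illustration.

```python
import pandas as pd

# Made-up sample data with the default integer index.
df = pd.DataFrame({
    'date': pd.to_datetime(['2000-05-01 04:00:00', '2000-05-02 04:00:00',
                            '2000-05-31 04:00:00']),
    'price': [4.43973, 4.20983, 3.0],
    'highest': 0,
    'lowest': 0,
})

# Pattern from the question: the maximum price is used as a row label.
# That label does not exist, so .loc appends a new row.
broken = df.copy()
broken.loc[broken['price'].max(), 'highest'] = 1

# idxmax/idxmin return the index label of the row, so existing rows are updated.
fixed = df.copy()
fixed.loc[fixed['price'].idxmax(), 'highest'] = 1
fixed.loc[fixed['price'].idxmin(), 'lowest'] = 1
```

Here `broken` ends up with one more row than `df`, while `fixed` keeps the original rows and only changes the two flags.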

