Assign values to a column based on multiple conditions in a pandas DataFrame

Question

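I am trying to select the rows between two dates and assign a value to a column based on the maximum price within that date range. It would also help if someone could point out the fastest way to do this.

I tried the code below, but it creates a new row instead of changing the existing rows.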

def updateRecord(dfIn, startDate, endDate):
    mask = (dfIn['date'] <= endDate) & (dfIn['date'] >= startDate)
    new_df = dfIn.loc[mask]
    if len(new_df) == 0:
        return dfIn

    dfIn.loc[dfIn.loc[mask].price.max(), 'highest'] = 1
    dfIn.loc[dfIn.loc[mask].price.min(), 'lowest'] = 1
    return dfIn

date                      price       highest       lowest
2000-05-01 04:00:00    4.439730             0            0
2000-05-02 04:00:00    4.209830             0            0
2000-05-03 04:00:00    4.109380             0            0
2000-05-04 04:00:00    3.953130             0            0
2000-05-05 04:00:00    4.040180             0            0
2000-05-08 04:00:00    3.933040             0            0
2000-05-09 04:00:00    3.765630             0            0
2000-05-10 04:00:00    3.546880             0            0
2000-05-11 04:00:00    3.671880             0            0
2000-05-12 04:00:00    3.843750             0            0
2000-05-15 04:00:00    3.607150             0            0
2000-05-16 04:00:00    3.774560             0            0
2000-05-17 04:00:00    3.620540             0            0
2000-05-18 04:00:00    3.598220             0            0
2000-05-19 04:00:00    3.357150             0            0
2000-05-22 04:00:00    3.212060             0            0
2000-05-23 04:00:00    3.064740             0            0
2000-05-24 04:00:00    3.131700             0            0
2000-05-25 04:00:00    3.116630             0            0
2000-05-26 04:00:00    3.084830             0            0
2000-05-30 04:00:00    3.127230             0            0
2000-05-31 04:00:00    3.000000             0            0
2000-06-01 04:00:00    3.183040             0            0
2000-06-02 04:00:00    3.305810             0            0
.....
2000-06-30 04:00:00    3.261160             0            0

The expected result is that the existing rows are updated as follows:

df = updateRecord(df, '2000-05-01 04:00:00', '2000-05-31 04:00:00')

df output should be:

2000-05-01 04:00:00    4.439730             1            0
2000-05-31 04:00:00    3.000000             0            1

My current code creates a new row instead of updating the existing rows.

Answer 1

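I am sure this is not the best way to do it.

def updateRecord(dfIn, startDate, endDate):
    # Copy the selected rows so the assignments below do not write to a view of dfIn.
    df_o = dfIn.loc[(dfIn['date'] <= endDate) & (dfIn['date'] >= startDate)].copy()
    if len(df_o) == 0:
        return dfIn
    # What is supposed to happen if len(df_o) > 0?
    # idxmax() returns the index label of the highest price (argmax() returns a position in recent pandas).
    idx = df_o['price'].idxmax()
    df_o.at[idx, 'highest'] = 1

    # idxmin() returns the index label of the lowest price.
    idx_l = df_o['price'].idxmin()
    df_o.at[idx_l, 'lowest'] = 1

    return df_o

Hope it works.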

Answer 2

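This works, but it returns only the selected rows of the DataFrame. If you want the same thing but with the whole DataFrame returned, I can do that too.

import numpy as np

def updateRecord(dfIn, startDate, endDate):
    mask = (dfIn['date'] <= endDate) & (dfIn['date'] >= startDate)
    # Copy the selection so the new column values are not written to a view of dfIn.
    new_df = dfIn.loc[mask].copy()
    if len(new_df) == 0:
        return dfIn
    # Flag every row whose price equals the maximum (or minimum) of the selection.
    new_df['highest'] = np.where(new_df.price == new_df.price.max(), 1, 0)
    new_df['lowest'] = np.where(new_df.price == new_df.price.min(), 1, 0)
    return new_df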
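The whole-DataFrame variant mentioned above could look roughly like the sketch below. The function name `update_record_full` and the four-row sample frame are made up for illustration, and the sketch assumes the `date` column is already a datetime column and that the caller wants a copy back rather than an in-place change.

```python
import numpy as np
import pandas as pd

def update_record_full(df_in, start_date, end_date):
    """Hypothetical variant: flag the max/min price inside the date window
    and return the whole frame, not just the selected rows."""
    out = df_in.copy()
    start, end = pd.Timestamp(start_date), pd.Timestamp(end_date)
    mask = (out['date'] >= start) & (out['date'] <= end)
    if not mask.any():
        return out
    window = out.loc[mask, 'price']
    # Only rows inside the window are set to 1; all other rows keep their value.
    out['highest'] = np.where(mask & (out['price'] == window.max()), 1, out['highest'])
    out['lowest'] = np.where(mask & (out['price'] == window.min()), 1, out['lowest'])
    return out

# Small sample in the shape of the question's data (assumed, not the full table).
df = pd.DataFrame({
    'date': pd.to_datetime(['2000-05-01 04:00:00', '2000-05-02 04:00:00',
                            '2000-05-31 04:00:00', '2000-06-01 04:00:00']),
    'price': [4.439730, 4.209830, 3.000000, 3.183040],
    'highest': 0,
    'lowest': 0,
})
result = update_record_full(df, '2000-05-01 04:00:00', '2000-05-31 04:00:00')
```

Like the `np.where` version above, this flags every row that ties for the maximum or minimum, which may or may not be what you want.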

Answer 3

I think this is what you are looking for.

def updateRecord(dfIn, startDate, endDate):
    mask = (dfIn['date'] <= endDate) & (dfIn['date'] >= startDate)
    if not mask.any():
        return dfIn

    # You want the index label of the max/min for the given mask, not the price value itself.
    # idxmax()/idxmin() return that label; argmax()/argmin() return a position in recent pandas.
    dfIn.loc[dfIn.loc[mask, 'price'].idxmax(), 'highest'] = 1
    dfIn.loc[dfIn.loc[mask, 'price'].idxmin(), 'lowest'] = 1

    return dfIn
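As far as I can tell, the new row in the question comes from passing `price.max()` to `.loc`. That is a price value, not a row label, and `.loc` appends a row when it is assigned to a label that does not exist. A minimal sketch of the difference, using a made-up three-row frame:

```python
import pandas as pd

df = pd.DataFrame({'price': [4.43973, 3.0, 3.18304], 'highest': 0})

# price.max() is the value 4.43973, which is not an index label,
# so the assignment enlarges the frame with a new row.
enlarged = df.copy()
enlarged.loc[enlarged['price'].max(), 'highest'] = 1

# idxmax() returns the index label of the maximum price,
# so the existing row is updated and no row is added.
fixed = df.copy()
fixed.loc[fixed['price'].idxmax(), 'highest'] = 1

print(len(enlarged), len(fixed))
```

If your index is not unique, `idxmax()` gives you a label shared by several rows, so all of them would be flagged.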
