Pandas: assign values based on the next row

Question

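Consider this simple Pandas DataFrame with the columns 'record', 'start', and 'param'. Several rows can share the same 'record' value, and every row with a given 'record' value has the same 'start' value. However, the 'param' value can differ within the same 'record' and 'start' combination: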

import pandas as pd
df = pd.DataFrame({'record':[1,2,3,4,4,5,6,7,7,7,8], 'start':[0,5,7,13,13,19,27,38,38,38,54], 'param':['t','t','t','u','v','t','t','t','u','v','t']})

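I would like to create a column 'end' that takes the value of 'start' from the row holding the next unique value of 'record'. The values of the 'end' column should be: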

[5,7,13,19,19,27,38,54,54,54,NaN]

I can do this with a for loop, but I know that is not the preferred approach when using Pandas:

max_end = 100
for idx, row in df.iterrows():
    try:
        n = 1
        next_row = df.iloc[idx+n]
        while next_row['start'] == row['start']:
            n = n+1
            next_row = df.iloc[idx+n]
        end = next_row['start']
    except IndexError:  # ran past the last row
        end = max_end
    df.at[idx, 'end'] = end

Is there a simple way to achieve this without a for loop?

Answer 1

I have no doubt there is a smarter solution, but here is mine.

df['end'] = df.drop_duplicates(subset=['record', 'start'])['start'].shift(-1).reindex(index=df.index, method='ffill')

-=EDIT=- Added subset to drop_duplicates to account for the amendment to the question.
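To make the chained call easier to follow, here is a sketch that splits it into three steps on the question's sample data. The intermediate names `firsts` and `next_start` are illustrative and do not appear in the original answer. It relies on the index being sorted, which `reindex(..., method='ffill')` requires.

```python
import pandas as pd

df = pd.DataFrame({
    'record': [1, 2, 3, 4, 4, 5, 6, 7, 7, 7, 8],
    'start': [0, 5, 7, 13, 13, 19, 27, 38, 38, 38, 54],
    'param': ['t', 't', 't', 'u', 'v', 't', 't', 't', 'u', 'v', 't'],
})

# Step 1: keep only the first row of each (record, start) pair.
firsts = df.drop_duplicates(subset=['record', 'start'])['start']

# Step 2: shift up by one so each kept row holds the next record's start.
next_start = firsts.shift(-1)

# Step 3: expand back to the full index; rows dropped in step 1 take the
# value of the nearest preceding kept row.
df['end'] = next_start.reindex(index=df.index, method='ffill')

print(df['end'].tolist())
# [5.0, 7.0, 13.0, 19.0, 19.0, 27.0, 38.0, 54.0, 54.0, 54.0, nan]
```

The last row stays NaN because its label already exists in `next_start`, and `reindex` only forward-fills labels that are missing from the source index.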

Answer 2

Although more explicit, this solution is equivalent to @Quixotic22's.

import pandas as pd

df = pd.DataFrame({
    'record': [1,2,3,4,4,5,6,7,7,7,8],
    'start': [0,5,7,13,13,19,27,38,38,38,54],
    'param': ['t','t','t','u','v','t','t','t','u','v','t']
})
max_end = 100

df["end"] = float("nan")  # create new column with empty values
loc = df["record"].shift(1) != df["record"]  # True where record differs from the previous row

# on those rows, take the start of the next record; the last record has none, so use max_end
df.loc[loc, "end"] = df.loc[loc, "start"].shift(-1, fill_value=max_end)
df["end"] = df["end"].ffill()  # fill the remaining rows of each record

df
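For comparison, here is a sketch of a different technique that is not taken from either answer. It looks up 'end' by 'record' value with `Series.map` instead of forward-filling by row position. It assumes, as the question states, that each 'record' has a single 'start', and it treats the "next" record as the next one in order of first appearance. The names `first_start` and `next_start` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'record': [1, 2, 3, 4, 4, 5, 6, 7, 7, 7, 8],
    'start': [0, 5, 7, 13, 13, 19, 27, 38, 38, 38, 54],
    'param': ['t', 't', 't', 'u', 'v', 't', 't', 't', 'u', 'v', 't'],
})
max_end = 100  # same placeholder value as in the question's loop

# One start per record, indexed by record, in order of first appearance.
first_start = df.drop_duplicates(subset='record').set_index('record')['start']

# Start of the following record; the last record falls back to max_end.
next_start = first_start.shift(-1, fill_value=max_end)

# Look up each row's end by its record value.
df['end'] = df['record'].map(next_start)

print(df['end'].tolist())
# [5, 7, 13, 19, 19, 27, 38, 54, 54, 54, 100]
```

Because the lookup is keyed on 'record', every row of the last record receives `max_end`, and the column keeps an integer dtype. Omitting `fill_value` would leave NaN on those rows instead, matching the expected output listed in the question.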

Answer 1
0 2021-11-02 16:04:59

Answer 2
0 2021-11-02 18:47:05
