Python Pandas: add a column containing the first index at which a future row's value is greater than the current row's value

Question

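Is there a vectorized way to add a column that gives the next index at which some condition is met (for example, the first index of a future row whose val is greater than the current row's val)?

I have found many examples that do this against a fixed value, such as getting the next index where a column is greater than 0, but I want to do it for every row, using that row's own (changing) value as the threshold.

Here is an example that does this with a simple loop. I am curious whether there is a Pandas/vectorized way to do the same thing: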

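import pandas as pd

df = pd.DataFrame([0, 2, 3, 2, 3, 4, 5, 6, 5, 4, 7, 8, 7, 2, 3], columns=['val'], index=pd.date_range('20220101', periods=15))

def add_new_highs(df):
    df['new_high'] = pd.NaT
    # Series.items() is used here; iteritems() was removed in pandas 2.0
    for i, v in df['val'].items():
        # Rows from i onward whose val exceeds the current row's val
        later = df.loc[i:]
        row = later[later['val'] > v].head(1)
        if len(row) > 0:
            # Assign through a single .loc call to avoid chained assignment
            df.loc[i, 'new_high'] = row.index[0]

add_new_highs(df)
print(df)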

Output:

            val   new_high
2022-01-01    0 2022-01-02
2022-01-02    2 2022-01-03
2022-01-03    3 2022-01-06
2022-01-04    2 2022-01-05
2022-01-05    3 2022-01-06
2022-01-06    4 2022-01-07
2022-01-07    5 2022-01-08
2022-01-08    6 2022-01-11
2022-01-09    5 2022-01-11
2022-01-10    4 2022-01-11
2022-01-11    7 2022-01-12
2022-01-12    8        NaT
2022-01-13    7        NaT
2022-01-14    2 2022-01-15
2022-01-15    3        NaT

Answer 1

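One option is NumPy broadcasting. Because we only want indices that come after the current one, we only need the upper triangle of the comparison matrix, so we use np.triu. Because we need the first such index, we use argmax. Finally, some rows may never be followed by a larger value; argmax returns 0 for those rows, so we use where to replace them with NaT: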

import numpy as np
df['new_high'] = df.index[np.triu(df[['val']].to_numpy() < df['val'].to_numpy()).argmax(axis=1)]
df['new_high'] = df['new_high'].where(lambda x: x.index < x)

Output:

            val   new_high
2022-01-01    0 2022-01-02
2022-01-02    2 2022-01-03
2022-01-03    3 2022-01-06
2022-01-04    2 2022-01-05
2022-01-05    3 2022-01-06
2022-01-06    4 2022-01-07
2022-01-07    5 2022-01-08
2022-01-08    6 2022-01-11
2022-01-09    5 2022-01-11
2022-01-10    4 2022-01-11
2022-01-11    7 2022-01-12
2022-01-12    8        NaT
2022-01-13    7        NaT
2022-01-14    2 2022-01-15
2022-01-15    3        NaT
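
To make the intermediate steps of the broadcasting approach easier to follow, here is a minimal sketch on a small frame. The data and the names greater, future, first and has_match are illustrative assumptions, not part of the answer above. It also detects rows without a match via any(axis=1), which is a variation on the answer's where(lambda x: x.index < x) step rather than the same code.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (hypothetical data, not the question's).
df = pd.DataFrame({'val': [1, 3, 2, 4, 0]},
                  index=pd.date_range('20220101', periods=5))

vals = df['val'].to_numpy()

# greater[i, j] is True when vals[j] > vals[i]: a column vector compared
# against a row vector broadcasts to an n x n boolean matrix.
greater = vals[:, None] < vals

# Keep only j >= i, i.e. the current row and the rows after it.
future = np.triu(greater)

# argmax gives the position of the first True in each row, or 0 if there is none.
first = future.argmax(axis=1)

# Rows with no later, larger value are detected explicitly and set to NaT.
has_match = future.any(axis=1)
new_high = pd.Series(df.index.to_numpy()[first], index=df.index).where(has_match)

print(first)
print(new_high)
```

Note that the comparison matrix has n x n entries, so memory use grows quadratically with the number of rows.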

Answer 2

Similar to @enke's answer:

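import numpy as np
vals = df[['val']].to_numpy()  # column vector, shape (n, 1); selecting 'val' keeps this working if new_high already exists
arr = np.repeat(vals, len(df), axis=1)  # n x n matrix with arr[i, j] = val[i]
arr = np.tril(arr)  # zero out entries above the diagonal (rows that come before column j)
arr = (arr - vals.T) > 0  # True where val[i] > val[j]; the zero fill assumes non-negative values
ind = np.argmax(arr, axis=0)  # for each column j, the first row i with a larger value (0 if none)

df['new_high'] = df.iloc[ind].index  # map positions back to index labels
df['new_high'] = df['new_high'].replace({df.index[0]: pd.NaT})  # position 0 means no larger value was found, so use NaT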
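
Because np.tril fills the masked entries with 0, the subtraction step above can report false matches when val contains negative numbers. The following is a sketch of one possible variant that masks the boolean comparison instead of the values. The helper name first_future_greater, the -1 sentinel and the sample data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def first_future_greater(values):
    """For each position j, return the first i > j with values[i] > values[j], or -1."""
    v = np.asarray(values)
    n = len(v)
    # larger[i, j] is True when v[i] > v[j] (row value beats column value).
    larger = v[:, None] > v[None, :]
    # Keep only rows at or below the diagonal (i >= j). Masking the booleans
    # rather than the values means negative numbers are handled correctly.
    larger &= np.tril(np.ones((n, n), dtype=bool))
    pos = larger.argmax(axis=0)       # first row i that beats column j
    pos[~larger.any(axis=0)] = -1     # mark columns with no such row
    return pos

# Hypothetical data that includes negative values.
df = pd.DataFrame({'val': [-3, -1, -2, 0, -5]},
                  index=pd.date_range('20220101', periods=5))

pos = first_future_greater(df['val'])
# Index with pos (a -1 picks the last label as a placeholder), then blank out the misses.
df['new_high'] = pd.Series(df.index.to_numpy()[pos], index=df.index).where(pos >= 0)
print(df)
```

Using an explicit sentinel also avoids relying on the first index label to signal "no match".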

Answer 1: 1, accepted
Answer 2: 1, 2022-04-28 17:45:40
