How to calculate the slope of a Pandas dataframe column based on the previous N rows

Question

I have the following example dataframe:

import pandas as pd

d = {'col1': [2, 5, 6, 5, 4, 6, 7, 8, 9, 7, 5]}

df = pd.DataFrame(data=d)
print(df)

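Output:

I need to calculate the slope of col1 over the previous N rows and save the slope values in a separate column (called slope). The desired output might look like this (the slope values below are just random numbers, for the sake of example):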

       col1  slope
0      2
1      5
2      6
3      5
4      4     3
5      6     4
6      7     5
7      8     2
8      9     4
9      7     6
10     5     5

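So, in the row with index 4, the slope is 3, which is the slope of [2, 5, 6, 5, 4].

Is there an elegant way to do this without using a for loop?

Addendum:

Following the accepted answer below, if you get this error: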

TypeError: ufunc 'true_divide' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

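then your dataframe's index is probably not numeric. The following modification makes it work by regressing against the positions 0 to 4 instead of the index: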

df['slope'] = df['col1'].rolling(5).apply(lambda s: linregress(range(5), s.values)[0])
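
A minimal sketch of the same idea, assuming a string-labelled index (the frame `df_str` and its labels are made up for illustration). Passing `raw=True` hands each window to the function as a plain NumPy array, so the index never reaches `linregress`:

```python
import pandas as pd
from scipy.stats import linregress

# Hypothetical frame with a non-numeric index (labels invented for this example)
df_str = pd.DataFrame(
    {'col1': [2, 5, 6, 5, 4, 6, 7, 8, 9, 7, 5]},
    index=list('abcdefghijk'),
)

n = 5  # window length
# raw=True passes each window as a NumPy array; x is just the positions 0..n-1
df_str['slope'] = df_str['col1'].rolling(n).apply(
    lambda w: linregress(range(n), w)[0], raw=True
)
print(df_str)
```

The slopes should match those in the accepted answer, since the x values are positions within the window in both cases.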

Answer 1

You can use rolling + apply together with scipy.stats.linregress:

from scipy.stats import linregress

df['slope'] = df['col1'].rolling(5).apply(lambda s: linregress(s.reset_index())[0])

print(df)

Output:

    col1  slope
0      2    NaN
1      5    NaN
2      6    NaN
3      5    NaN
4      4    0.4
5      6    0.0
6      7    0.3
7      8    0.9
8      9    1.2
9      7    0.4
10     5   -0.5
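
On long columns, `rolling(...).apply(...)` calls `linregress` once per window, which can be slow. A possible alternative is sketched below, under the assumption that x is always the positions 0..n-1 within each window and that n is at least 2. In that case the least-squares slope reduces to a fixed weighted sum of the window's values, which NumPy can evaluate for all windows at once. The helper name `rolling_slope` and the column name `slope_vec` are made up for this example, and `sliding_window_view` needs NumPy 1.20 or later:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def rolling_slope(series, n):
    """Least-squares slope of each trailing n-row window against x = 0..n-1."""
    x = np.arange(n)
    # slope = sum((x - mean(x)) * y) / sum((x - mean(x))**2), so each window's
    # slope is a fixed weighted sum of its y values
    weights = (x - x.mean()) / ((x - x.mean()) ** 2).sum()
    y = series.to_numpy(dtype=float)
    out = np.full(len(y), np.nan)  # the first n-1 rows have no full window
    if len(y) >= n:
        out[n - 1:] = sliding_window_view(y, n) @ weights
    return pd.Series(out, index=series.index)

df = pd.DataFrame({'col1': [2, 5, 6, 5, 4, 6, 7, 8, 9, 7, 5]})
df['slope_vec'] = rolling_slope(df['col1'], 5)
print(df)
```

This trades the generality of `linregress` (intercept, r-value, p-value) for speed; it only returns the slope.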

Answer 2

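Let's use numpy:

import numpy as np

def slope_numpy(x, y):
    # np.polyfit returns coefficients from the highest power down: [slope, intercept]
    # (np.poly1d(fit)[0] would give the intercept, not the slope)
    slope, intercept = np.polyfit(x, y, 1)
    return slope

# raw=True passes each window as a NumPy array; x is the positions 0..4
df.col1.rolling(5).apply(lambda y: slope_numpy(range(5), y), raw=True)

Output (up to floating-point rounding):

0     NaN
1     NaN
2     NaN
3     NaN
4     0.4
5     0.0
6     0.3
7     0.9
8     1.2
9     0.4
10   -0.5
Name: col1, dtype: float64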
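
np.polyfit and np.poly1d order their coefficients differently, which is easy to mix up when extracting the slope. A small check on the first window, [2, 5, 6, 5, 4], offered as an illustration rather than as part of the answer:

```python
import numpy as np

x = np.arange(5)
y = np.array([2, 5, 6, 5, 4])

fit = np.polyfit(x, y, 1)  # coefficients, highest power first: [slope, intercept]
p = np.poly1d(fit)         # poly1d indexes by power: p[1] is the slope, p[0] the intercept

slope, intercept = fit
print(slope, intercept)    # roughly 0.4 and 3.6
print(p[1], p[0])          # the same two numbers, read off by power
```

So a rolling column whose first value is about 3.6 holds intercepts, while one starting at about 0.4 holds slopes.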

Answer 1
5, accepted, 2022-01-01 14:32:23

Answer 2
3, 2022-01-01 14:53:19
