Apply custom function to pandas dataframe which relies on variable number of previous rows
I want an efficient way to add a column to my table that holds, for each row, the sum of the absolute differences between that row's value and the values in the previous N rows. For example:
| number | new_col_2 | new_col_3 | new_col_4 |
|---|---|---|---|
| 10 | - | - | - |
| 11 | - | - | - |
| 12 | 3 | - | - |
| 9 | 5 | 6 | - |
| 8 | 5 | 8 | 10 |
| 12 | 7 | 7 | 8 |
new_col_2 => calculated over the previous 2 rows:
(12-10) + (12-11) => 3
(11-9) + (12-9) => 5
new_col_3 => calculated over the previous 3 rows:
(10-9) + (11-9) + (12-9) => 6
(11-8) + (12-8) + (9-8) => 8
and so on.
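To pin down the definition, the calculation above can be spelled out as a plain loop. This is only a slow reference sketch, not the efficient solution being asked for; the function name `abs_diff_sum_naive` is made up for illustration.

```python
import numpy as np
import pandas as pd

def abs_diff_sum_naive(values, n):
    """For each position i, sum |values[i] - values[i-k]| for k = 1..n.

    Positions with fewer than n earlier rows are left as NaN.
    (Hypothetical helper, written only to illustrate the definition.)
    """
    out = [np.nan] * len(values)
    for i in range(n, len(values)):
        out[i] = sum(abs(values[i] - values[i - k]) for k in range(1, n + 1))
    return out

df = pd.DataFrame({"number": [10, 11, 12, 9, 8, 12]})
for n in (2, 3, 4):
    df[f"new_col_{n}"] = abs_diff_sum_naive(df["number"].tolist(), n)
print(df)
```

The filled-in values should match the table above, with NaN where the table shows "-".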
If N were a fixed number, I understand I could do this easily using:
df['new_col_N'] = abs(df['number'] - df['number'].shift(N)) + abs(df['number'] - df['number'].shift(N-1)) + ...  # and so on, down to shift(1)
But this assumes a fixed N. I want to write a function that adds this column with N as an integer parameter that can change.
Any idea what the most efficient way to do this is?
Edit: The answer accepted below leads to the following solution for me:
# raw=True passes each window as a NumPy array, so x[-1] is the current row (positional indexing)
df[new_col_name] = df['number'].rolling(window=period+1).apply(lambda x: np.sum(np.abs(x[:-1]-x[-1])), raw=True)
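As a rough sketch, the rolling one-liner can be wrapped in a function that takes N as a parameter. The helper name `add_abs_diff_sum` and the `source` argument are assumptions made for illustration. `raw=True` is passed so that each window arrives as a NumPy array and `x[-1]` is positional.

```python
import numpy as np
import pandas as pd

def add_abs_diff_sum(df, n, source="number"):
    """Add column new_col_<n>: sum of |current - previous k-th value| for k = 1..n.

    Hypothetical helper; the window holds the n previous rows plus the current one.
    """
    df[f"new_col_{n}"] = df[source].rolling(window=n + 1).apply(
        lambda x: np.abs(x[:-1] - x[-1]).sum(),  # x[-1] is the current row
        raw=True,
    )
    return df

df = pd.DataFrame({"number": [10, 11, 12, 9, 8, 12]})
for n in (2, 3, 4):
    add_abs_diff_sum(df, n)
print(df)
```

Rows with fewer than N predecessors come out as NaN, because `rolling` by default requires a full window before it calls the function.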
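We can use NumPy broadcasting:
import numpy as np
n = 2
a = df.number.values
# a - a[:, None] holds all pairwise differences; triu/tril keep, for row i, only columns i-n..i (the diagonal term is zero)
df.loc[n:, 'new1'] = np.sum(np.abs(np.tril(np.triu(a - a[:, None], k=-n))), 1)[n:]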
df
Out[188]:
   number new_col_2 new_col_3 new_col_4  new1
0      10         -         -         -   NaN
1      11         -         -         -   NaN
2      12         3         -         -   3.0
3       9         5         6         -   5.0
4       8         5         8        10   5.0
5      12         7         7         8   7.0
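One caveat: `a - a[:, None]` builds a full len(df) × len(df) matrix, which can get large for long frames. A possible alternative, sketched here on the assumption that NumPy 1.20 or later is available, is `sliding_window_view`, which only looks at windows of n + 1 values. The function name `abs_diff_sum_windowed` and the column name `new_windowed` are made up for illustration.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def abs_diff_sum_windowed(a, n):
    """Sum of |a[i] - a[i-k]| for k = 1..n, NaN for the first n positions.

    Hypothetical helper; uses windows of n + 1 values instead of a full
    pairwise-difference matrix.
    """
    a = np.asarray(a, dtype=float)
    out = np.full(a.shape, np.nan)
    if len(a) > n:
        windows = sliding_window_view(a, n + 1)  # shape: (len(a) - n, n + 1)
        # Last column is the current row; the others are the n previous rows.
        out[n:] = np.abs(windows[:, :-1] - windows[:, -1:]).sum(axis=1)
    return out

df = pd.DataFrame({"number": [10, 11, 12, 9, 8, 12]})
df["new_windowed"] = abs_diff_sum_windowed(df["number"].to_numpy(), 2)
print(df)
```

For n = 2 this should give the same values as the `new1` column above.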