How to apply a row-wise function to a pandas dataframe and a shifted version of itself

I have a pandas dataframe, and I would like to apply a simple sign-and-multiply operation to each row and the row two indices back (shifted by 2). For example, if we have

row_a = np.array([0.45, -0.78, 0.92])
row_b = np.array([1.2, -0.73, -0.46])
sgn_row_a = np.sign(row_a)
sgn_row_b = np.sign(row_b)
result = sgn_row_a * sgn_row_b
result
>>> array([1., 1., -1.])

What I tried

import pandas as pd
import numpy as np

np.random.seed(42)
df = pd.DataFrame(np.random.normal(0, 1, (100, 5)), columns=["a", "b", "c", "d", "e"])

def kernel(row_a, row_b):
    """Take the sign of both rows and multiply them"""
    sgn_a = np.sign(row_a)
    sgn_b = np.sign(row_b)
    return sgn_a * sgn_b

def func(data):
    """Apply 'kernel' to the dataframe row-wise, axis=1"""
    out = data.apply(lambda x: kernel(x, x.shift(2)), axis=1)
    return out

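When I run the function, though, I get the following output, which is incorrect. It seems to shift the columns instead of the rows. When I tried a different axis in the shift operation, I just got an error (ValueError: No axis named 1 for object type Series).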

out = func(df)
out
>>>
      a   b    c    d    e
0   NaN NaN  1.0 -1.0 -1.0
1   NaN NaN -1.0 -1.0  1.0
2   NaN NaN -1.0  1.0 -1.0
3   NaN NaN -1.0  1.0 -1.0
4   NaN NaN  1.0  1.0 -1.0
..   ..  ..  ...  ...  ...

What I expect is

out = func(df)
out
>>>
      a   b    c    d    e
0    -1.  1.   1.  -1.   1.
1     1. -1.   1.   1.  -1.
2    -1.  1.   1.   1.   1.
3    -1.  1.   1.   1.   1.
4    -1. -1.  -1.   1.  -1.
..   ..  ..  ...  ...  ...

How can I implement a shifted row-wise operation like the one described above?

It seems the simplest way to perform this particular operation is

df.apply(np.sign) * df.shift(2).apply(np.sign)
>>>
       a    b    c    d    e
0    NaN  NaN  NaN  NaN  NaN
1    NaN  NaN  NaN  NaN  NaN
2   -1.0  1.0  1.0 -1.0  1.0
3    1.0 -1.0  1.0  1.0 -1.0
4   -1.0  1.0  1.0  1.0  1.0
..   ...  ...  ...  ...  ...

To shift in the other direction, pass a negative value to shift.
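As a rough illustration of the two shift directions, here is a minimal sketch on a small hand-written frame (the values are made up for this example and are not the question's data). It uses np.sign directly on the DataFrame rather than apply, which gives the same result.

```python
import numpy as np
import pandas as pd

# Small made-up frame so the result is easy to check by eye.
df = pd.DataFrame({"a": [1.0, -2.0, 3.0, -4.0], "b": [-1.0, -1.0, 1.0, 1.0]})

# shift(2): row i is paired with row i - 2, so the first two rows become NaN.
backward = np.sign(df) * np.sign(df.shift(2))

# shift(-2): row i is paired with row i + 2, so the last two rows become NaN.
forward = np.sign(df) * np.sign(df.shift(-2))

print(backward)
print(forward)
```

With a positive shift the NaN rows appear at the top of the result; with a negative shift they appear at the bottom.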

apply loops over the columns one at a time; here you can pass the whole DataFrame to the np.sign function instead:

df = np.sign(df) * np.sign(df.shift(2))
print (df)
      a    b    c    d    e
0   NaN  NaN  NaN  NaN  NaN
1   NaN  NaN  NaN  NaN  NaN
2  -1.0  1.0  1.0 -1.0  1.0
3   1.0 -1.0  1.0  1.0 -1.0
4  -1.0  1.0  1.0  1.0  1.0
..  ...  ...  ...  ...  ...
95  1.0  1.0  1.0 -1.0 -1.0
96  1.0  1.0  1.0  1.0 -1.0
97  1.0 -1.0 -1.0  1.0  1.0
98  1.0 -1.0 -1.0 -1.0 -1.0
99 -1.0  1.0  1.0 -1.0 -1.0

[100 rows x 5 columns]
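To see why the row-wise apply in the question shifted columns, and to sanity-check the vectorized version, the sketch below may help. It rebuilds the question's frame, shifts a single row on its own, and compares the vectorized result with an explicit loop over row positions. The loop is only a cross-check written for this example, not a recommended approach.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.normal(0, 1, (100, 5)), columns=["a", "b", "c", "d", "e"])

# With axis=1, apply hands each row to the function as a Series whose index
# is the column labels, so shifting that Series moves values across columns.
first_row = df.iloc[0]
print(first_row.shift(2))  # the entries for "a" and "b" become NaN

# Shifting the whole frame moves values down the rows instead.
vectorized = np.sign(df) * np.sign(df.shift(2))

# Cross-check: pair row i with row i - 2 by position, using plain NumPy arrays.
signs = np.sign(df.to_numpy())
looped_values = np.full(df.shape, np.nan)
for i in range(2, len(df)):
    looped_values[i] = signs[i] * signs[i - 2]
looped = pd.DataFrame(looped_values, index=df.index, columns=df.columns)

print(vectorized.equals(looped))
```

A Series has only one axis, which is also why passing axis=1 to shift inside the lambda raises the ValueError mentioned in the question.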

Then, if you need to remove the leading NaN rows:

#df = df.dropna()
df = df.iloc[2:]
print (df)
      a    b    c    d    e
2  -1.0  1.0  1.0 -1.0  1.0
3   1.0 -1.0  1.0  1.0 -1.0
4  -1.0  1.0  1.0  1.0  1.0
5  -1.0  1.0  1.0  1.0  1.0
6  -1.0 -1.0 -1.0  1.0 -1.0
..  ...  ...  ...  ...  ...
95  1.0  1.0  1.0 -1.0 -1.0
96  1.0  1.0  1.0  1.0 -1.0
97  1.0 -1.0 -1.0  1.0  1.0
98  1.0 -1.0 -1.0 -1.0 -1.0
99 -1.0  1.0  1.0 -1.0 -1.0

[98 rows x 5 columns]
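The two options above give the same result here, but they can differ when the data itself contains missing values. A minimal sketch, assuming a made-up frame (the name raw and its values are hypothetical) with one NaN in the input:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in column "a" at row 3.
raw = pd.DataFrame({
    "a": [1.0, -2.0, 3.0, np.nan, 5.0, -6.0],
    "b": [-1.0, 2.0, -3.0, 4.0, 5.0, -6.0],
})

result = np.sign(raw) * np.sign(raw.shift(2))

# Positional: drops exactly the first two rows and keeps any later NaN.
by_position = result.iloc[2:]

# Value-based: drops every row that contains at least one NaN.
by_value = result.dropna()

print(by_position)
print(by_value)
```

The missing value at row 3 also affects row 5, which is paired with it by the shift, so dropna removes both rows while iloc[2:] keeps them.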
