I have a pandas DataFrame where I would like to apply a simple sign-and-multiply operation to each row and the row two positions above it (i.e. shifted by 2). For example, if we had:
import numpy as np
row_a = np.array([0.45, -0.78, 0.92])
row_b = np.array([1.2, -0.73, -0.46])
sgn_row_a = np.sign(row_a)
sgn_row_b = np.sign(row_b)
result = sgn_row_a * sgn_row_b
result
>>> array([1., 1., -1.])
What I have tried:
import pandas as pd
import numpy as np
np.random.seed(42)
df = pd.DataFrame(np.random.normal(0, 1, (100, 5)), columns=["a", "b", "c", "d", "e"])
def kernel(row_a, row_b):
    """Take the sign of both rows and multiply them"""
    sgn_a = np.sign(row_a)
    sgn_b = np.sign(row_b)
    return sgn_a * sgn_b

def func(data):
    """Apply 'kernel' to the dataframe row-wise, axis=1"""
    out = data.apply(lambda x: kernel(x, x.shift(2)), axis=1)
    return out
But when I run the function, I get the output below, which is incorrect: it seems to shift the columns rather than the rows. When I tried a different axis in the shift operation, I just got an error (ValueError: No axis named 1 for object type Series).
out = func(df)
out
>>>
a b c d e
0 NaN NaN 1.0 -1.0 -1.0
1 NaN NaN -1.0 -1.0 1.0
2 NaN NaN -1.0 1.0 -1.0
3 NaN NaN -1.0 1.0 -1.0
4 NaN NaN 1.0 1.0 -1.0
.. .. .. ... ... ...
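A likely explanation, shown as a small sketch on a made-up three-column frame (the values are illustrative, not from the question): with axis=1, apply hands each row to the lambda as a Series whose index is the column labels, so x.shift(2) moves values across the columns of that single row. A Series also has only axis 0, which is why asking for axis=1 raises the ValueError.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (values chosen for this sketch, not from the question)
df = pd.DataFrame({"a": [1.0, -2.0], "b": [3.0, 4.0], "c": [-5.0, 6.0]})

# With axis=1, apply passes each row as a Series indexed by the column labels
first_row = df.iloc[0]
print(first_row.index.tolist())  # ['a', 'b', 'c']

# Shifting that Series moves values along its index, i.e. across the columns
print(first_row.shift(2).tolist())  # [nan, nan, 1.0]

# A Series has only axis 0, so requesting axis=1 raises a ValueError
raised_value_error = False
try:
    first_row.shift(2, axis=1)
except ValueError as err:
    raised_value_error = True
    print(err)
```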
What I expect is:
out = func(df)
out
>>>
a b c d e
0 -1. 1. 1. -1. 1.
1 1. -1. 1. 1. -1.
2 -1. 1. 1. 1. 1.
3 -1. 1. 1. 1. 1.
4 -1. -1. -1. 1. -1.
.. .. .. ... ... ...
How can I achieve a shifted row-wise operation as I have outlined above?
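If you want to keep the kernel helper, one option (a sketch, not necessarily the only approach) is to call it once on whole DataFrames instead of row by row: np.sign and * both work element-wise on DataFrames, and DataFrame.shift(2) shifts rows by default, so row i lines up with row i-2.

```python
import numpy as np
import pandas as pd

def kernel(row_a, row_b):
    """Take the sign of both inputs and multiply them element-wise."""
    sgn_a = np.sign(row_a)
    sgn_b = np.sign(row_b)
    return sgn_a * sgn_b

def func(data):
    """Pair every row with the row two positions above it."""
    # DataFrame.shift(2) moves rows down by two (first two rows become NaN);
    # np.sign and * are element-wise, so kernel works on whole frames unchanged
    return kernel(data, data.shift(2))

np.random.seed(42)
df = pd.DataFrame(np.random.normal(0, 1, (100, 5)), columns=["a", "b", "c", "d", "e"])
out = func(df)
print(out.head())
```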
It seems the simplest way to do this particular operation is:
df.apply(np.sign) * df.shift(2).apply(np.sign)
>>>
a b c d e
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 -1.0 1.0 1.0 -1.0 1.0
3 1.0 -1.0 1.0 1.0 -1.0
4 -1.0 1.0 1.0 1.0 1.0
.. ... ... ... ... ...
To pair each row with the row two positions below it instead, just use a negative shift, e.g. df.shift(-2).
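A small sketch of both directions on a made-up two-column frame (values are illustrative only). Because the sign product is symmetric, the forward version holds the same pairs as the backward one, just labelled with the earlier row and with the NaN rows at the end instead of the start.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame (values made up for this sketch)
df = pd.DataFrame({"a": [1.0, -1.0, -2.0, 3.0], "b": [-1.0, 2.0, -3.0, -4.0]})

# shift(2): row i is paired with row i-2 (first two rows become NaN)
backward = np.sign(df) * np.sign(df.shift(2))

# shift(-2): row i is paired with row i+2 (last two rows become NaN)
forward = np.sign(df) * np.sign(df.shift(-2))

print(backward)
print(forward)
```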
apply loops over the columns one at a time; here you can pass the whole DataFrame to the np.sign function directly:
df = np.sign(df) * np.sign(df.shift(2))
print (df)
a b c d e
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 -1.0 1.0 1.0 -1.0 1.0
3 1.0 -1.0 1.0 1.0 -1.0
4 -1.0 1.0 1.0 1.0 1.0
.. ... ... ... ... ...
95 1.0 1.0 1.0 -1.0 -1.0
96 1.0 1.0 1.0 1.0 -1.0
97 1.0 -1.0 -1.0 1.0 1.0
98 1.0 -1.0 -1.0 -1.0 -1.0
99 -1.0 1.0 1.0 -1.0 -1.0
[100 rows x 5 columns]
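If the NaN rows are not wanted at all, a variation (an alternative sketch, not part of the answer above) is to drop to the underlying NumPy array and multiply row i by row i-2 with slicing, then wrap the result back into a DataFrame labelled with the later row of each pair.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.normal(0, 1, (100, 5)), columns=["a", "b", "c", "d", "e"])

# Take the signs once, then pair row i with row i-2 via slicing
signs = np.sign(df.to_numpy())
pairs = signs[2:] * signs[:-2]  # shape (98, 5), no NaN rows

# Wrap back into a DataFrame, keeping the label of the later row in each pair
result = pd.DataFrame(pairs, index=df.index[2:], columns=df.columns)
print(result.head())
```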
Then, if you need to remove the leading rows that contain NaNs:
# df = df.dropna()  # alternative: drops every row that contains a NaN
df = df.iloc[2:]
print (df)
a b c d e
2 -1.0 1.0 1.0 -1.0 1.0
3 1.0 -1.0 1.0 1.0 -1.0
4 -1.0 1.0 1.0 1.0 1.0
5 -1.0 1.0 1.0 1.0 1.0
6 -1.0 -1.0 -1.0 1.0 -1.0
.. ... ... ... ... ...
95 1.0 1.0 1.0 -1.0 -1.0
96 1.0 1.0 1.0 1.0 -1.0
97 1.0 -1.0 -1.0 1.0 1.0
98 1.0 -1.0 -1.0 -1.0 -1.0
99 -1.0 1.0 1.0 -1.0 -1.0
[98 rows x 5 columns]
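The expected output in the question appears to be these same rows renumbered from 0. Assuming that is what is wanted, a sketch adds reset_index(drop=True) after slicing off the two NaN rows:

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.normal(0, 1, (100, 5)), columns=["a", "b", "c", "d", "e"])

out = np.sign(df) * np.sign(df.shift(2))

# Drop the two leading NaN rows, then renumber the index from 0
out = out.iloc[2:].reset_index(drop=True)
print(out.head())
```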