I have a function func that I want to apply to consecutive rows of a pandas dataframe. However, I get a ValueError when I try to do it as below.
import numpy as np
import pandas as pd
def func(a: np.ndarray, b: np.ndarray) -> float:
    """Return the sum of all elements of vectors a and b."""
    return np.sum(a) + np.sum(b)
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 11, 12, 13, 14]})
df.rolling(window=2, axis=1).apply(func)
>>>
ValueError: Length of passed values is 2, index implies 5.
All I want to do is apply func on a rolling basis to consecutive rows (which is why I chose window=2 above). The snippet below is a manual implementation of this.
func(df.iloc[0, :].values, df.iloc[1, :].values)
>>> 24
func(df.iloc[1, :].values, df.iloc[2, :].values)
>>> 28
and so on.
Note that the example I gave for func is just for illustrative purposes. I know that you could use a simple df.sum(axis=1) + df.shift(-1).sum(axis=1) in this case. What I want to know is how to use rolling apply with custom functions in the general case.
I guess this can be done with a few lines of code and an intermediate dataframe:
import numpy as np
import pandas as pd
def func(a: np.ndarray) -> float:
    return np.sum(a)
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 11, 12, 13, 14]})
df_rolled = df.rolling(window=2).apply(func)
df["ab_rolled"] = [func([df_rolled["a"][i], df_rolled["b"][i]])
                   for i in range(0, len(df_rolled["a"]))]
print(df)
outputs:
   a   b  ab_rolled
0  1  10        NaN
1  2  11       24.0
2  3  12       28.0
3  4  13       32.0
4  5  14       36.0
This may well be ugly code, though. Sorry, this is my first time using pandas.
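Note that the intermediate-dataframe approach works here because func is a plain sum, which can be computed per column and then combined. For a function that really needs both rows at once, one possible alternative is to skip rolling entirely and loop over pairs of consecutive rows. The following is only a minimal sketch, assuming a plain Python loop is fast enough for the data size; apply_to_consecutive_rows is a hypothetical helper name, not a pandas API.

```python
import numpy as np
import pandas as pd


def func(a: np.ndarray, b: np.ndarray) -> float:
    """Return the sum of all elements of vectors a and b."""
    return np.sum(a) + np.sum(b)


def apply_to_consecutive_rows(frame: pd.DataFrame, f) -> pd.Series:
    """Hypothetical helper: call f(row_i, row_i_plus_1) for each pair of consecutive rows."""
    values = frame.to_numpy()
    results = [f(values[i], values[i + 1]) for i in range(len(values) - 1)]
    # Label each result with the index of the second row of its pair,
    # so the first row has no entry (it becomes NaN when assigned to a column).
    return pd.Series(results, index=frame.index[1:])


df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 11, 12, 13, 14]})
out = apply_to_consecutive_rows(df, func)
print(out.tolist())  # [24, 28, 32, 36]
```

Assigning the result back with df["ab_rolled"] = out aligns on the index, so row 0 ends up as NaN, the same as in the table above.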