Pandas rolling apply function to entire window dataframe
I want to apply a function to a rolling window. All the answers I saw here focus on applying a function to a single row / column, but I would like to apply my function to the entire window. Here is a simplified example:
import pandas as pd
data = [ [1,2], [3,4], [3,4], [6,6], [9,1], [11,2] ]
df = pd.DataFrame(columns=list('AB'), data=data)
This is df:
A B
0 1 2
1 3 4
2 3 4
3 6 6
4 9 1
5 11 2
Take some function to apply to the entire window:
df.rolling(3).apply(lambda x: x.shape)
In this example, I would like to get something like:
some_name
0 NA
1 NA
2 (3,2)
3 (3,2)
4 (3,2)
5 (3,2)
Of course, shape is only an example to show that f treats the entire window as the object of calculation, not just a row / column. I tried playing with the axis keyword for rolling, as well as with the raw keyword for apply, but with no success. Other methods (agg, transform) do not seem to deliver either.
Sure, I can do this with a list comprehension. I just thought there might be an easier / cleaner way of doing this.
Not with pd.DataFrame.rolling: the function passed to apply is applied iteratively to the columns, taking in a series of floats/NaN and returning a series of floats/NaN, one by one. I think you'll have better luck with your intuition:
def rolling_pipe(dataframe, window, fctn):
    return pd.Series([dataframe.iloc[i-window: i].pipe(fctn)
                      if i >= window else None
                      for i in range(1, len(dataframe)+1)],
                     index = dataframe.index)
df.pipe(rolling_pipe, 3, lambda x: x.shape)
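The column-by-column behaviour described above can be checked with a small probe. This is only a sketch: `probe` and `seen` are hypothetical names introduced here, and it assumes a pandas version where `raw=False` hands the callback a Series.

```python
import pandas as pd

df = pd.DataFrame(columns=list("AB"),
                  data=[[1, 2], [3, 4], [3, 4], [6, 6], [9, 1], [11, 2]])

seen = []  # (type name, shape) of every object handed to the callback


def probe(window):
    # Record what rolling.apply actually passes in.
    seen.append((type(window).__name__, window.shape))
    # rolling.apply expects a single number back, so a tuple such as
    # window.shape cannot be returned here.
    return 0.0


result = df.rolling(3).apply(probe, raw=False)

# Every call received a one-dimensional Series of length 3,
# never a two-column DataFrame.
print(set(seen))
```

Each call sees one column's slice of the window, which is why a helper that slices the frame itself (such as rolling_pipe) is needed to get at both columns at once.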
The argument supplied to your apply function is a Series whose index property (a RangeIndex in this case) has start, stop and step properties.
RangeIndex(start=0, stop=2, step=1)
You can use this to query your data frame.
df = pd.DataFrame([('Sean', i) for i in range(1,11)], columns=['name', 'value'])
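def func(series):
    # series.index holds the window's row labels; with the default
    # RangeIndex these are also the row positions, so iloc works
    view = df.iloc[series.index]
    # use view to do something...
    count = len(view[view.value.isin([1,2,8])])
    return count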
df['count'] = df.value.rolling(2).apply(func)
There may be a more efficient way to do this but I'm not sure how.
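The lookup above relies on the default RangeIndex, where labels and positions coincide. A label-based variant is sketched below for a frame with a different (unique) index; the frame, the string index and the name `window_sum` are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Same numbers as the question, but with string labels (assumed for the demo).
df = pd.DataFrame({"A": [1, 3, 3, 6, 9, 11], "B": [2, 4, 4, 6, 1, 2]},
                  index=list("abcdef"))


def window_sum(series):
    # series.index carries the labels of the current window, so .loc
    # recovers the full-width window (all columns) from the outer frame.
    window = df.loc[series.index]
    # The callback still has to return a single number.
    return float(window.to_numpy().sum())


# Roll over one column only; it merely drives the window positions.
out = df["A"].rolling(3).apply(window_sum, raw=False)
print(out)
```

The result must still be a scalar per window, so this pattern suits aggregations over the whole window rather than arbitrary return values such as a shape tuple.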
If you need to apply a function over a rolling window on a datetime-like index, the other answers are not sufficient.
You have to resort to manually iterating over the Rolling object, and reconstructing the result into a Series or DataFrame as needed:
from datetime import (
    datetime as DateTime,
    timedelta as TimeDelta,
    timezone as TimeZone,
)
import pandas as pd
now = DateTime.now(tz=TimeZone.utc)
df = pd.DataFrame([
    {'t': now + TimeDelta(days=1), 'x': 11, 'y': 21},
    {'t': now + TimeDelta(days=2), 'x': 12, 'y': 22},
    {'t': now + TimeDelta(days=3), 'x': 13, 'y': 23},
    {'t': now + TimeDelta(days=4), 'x': 14, 'y': 24},
]).set_index('t')
results = {}
for group in df.rolling('2D'):
    # Perform a silly calculation, in this case an aggregation
    result = group['y'].min() * group['x'].max()
    # Choose a value to use as the resulting index
    index = group.index.min()
    results[index] = result
results = pd.Series(results)
print(results)
2022-07-15 01:41:05.121823+00:00 252
2022-07-16 01:41:05.121823+00:00 286
2022-07-17 01:41:05.121823+00:00 322
dtype: int64
This works analogously to iterating over a GroupBy object. Unfortunately, unlike with GroupBy, iterating does not yield the actual bounds that are used for the rolling window. I am not aware of a way to manually obtain these.
I expected that this should also be possible with the new method= kwarg in DataFrame.rolling, but I wasn't able to get it to work properly. I will post a separate answer if I figure it out!
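For the fixed-size window in the original question, the same iteration fits in a few lines. This is a minimal sketch, assuming a pandas version in which the Rolling object is iterable (1.1 or later); iteration also yields the shorter leading windows, so they are filtered out by length here. The name `shapes` is made up for the example.

```python
import pandas as pd

df = pd.DataFrame(columns=list("AB"),
                  data=[[1, 2], [3, 4], [3, 4], [6, 6], [9, 1], [11, 2]])

window = 3

# Each item yielded by the Rolling object is a DataFrame holding all
# columns of the current window; incomplete windows map to None.
shapes = pd.Series(
    [w.shape if len(w) == window else None for w in df.rolling(window)],
    index=df.index,
)
print(shapes)
```

Any function that accepts a DataFrame can take the place of `w.shape`, since nothing here requires a numeric return value.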