Rolling time-window features: data wrangling with Pandas

Question

09DEC2020 | Mike | Jim | Rome Open | Clay | 65% | 70%

I'm trying to create new columns based on rolling time windows per player and surface, e.g. LAST90DAYS_PLAYER1_CLAYSERVE% and LAST5MATCHES_PLAYER1_CLAYSERVE%. Note that both fields should be computed for the same SURFACE as the one specified in the record in question.

09DEC2020 | Mike | Jim | Rome Open | Clay | 65% | 70% | 62.5% | 69.2%

Is there an elegant Pandas command that can compute this type of time-window-based stats/features for each row of data? Or do I need to code a Python function from scratch with explicit loops plus if/then/else logic?

I have more experience with SQL, so my inclination is to issue multiple "group by" queries to compute each new column separately and then join the resulting tables to arrive at the final table/dataset. That would be a multi-step process rather than an elegant single line of Pandas code with a built-in loop.

Thanks in advance!

Answer 1

You can use the pandas.DataFrame.rolling method. Check out the pandas documentation for the details.

As an example, suppose you are working with the Apple stock price time series. Here is what the code looks like to compute the 5-day rolling mean. You can apply other aggregations to the same rolling window as well, such as the sum or the standard deviation:

>>> aapl = data[['AAPL']].copy()
>>> aapl
                  AAPL
Date                  
2010-01-04   30.572857
2010-01-05   30.625713
2010-01-06   30.138571
2010-01-07   30.082857
2010-01-08   30.282858
                ...
2018-12-25  152.000000
2018-12-26  157.169998
2018-12-27  156.149994
2018-12-28  156.229996
2018-12-31  156.229996
[2346 rows x 1 columns]

>>> aapl['mean_5d'] = aapl.loc[:, ['AAPL']].rolling(5).mean()
>>> aapl
                  AAPL     mean_5d
Date                              
2010-01-04   30.572857         NaN
2010-01-05   30.625713         NaN
2010-01-06   30.138571         NaN
2010-01-07   30.082857         NaN
2010-01-08   30.282858   30.340571
                ...         ...
2018-12-25  152.000000  153.456000
2018-12-26  157.169998  152.712000
2018-12-27  156.149994  152.575998
2018-12-28  156.229996  153.675998
2018-12-31  156.229996  155.555997
[2346 rows x 2 columns]

>>> aapl['std_5d'] = aapl.loc[:, ['AAPL']].rolling(5).std()
>>> aapl
                  AAPL     mean_5d    std_5d
Date                                        
2010-01-04   30.572857         NaN       NaN
2010-01-05   30.625713         NaN       NaN
2010-01-06   30.138571         NaN       NaN
2010-01-07   30.082857         NaN       NaN
2010-01-08   30.282858   30.340571  0.247898
                ...         ...       ...
2018-12-25  152.000000  153.456000  5.479579
2018-12-26  157.169998  152.712000  4.355022
2018-12-27  156.149994  152.575998  4.202209
2018-12-28  156.229996  153.675998  4.316487
2018-12-31  156.229996  155.555997  2.031717
[2346 rows x 3 columns]
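For the tennis data in the question, the same rolling call can be combined with groupby so that each window only sees matches of the same player on the same surface. The following is only a rough sketch under several assumptions: the data has been reshaped to one row per player per match, the column names (date, player, surface, serve_pct) and the sample values are made up for illustration, each player has at most one match per surface per day, and pandas is recent enough (1.2 or later) to accept closed="left" on a fixed-size window. closed="left" excludes the current match from its own window, which may or may not be what you want.

```python
import pandas as pd

# Hypothetical long-format data: one row per player per match.
matches = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-09-01", "2020-10-15", "2020-11-20",
        "2020-12-09", "2020-11-01", "2020-12-09",
    ]),
    "player": ["Mike", "Mike", "Mike", "Mike", "Jim", "Jim"],
    "surface": ["Clay", "Clay", "Hard", "Clay", "Clay", "Clay"],
    "serve_pct": [60.0, 65.0, 55.0, 65.0, 68.0, 70.0],
})

# Time-based windows need a sorted datetime index.
indexed = matches.set_index("date").sort_index()
by_group = indexed.groupby(["player", "surface"])["serve_pct"]

# Both results are indexed by (player, surface, date).
features = pd.DataFrame({
    # Mean over up to 5 previous matches on the same surface.
    "last5_serve_pct": by_group.rolling(5, min_periods=1, closed="left").mean(),
    # Mean over matches in the 90 days before the current match.
    "last90d_serve_pct": by_group.rolling("90D", closed="left").mean(),
}).reset_index()

# Join the features back onto the original rows (the SQL-style step).
result = matches.merge(features, on=["player", "surface", "date"], how="left")
print(result)
```

Rows with no earlier match for that player and surface get NaN. If your table stays in the wide PLAYER1/PLAYER2 layout, you would merge these features twice, once per player column.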

I hope this helps you write more efficient code with the pandas library!
