How to compute a moving/running/rolling arbitrary function (e.g. kurtosis and skewness) using NumPy / SciPy

Question

I am working on time-series data. To get features from the data I have to calculate the moving mean, median, mode, slope, kurtosis, skewness etc. I am familiar with scipy.stats, which provides an easy way to calculate these quantities over a whole array. But for the moving/running part, I have searched all over the internet and found nothing.

Surprisingly, moving mean, median and mode are very easy to calculate with numpy. Unfortunately, there is no built-in function for calculating moving kurtosis and skewness. Can someone help with how to calculate moving kurtosis and skewness with scipy? Many thanks.

Answer 1

Pandas offers a DataFrame.rolling() method which can be used in combination with its Rolling.apply() method (i.e. df.rolling().apply()) to apply an arbitrary function to the specified rolling window.
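As a minimal sketch of the pandas route, assuming a small made-up series and a window of 4 (both chosen here only for illustration), the SciPy functions can be passed straight to Rolling.apply():

```python
import numpy as np
import pandas as pd
from scipy.stats import kurtosis, skew

# Illustrative sample data and window size (assumptions, not from the question)
data = pd.Series([2, 3, 10, 11, 0, 4, 8, 2, 5, 9], dtype=float)
window = 4

# raw=True hands each window to the function as a plain NumPy array
rolling_kurt = data.rolling(window).apply(kurtosis, raw=True)
rolling_skew = data.rolling(window).apply(skew, raw=True)

# The first window - 1 entries are NaN because those windows are incomplete
print(rolling_kurt.dropna().to_numpy())
print(rolling_skew.dropna().to_numpy())
```

As far as I know, pandas also has built-in Rolling.skew() and Rolling.kurt() methods, but they use bias-corrected estimators, so their values generally differ from the scipy.stats defaults used above.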

If you are looking for a NumPy-based solution, you could use FlyingCircus (disclaimer: I am the main author of it).

There, you can find the following:

flyingcircus.extra.running_apply(): can apply any function to a 1D array and supports weights, but it is slow;
flyingcircus.extra.moving_apply(): can apply any function supporting an axis: int parameter to a 1D array and supports weights, and it is fast (but memory-hungry);
flyingcircus.extra.rolling_apply_nd(): can apply any function supporting an axis: int|Sequence[int] parameter to any ND array and it is fast (and memory-efficient), but it does not support weights.

Based on your requirements, I would suggest using rolling_apply_nd(), e.g.:

import numpy as np
import scipy as sp
import flyingcircus as fc

import scipy.stats


NUM = 30
arr = np.arange(NUM)

window = 4
new_arr = fc.extra.rolling_apply_nd(arr, window, func=sp.stats.kurtosis)
print(new_arr)
# [-1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36
#  -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36
#  -1.36 -1.36 -1.36]

Of course, feel free to inspect the source code; it is open source (GPL).

EDIT

Just to get a feeling for the kind of speed we are talking about, these are the benchmarks for the solutions implemented in FlyingCircus:

The general approach flyingcircus.extra.running_apply() is a couple of orders of magnitude slower than either flyingcircus.extra.rolling_apply_nd() or flyingcircus.extra.moving_apply(), with the first being approx. one order of magnitude faster than the second. This shows the speed price paid for generality or for weighting support.

The above plots were obtained using the scripts from here and the following code:

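import numpy as np
import scipy as sp
import scipy.stats
import flyingcircus as fc

# benchmark() and plot_benchmarks() come from the benchmarking scripts
# mentioned above and must be defined before running this code.

WINDOW = 4
FUNC = sp.stats.kurtosis


def my_rolling_apply_nd(arr, window=WINDOW, func=FUNC):
    return fc.extra.rolling_apply_nd(arr, window, func=func)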


def my_moving_apply(arr, window=WINDOW, func=FUNC):
    return fc.extra.moving_apply(arr, window, func)


def my_running_apply(arr, window=WINDOW, func=FUNC):
    return fc.extra.running_apply(arr, window, func)


def equal_output(a, b):
    return np.all(np.isclose(a, b))


input_sizes = (5, 10, 50, 100, 500, 1000, 5000, 10000, 50000, 100000)
funcs = my_rolling_apply_nd, my_moving_apply, my_running_apply

runtimes, input_sizes, labels, results = benchmark(
    funcs, gen_input=np.random.random, equal_output=equal_output,
    input_sizes=input_sizes)

plot_benchmarks(runtimes, input_sizes, labels, units='s')
plot_benchmarks(runtimes, input_sizes, labels, units='ms', zoom_fastest=8)

Answer 2

After playing around, I have come up with a solution that is purely numpy and scipy based. Of course, it uses kurtosis and skew from scipy.stats.

import numpy as np
from scipy.stats import kurtosis, skew

# Window size
N = 4

# Some random data
m = np.array([2, 3, 10, 11, 0, 4, 8, 2, 5, 9])

# Running kurtosis
def runningKurt(x, N):
    # Initialize placeholder array, one entry per full window
    y = np.zeros((len(x) - (N - 1),))
    for i in range(len(x) - (N - 1)):
        y[i] = kurtosis(x[i:(i + N)])
    return y

# Running skewness
def runningSkew(x, N):
    # Initialize placeholder array, one entry per full window
    y = np.zeros((len(x) - (N - 1),))
    for i in range(len(x) - (N - 1)):
        y[i] = skew(x[i:(i + N)])
    return y

kurt = runningKurt(m, N)
print("kurtosis : ", kurt)
# kurtosis :  [-1.93940828 -1.77879935 -1.61464214 -1.40236694 -1.15428571 -1.07626667 -1.42666667]


skw = runningSkew(m, N)
print("skew : ", skw)
# skew :  [ 0.         -0.1354179  -0.26356495 -0.13814702  0.43465076  0.32331615 -0.36514837]
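The explicit Python loop can probably be avoided. A minimal sketch, assuming NumPy 1.20 or newer (which provides numpy.lib.stride_tricks.sliding_window_view), builds all windows as a 2D view and lets kurtosis and skew reduce along the last axis:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # needs NumPy >= 1.20
from scipy.stats import kurtosis, skew

# Same data and window size as in the loop-based version
m = np.array([2, 3, 10, 11, 0, 4, 8, 2, 5, 9])
N = 4

# Shape (len(m) - N + 1, N): one row per window, created as a view (no copy)
windows = sliding_window_view(m, N)

# Reduce each row (window) to a single statistic
kurt = kurtosis(windows, axis=-1)
skw = skew(windows, axis=-1)

print("kurtosis : ", kurt)
print("skew : ", skw)
```

This should give the same values as runningKurt and runningSkew, and the same pattern works for any function that accepts an axis argument.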
