Applying a half-gaussian filter to binned time series data in python

I am binning some time series data, and I need to apply a half-normal (half-Gaussian) filter to the binned data. How can I do this in Python? I've provided a toy example below. I need Xbinned to be smoothed with a half-Gaussian filter with a standard deviation of 0.25 (or whatever value). I'm pretty sure the half-Gaussian should be facing the forward time direction.

import numpy as np

X = np.random.randint(2, size=100) #example random process

bin_size =  5

Xbinned = []

for i in range(0, len(X)+1, bin_size):
    Xbinned.append(sum(X[i:i+(bin_size-1)])/bin_size)

How to implement half-gaussian filtering

SciPy has a function called scipy.ndimage.gaussian_filter(). It nearly implements what we want here. Unfortunately, there's no option to use a half-Gaussian instead of a Gaussian. However, SciPy is open source, so we can take the source code and modify it to produce a half-Gaussian.

I took this source code and removed all of the parts that are not needed for this particular case. At the end, I had this:

import numpy as np
import scipy.ndimage

def halfgaussian_kernel1d(sigma, radius):
    """
    Computes a 1-D Half-Gaussian convolution kernel.
    """
    sigma2 = sigma * sigma
    # only non-negative offsets: 0, 1, ..., radius
    x = np.arange(0, radius+1)
    phi_x = np.exp(-0.5 / sigma2 * x ** 2)
    # normalize so the weights sum to 1
    phi_x = phi_x / phi_x.sum()
    return phi_x

def halfgaussian_filter1d(input, sigma, axis=-1, output=None,
                          mode="constant", cval=0.0, truncate=4.0):
    """
    Convolves a 1-D Half-Gaussian convolution kernel.
    """
    sd = float(sigma)
    # make the radius of the filter equal to truncate standard deviations
    lw = int(truncate * sd + 0.5)
    weights = halfgaussian_kernel1d(sigma, lw)
    # shift the kernel so it starts at the current timestep instead of being centered on it
    origin = -lw // 2
    return scipy.ndimage.convolve1d(input, weights, axis, output, mode, cval, origin)

A short summary of how this works:

  1. First, it generates a convolution kernel. It uses the formula e^(-1/2 * (x/sigma)^2) to generate the Gaussian weights, then normalizes them so they sum to 1. It keeps going until you're 4 standard deviations away from the center.
  2. Next, it convolves that kernel against your signal. It adjusts the kernel to start at the current timestep instead of being centered on the current timestep.
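The two steps above can also be sketched with NumPy alone, which may help when checking what the filter does. This is a minimal sketch, not the answer's implementation: the names half_gaussian_weights and causal_smooth and the sample values are made up for illustration, and it assumes that samples before the start of the signal count as 0 (the same assumption as mode="constant", cval=0.0).

```python
import numpy as np

def half_gaussian_weights(sigma, truncate=4.0):
    """Normalized one-sided Gaussian weights for lags 0, 1, ..., radius (hypothetical helper)."""
    radius = int(truncate * sigma + 0.5)
    lags = np.arange(0, radius + 1)
    weights = np.exp(-0.5 * (lags / sigma) ** 2)
    return weights / weights.sum()

def causal_smooth(signal, sigma, truncate=4.0):
    """Weighted sum of the current sample and the samples before it (hypothetical helper)."""
    signal = np.asarray(signal, dtype=float)
    weights = half_gaussian_weights(sigma, truncate)
    # Full convolution gives out[i] = sum_k weights[k] * signal[i - k].
    # Keeping the first len(signal) values treats samples before the start as 0.
    return np.convolve(signal, weights)[: len(signal)]

binned = np.array([0.6, 0.6, 0.4, 0.6, 0.8])  # made-up binned values
smoothed = causal_smooth(binned, sigma=1.0)
print(smoothed)
```

Each output value depends only on the current bin and earlier bins, so the first few outputs are pulled toward 0 by the zero padding.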

Trying this on your signal, I get a result like this:

array([0.59979879, 0.6       , 0.40006707, 0.59993293, 0.79993293,
       0.40013414, 0.20006707, 0.59986586, 0.40006707, 0.4       ,
       0.99979879, 0.00033535, 0.59979879, 0.40006707, 0.00013414,
       0.59979879, 0.20013414, 0.00006707, 0.19993293, 0.59986586])

Choice of standard deviation

If you pick a standard deviation of 0.25, that is going to have almost no effect on your signal. Here are the convolution weights it uses: [0.99966465 0.00033535]. In other words, the previous bin contributes less than 0.1% to each smoothed value.

I'd recommend using a larger sigma value.
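To get a feel for how sigma changes the kernel, the weights can be printed for a few values. This is only an illustrative sketch: half_gaussian_weights is a hypothetical helper that repeats the kernel formula from the code above, and the sigma values 1.0 and 2.0 are arbitrary picks, not recommendations.

```python
import numpy as np

def half_gaussian_weights(sigma, truncate=4.0):
    """Same kernel formula as above: lags 0..radius, normalized to sum to 1."""
    radius = int(truncate * sigma + 0.5)
    lags = np.arange(0, radius + 1)
    weights = np.exp(-0.5 * (lags / sigma) ** 2)
    return weights / weights.sum()

# Compare how much weight stays on the current bin as sigma grows.
for sigma in (0.25, 1.0, 2.0):
    weights = half_gaussian_weights(sigma)
    print(f"sigma={sigma}: {len(weights)} weights, current bin gets {weights[0]:.4f}")
```

With sigma=0.25 only two weights survive and the current bin keeps almost all of the weight, matching the values quoted above. Larger sigmas spread the weight over more bins.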

Off-by-one error

Also, I want to point out the off-by-one error here:

for i in range(0, len(X)+1, bin_size):
    Xbinned.append(sum(X[i:i+(bin_size-1)])/bin_size)

NumPy slices (like Python slices in general) exclude the end index, so the slice from i to i+(bin_size-1) actually captures 4 elements, not 5.

To fix this, you can change it to this:

for i in range(0, len(X), bin_size):
    Xbinned.append(X[i:i+bin_size].mean())

(Also, I fixed an off-by-one error in the loop specification, where range(0, len(X)+1, bin_size) produces one extra, empty bin at the end, and used a NumPy shortcut for finding the mean.)
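Both problems are easy to see by running the two loops side by side. The sketch below is for illustration only: the seeded generator and the reshape-based variant are additions that are not part of the original question or answer, and the reshape version assumes len(X) is an exact multiple of bin_size.

```python
import numpy as np

rng = np.random.default_rng(0)       # fixed seed, only for reproducibility
X = rng.integers(0, 2, size=100)     # same kind of toy 0/1 process as the question
bin_size = 5

# Original loop: one extra (empty) bin at the end, and only 4 samples summed per bin.
original = [sum(X[i:i + (bin_size - 1)]) / bin_size
            for i in range(0, len(X) + 1, bin_size)]

# Corrected loop from above.
fixed = [X[i:i + bin_size].mean() for i in range(0, len(X), bin_size)]

# Loop-free alternative; assumes len(X) is an exact multiple of bin_size.
reshaped = X.reshape(-1, bin_size).mean(axis=1)

print(len(X[0:0 + (bin_size - 1)]))  # 4
print(len(original), len(fixed))     # 21 20
```

Because the original loop sums at most 4 samples but divides by 5, its bins can never exceed 0.8, while the corrected version can reach 1.0.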
