Applying a half-Gaussian filter to binned time series data in Python
I am binning some time series data, and I need to apply a half-normal filter to the binned data. How can I do this in Python? I've provided a toy example below. I need `Xbinned` to be smoothed with a half-Gaussian filter with a standard deviation of 0.25 (or whatever). I'm pretty sure the half-Gaussian should be facing the forward time direction.
```python
import numpy as np
X = np.random.randint(2, size=100) #example random process
bin_size = 5
Xbinned = []
for i in range(0, len(X)+1, bin_size):
    Xbinned.append(sum(X[i:i+(bin_size-1)])/bin_size)
```
Scipy has a function called `scipy.ndimage.gaussian_filter()`. It nearly implements what we want here. Unfortunately, there's no option to use a half-Gaussian instead of a Gaussian. However, scipy is open-source, so we can just take the source code and modify it to be a half-Gaussian.
```python
import numpy as np
import scipy.ndimage


def halfgaussian_kernel1d(sigma, radius):
    """
    Computes a 1-D Half-Gaussian convolution kernel.
    """
    sigma2 = sigma * sigma
    # offsets 0..radius only: one side of the bell curve
    x = np.arange(0, radius+1)
    phi_x = np.exp(-0.5 / sigma2 * x ** 2)
    # normalize so the weights sum to 1
    phi_x = phi_x / phi_x.sum()
    return phi_x


def halfgaussian_filter1d(input, sigma, axis=-1, output=None,
                          mode="constant", cval=0.0, truncate=4.0):
    """
    Convolves a 1-D Half-Gaussian convolution kernel.
    """
    sd = float(sigma)
    # make the radius of the filter equal to truncate standard deviations
    lw = int(truncate * sd + 0.5)
    weights = halfgaussian_kernel1d(sigma, lw)
    # shift the kernel so it starts at the current sample instead of being centered on it
    origin = -lw // 2
    return scipy.ndimage.convolve1d(input, weights, axis, output, mode, cval, origin)
```
A short summary of how this works:

1. First, it generates a convolution kernel. It uses the formula `e^(-1/2 * (x/sigma)^2)` to generate the Gaussian distribution, but only for x = 0, 1, 2, ..., which is what makes it a half-Gaussian. It keeps going until you're 4 standard deviations away from the center (the `truncate` argument), then normalizes the weights so they sum to 1.
2. Next, it convolves that kernel against your signal. It adjusts the kernel (via `origin`) to start at the current timestep instead of being centered on the current timestep, so each output value is a weighted average of the current bin and the bins before it.
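To check which direction the kernel faces, here is a minimal sketch that feeds a single impulse through `halfgaussian_filter1d` (re-declared so the snippet runs on its own) and compares the result with an explicit causal convolution via `np.convolve`. The impulse position and `sigma=1.0` are arbitrary choices for illustration.

```python
import numpy as np
import scipy.ndimage


def halfgaussian_kernel1d(sigma, radius):
    """Weights exp(-x^2 / (2 sigma^2)) for x = 0..radius, normalized to sum to 1."""
    x = np.arange(0, radius + 1)
    phi_x = np.exp(-0.5 / (sigma * sigma) * x ** 2)
    return phi_x / phi_x.sum()


def halfgaussian_filter1d(input, sigma, axis=-1, output=None,
                          mode="constant", cval=0.0, truncate=4.0):
    """Convolve with a half-Gaussian kernel that starts at the current sample."""
    lw = int(truncate * float(sigma) + 0.5)  # radius in samples
    weights = halfgaussian_kernel1d(sigma, lw)
    origin = -lw // 2  # line weights[0] up with the current sample
    return scipy.ndimage.convolve1d(input, weights, axis, output, mode, cval, origin)


# Hypothetical test signal: a single impulse at index 5.
impulse = np.zeros(12)
impulse[5] = 1.0

filtered = halfgaussian_filter1d(impulse, sigma=1.0)
print(np.round(filtered, 4))
# Outputs before index 5 stay zero: the impulse only affects the current
# and later samples, i.e. each output mixes the current input with earlier ones.

# The same filter written as an explicit causal convolution:
# y[i] = sum_m w[m] * x[i - m], with x treated as 0 before the start.
w = halfgaussian_kernel1d(1.0, int(4 * 1.0 + 0.5))
causal = np.convolve(impulse, w)[:len(impulse)]
print(np.allclose(filtered, causal))  # True
```

With `mode="constant"` and `cval=0.0`, samples before the start of the signal count as zero, which is why the scipy result matches the zero-padded `np.convolve` version.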
Trying this on your signal, I get a result like this:

```
array([0.59979879, 0.6       , 0.40006707, 0.59993293, 0.79993293,
       0.40013414, 0.20006707, 0.59986586, 0.40006707, 0.4       ,
       0.99979879, 0.00033535, 0.59979879, 0.40006707, 0.00013414,
       0.59979879, 0.20013414, 0.00006707, 0.19993293, 0.59986586])
```
## Choice of standard deviation
If you pick a standard deviation of 0.25, that is going to have almost no effect on your signal. With the default `truncate=4.0`, the kernel radius is `int(4 * 0.25 + 0.5) = 1`, so there are only two convolution weights: `[0.99966465 0.00033535]`. In other words, the previous bin contributes only about 0.03% to each output value, which is less than a 0.1% effect on the signal.
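I'd recommend using a larger sigma value. Note that sigma is measured in bins of `Xbinned`, not in the original time steps, so `sigma=0.25` means a quarter of a bin.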
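To see how the kernel grows with sigma, this sketch computes the half-Gaussian weights the same way as `halfgaussian_kernel1d`/`halfgaussian_filter1d` (radius `int(truncate * sigma + 0.5)`, default `truncate=4.0`) for a few example sigma values. The helper name `halfgaussian_weights` is hypothetical, and the sigma values are just examples.

```python
import numpy as np


def halfgaussian_weights(sigma, truncate=4.0):
    """Hypothetical helper: half-Gaussian weights for offsets 0..radius, summing to 1.

    Uses the same radius rule as halfgaussian_filter1d: int(truncate * sigma + 0.5).
    """
    radius = int(truncate * sigma + 0.5)
    x = np.arange(radius + 1)
    w = np.exp(-0.5 * (x / sigma) ** 2)
    return w / w.sum()


for sigma in (0.25, 1.0, 2.0):
    w = halfgaussian_weights(sigma)
    # w[0] is the weight on the current bin; the rest goes to earlier bins.
    print(f"sigma={sigma}: {len(w)} taps, weight on current bin = {w[0]:.4f}")
```

With `sigma=0.25` the kernel has only two taps and roughly 99.97% of the weight stays on the current bin; larger sigma values spread the weight over more past bins and smooth noticeably more.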
## Off-by-one error
Also, I want to point out the off-by-one error here:

```python
for i in range(0, len(X)+1, bin_size):
    Xbinned.append(sum(X[i:i+(bin_size-1)])/bin_size)
```
Python and NumPy slices exclude the end index, so the slice `X[i:i+(bin_size-1)]` actually captures 4 elements, not 5.
To fix this, you can change it to this:

```python
for i in range(0, len(X), bin_size):
    Xbinned.append(X[i:i+bin_size].mean())
```
(Also, I fixed an off-by-one error in the loop specification: with 100 samples, `range(0, len(X)+1, bin_size)` yields 21 start indices, and the last one produces an extra, empty bin. I also used `.mean()`, the NumPy shortcut for the average.)
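If you prefer to avoid the Python loop, here is a minimal sketch of the same binning done with a reshape, followed by the smoothing step. It assumes that trailing samples which don't fill a whole bin should be dropped (the corrected loop would instead average a shorter final bin), and it uses `sigma=1.0` and a seeded generator purely as examples. `halfgaussian_filter1d` here is a trimmed-down copy of the function from this answer (no `axis`/`output` arguments) so the block runs on its own.

```python
import numpy as np
import scipy.ndimage


def halfgaussian_filter1d(input, sigma, mode="constant", cval=0.0, truncate=4.0):
    """Causal half-Gaussian smoothing (same kernel and origin as in the answer)."""
    lw = int(truncate * float(sigma) + 0.5)
    x = np.arange(lw + 1)
    weights = np.exp(-0.5 * (x / sigma) ** 2)
    weights /= weights.sum()
    return scipy.ndimage.convolve1d(input, weights, mode=mode, cval=cval,
                                    origin=-lw // 2)


rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=100)  # example random 0/1 process
bin_size = 5

# Drop any incomplete trailing bin, then average each row of bin_size samples.
n_full = len(X) // bin_size * bin_size
Xbinned = X[:n_full].reshape(-1, bin_size).mean(axis=1)

# Compare with the corrected loop.
loop_version = np.array([X[i:i + bin_size].mean() for i in range(0, len(X), bin_size)])
print(np.allclose(Xbinned, loop_version))  # True here because 100 is a multiple of 5

smoothed = halfgaussian_filter1d(Xbinned, sigma=1.0)
print(len(Xbinned), smoothed.round(3))
```

The reshape only works on a length that is an exact multiple of `bin_size`, which is why the tail is trimmed first.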
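Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.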