
How to estimate Gaussian distributions behind a noise layer?

So I have this histogram of my 1-D data, which contains some transition times in seconds. The data contain a lot of noise, but behind the noise lie some peaks/Gaussians which describe the correct time values. (See images.)

The data are retrieved from the transition times of people walking between two locations at different speeds, drawn from a normal walking-speed distribution (mean 1.4 m/s). Sometimes there can be multiple paths between two locations, which can generate multiple Gaussians.

I want to extract the underlying Gaussians which show above the noise. However, since the data can come from different scenarios with an arbitrary number (say around 0-3) of correct paths/'Gaussians', I can't really use a GMM (Gaussian Mixture Model), because that would require me to know the number of Gaussian components.

I assume/know that the correct transition-time distributions are Gaussian, while the noise comes from some other distribution (chi-squared?). I'm quite new to the topic, so I might be totally wrong.

Since I know the ground-truth distance between the two points beforehand, I know where the means should be located.

This image has two correct Gaussians, with means at 250s and 640s. (The variance becomes higher at longer times.)

[image: histogram with Gaussian peaks at 250s and 640s]

This image has one correct Gaussian, with the mean at 428s. [image: histogram with a Gaussian peak at 428s]

Question: Is there some good approach to retrieve the Gaussians, or at least significantly reduce the noise, given data like the above? I don't expect to catch the Gaussians that are drowned in noise.

I would approach this using Kernel Density Estimation. It allows you to estimate the probability density directly from the data, without too many assumptions about the underlying distribution. By changing the kernel bandwidth you can control how much smoothing you apply, which I assume could be tuned manually by visual inspection until you get something that meets your expectations. An example of a KDE implementation in Python using scikit-learn can be found here.

Example:

import numpy as np
from sklearn.neighbors import KernelDensity

# x is your original data; scikit-learn expects a 2-D array
# of shape (n_samples, n_features)
x = ...
# Adjust bandwidth to get the smoothness to your liking
bandwidth = ...

kde = KernelDensity(kernel='gaussian', bandwidth=bandwidth).fit(np.asarray(x).reshape(-1, 1))
support = np.linspace(min(x), max(x), 1000).reshape(-1, 1)
# score_samples returns the log-density, so exponentiate to get the PDF
density = np.exp(kde.score_samples(support))

Once the filtered distribution is estimated, you can analyze it and identify the peaks using something like this.

from scipy.signal import find_peaks

# 'find_peaks' returns the peak indices and a dict of properties.
# You can tweak its other arguments (height, prominence, distance, ...)
# in order to fine-tune the extracted peaks according to your PDF.
peaks, properties = find_peaks(density)

Disclaimer: This is a more or less high-level answer, since your question was also high level. I assume you know what you are doing code-wise and are just looking for ideas. But if you need help with anything specific, please show us some code and what you have tried so far, so we can be more specific.

I would advise taking a look at Gaussian Mixture Estimation:

https://scikit-learn.org/stable/modules/mixture.html#gmm

"A Gaussian mixture model is a probabilistic model that assumes all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters." “高斯混合模型是一种概率模型,它假设所有数据点均来自有限数量的具有未知参数的高斯分布的混合。”

You can do this using Kernel Density Estimation, as pointed out by @Pasa. scipy.stats.gaussian_kde can do this easily. The syntax is shown in the example below, which generates 3 Gaussian distributions, superimposes them, adds some noise, then uses gaussian_kde to estimate the Gaussian curve and plots everything for demonstration.

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde

# Create three Gaussian curves and add some noise behind them
norm1 = np.random.normal(loc=10.0, size=5000, scale=1.1)
norm2 = np.random.normal(loc=5.0, size=3000)
norm3 = np.random.normal(loc=14.0, size=1000)
noise = np.random.rand(8000) * 18
norm = np.concatenate((norm1, norm2, norm3, noise))

# The plotting is purely for demonstration
fig = plt.figure(dpi=300, figsize=(10, 6))
plt.hist(norm, facecolor=(0, 0.4, 0.8), bins=200, rwidth=0.8, density=True, alpha=0.3)
plt.xlim([0.0, 18.0])

# This is the relevant part: 'modifier' sets the kernel bandwidth.
# Lower values follow the data more closely, higher values more loosely.
modifier = 0.03
kde = gaussian_kde(norm, modifier)

# Plot the KDE output for demonstration
kde_x = np.linspace(0, 18, 10000)
plt.plot(kde_x, kde(kde_x), 'k--', linewidth=1.0)
plt.title("KDE example", fontsize=17)
plt.show()

[image: Gaussian KDE example]

You will note that the estimation is strongest for the most pronounced Gaussian peak centered at 10.0, as you would expect. The 'sharpness' of the estimation can be modified by changing the modifier variable (which in the example sets the kernel bandwidth) passed to the gaussian_kde constructor. Lower values will produce a 'sharper' estimate and higher values a 'smoother' one. Also note that gaussian_kde returns normalized values.
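Combining this with the peak-finding idea from the first answer, one could read the estimated means straight off the KDE curve. A self-contained sketch using the same synthetic setup as above (the prominence threshold is an assumption you would tune for your own data):

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import gaussian_kde

# Same setup as the example above: three Gaussians plus uniform noise
rng = np.random.default_rng(1)
norm = np.concatenate((
    rng.normal(10.0, 1.1, 5000),
    rng.normal(5.0, 1.0, 3000),
    rng.normal(14.0, 1.0, 1000),
    rng.uniform(0, 18, 8000),
))

kde = gaussian_kde(norm, 0.03)
kde_x = np.linspace(0, 18, 10000)
density = kde(kde_x)

# Keep only peaks that stand clearly above the noise floor;
# 'prominence' is a hypothetical value chosen for this synthetic data
peaks, _ = find_peaks(density, prominence=0.005)
print(kde_x[peaks])  # approximate locations of the underlying means
```

In the transition-time setting, the recovered peak locations could then be checked against the means expected from the known ground-truth distances.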
