
Multivariate Normal Distribution fitting dataset

I was reading a few papers about RNNs. At some point, I came across the following explanations:

The prediction model trained on sN is used to compute the error vectors for each point in the validation and test sequences. The error vectors are modelled to fit a multivariate Gaussian distribution N = N(μ, Σ). The likelihood p(t) of observing an error vector e(t) is given by the value of N at e(t) (similar to normalized innovations squared (NIS) used for novelty detection using Kalman filter based dynamic prediction model [5]). The error vectors for the points from vN1 are used to estimate the parameters μ and Σ using Maximum Likelihood Estimation.
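To make the procedure in this quote concrete, here is a minimal NumPy sketch. It assumes the error vectors are already available as arrays of shape (number_of_points, d); the names `val_errors`, `test_errors`, `fit_gaussian_mle` and `gaussian_pdf`, and the synthetic data, are hypothetical stand-ins for the errors the prediction model would produce on the validation set (vN1) and the test sequence. μ and Σ are the maximum likelihood estimates (the mean vector and the covariance normalized by n rather than n - 1), and p(t) is the Gaussian density evaluated at each e(t).

```python
import numpy as np

def fit_gaussian_mle(errors):
    """MLE of the mean vector and covariance matrix from the rows of `errors` (n x d)."""
    mu = errors.mean(axis=0)
    centered = errors - mu
    # The maximum likelihood covariance divides by n, not n - 1
    sigma = centered.T @ centered / errors.shape[0]
    return mu, sigma

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma) evaluated at each row of x (m x d)."""
    d = mu.shape[0]
    diff = x - mu
    # Solve sigma @ y = diff^T instead of explicitly inverting sigma
    sol = np.linalg.solve(sigma, diff.T).T
    mahalanobis_sq = np.sum(diff * sol, axis=1)
    _, logdet = np.linalg.slogdet(sigma)
    log_p = -0.5 * (d * np.log(2 * np.pi) + logdet + mahalanobis_sq)
    return np.exp(log_p)

# Hypothetical 3D error vectors standing in for real prediction errors
rng = np.random.default_rng(0)
val_errors = rng.normal(size=(500, 3))
test_errors = np.vstack([rng.normal(size=(5, 3)), [[8.0, -8.0, 8.0]]])

mu, sigma = fit_gaussian_mle(val_errors)   # "fitting" = estimating mu and Sigma
p = gaussian_pdf(test_errors, mu, sigma)   # p(t) for each test error vector
print(p)  # the last, far-away error vector gets a much lower likelihood
```

Working in log space and using `np.linalg.solve` avoids forming Σ⁻¹ explicitly, which is the usual numerically safer choice; `scipy.stats.multivariate_normal` offers the same density if SciPy is available.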

And:和:

A Multivariate Gaussian Distribution is fitted to the error vectors on the validation set. y(t) is the probability of an error vector e(t) after applying Multivariate Gaussian Distribution N = N(µ, Σ). Maximum Likelihood Estimation is used to select the parameters µ and Σ for the points from vN.
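For reference, the Maximum Likelihood Estimates mentioned in both quotes have well-known closed forms. Writing e_1, …, e_n for the n error vectors from the validation set and d for their length, the estimates and the resulting density value at an error vector e(t) are shown below. Strictly speaking this value is a probability density, not a probability, so it can exceed 1.

```latex
\begin{aligned}
\hat{\mu} &= \frac{1}{n}\sum_{i=1}^{n} e_i \\
\hat{\Sigma} &= \frac{1}{n}\sum_{i=1}^{n} (e_i - \hat{\mu})(e_i - \hat{\mu})^{\top} \\
p(t) &= \mathcal{N}\big(e(t);\, \hat{\mu}, \hat{\Sigma}\big)
      = \frac{1}{\sqrt{(2\pi)^{d}\,\lvert\hat{\Sigma}\rvert}}
        \exp\!\Big(-\tfrac{1}{2}\,\big(e(t) - \hat{\mu}\big)^{\top}\hat{\Sigma}^{-1}\big(e(t) - \hat{\mu}\big)\Big)
\end{aligned}
```

The 1/n normalization of the covariance (rather than 1/(n - 1)) is what maximum likelihood gives; for a validation set of reasonable size the difference is negligible.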

vN and vN1 are validation datasets; sN is the training dataset.

They are from two different articles but describe the same thing. I didn't really understand what they mean by fitting a Multivariate Gaussian Distribution to the data. What does it mean?

Many thanks,

Guillaume

Let's start with one-dimensional data. If your data points lie along a 1D line, they have a mean (µ) and a standard deviation (σ), so their variance is σ². Fitting a Gaussian to them means estimating (µ, σ) from the data; with maximum likelihood, these are simply the sample mean and the standard deviation computed with a 1/n normalization. Once you have (µ, σ), you can evaluate how likely any point is under the model, or generate a new data point following the same distribution:

# Generating a new_point from a 1D Gaussian distribution
import random

mu, sigma = 1, 1.6  # mean and standard deviation (the variance is sigma ** 2)
new_point = random.gauss(mu, sigma)
# e.g. 2.797757476598497 (random, differs on every run)
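The snippet above only samples from a Gaussian whose parameters are already known. Fitting goes the other way: the parameters are estimated from data. Here is a minimal sketch with Python's standard `statistics` module, using hypothetical 1D data generated for illustration. The maximum likelihood estimates are the mean and the population standard deviation (`pstdev`, which divides by n); `NormalDist` can then evaluate the fitted density or draw new points.

```python
import random
import statistics
from statistics import NormalDist

# Hypothetical 1D data, drawn here from a Gaussian with mu = 1, sigma = 1.6
random.seed(42)
data = [random.gauss(1, 1.6) for _ in range(10_000)]

# Maximum likelihood estimates: the mean and the standard deviation normalized by n
mu_hat = statistics.fmean(data)
sigma_hat = statistics.pstdev(data)

fitted = NormalDist(mu_hat, sigma_hat)
print(mu_hat, sigma_hat)          # close to 1 and 1.6
print(fitted.pdf(1.0))            # density of the fitted model at x = 1.0
print(fitted.samples(3, seed=0))  # three new points from the fitted distribution
```

Note that `NormalDist.from_samples` would use `statistics.stdev` (dividing by n - 1) instead; with many data points the two estimates are practically identical.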

In d-dimensional space, the multivariate normal distribution generalizes the one-dimensional case. The goal is to find a mean vector µ with d entries and a d × d covariance matrix Σ that together model all data points in that space (with maximum likelihood, µ is the mean of the data vectors and Σ their covariance). Having them, you can evaluate the density at any point, which is exactly the value of N at e(t), i.e. the likelihood p(t) in the quotes, and you can generate as many random data points as you want from the same distribution. In Python/NumPy, sampling looks like this:

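import numpy as np
mean, covariance = np.zeros(2), np.eye(2)  # example 2D parameters, for illustration only
# Draw one sample; the result is an array of shape (1, 2)
new_data_point = np.random.multivariate_normal(mean, covariance, 1)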
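Fitting is the reverse of this sampling step: given data points, estimate µ and Σ. The following minimal sketch uses hypothetical 2D parameters chosen only for illustration. It draws samples with NumPy's `Generator.multivariate_normal`, then recovers the parameters with the maximum likelihood estimates `data.mean(axis=0)` and `np.cov(data, rowvar=False, bias=True)`, where `bias=True` divides by n as maximum likelihood requires.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" parameters, used only to generate example data
true_mean = np.array([1.0, -2.0])
true_cov = np.array([[2.0, 0.8],
                     [0.8, 1.0]])

# Each row is one 2D data point
data = rng.multivariate_normal(true_mean, true_cov, size=20_000)

# Fit: maximum likelihood estimates of the mean vector and covariance matrix
mean_hat = data.mean(axis=0)
cov_hat = np.cov(data, rowvar=False, bias=True)

print(mean_hat)  # close to [1, -2]
print(cov_hat)   # close to true_cov

# The fitted parameters can in turn generate new points
new_points = rng.multivariate_normal(mean_hat, cov_hat, size=5)
```

`rowvar=False` tells `np.cov` that each row is an observation and each column a variable, which matches the layout returned by `multivariate_normal`.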
