Expectation Maximization using a Poisson likelihood function

I am trying to apply the expectation-maximization (EM) algorithm to estimate missing count data, but all the R packages I have found, such as missMethods, assume a multivariate Gaussian distribution. How would I apply the EM algorithm to estimate missing count data assuming a Poisson distribution?

Say we have data that look like this:

x <- c(100,  96,  79, 109, 111,  NA,  93,  95, 119,  90, 121,  96,  NA,  
       NA,  85,  95, 110,  97,  87, 104, 101,  87,  87,  NA,  89,  NA, 
       113,  NA,  95,  NA, 119, 115,  NA, 105,  NA,  80,  90, 108,  90,  
       99, 111,  93,  99,  NA,  87,  89,  87, 126, 101, 106)

Applying impute_EM from missMethods ( missMethods::impute_EM(x, stochastic = FALSE) ) returns an answer, but it treats the data as continuous, whereas these data are discrete counts.

I understand that questions like these require a minimal, reproducible example, but I honestly do not know where to start. Even suggested reading to point me in the right direction would be helpful.

Define x0 as the observed (non-missing) values:

x0 <- x[!is.na(x)]

The Jeffreys/reference prior for a Poisson distribution with mean lambda is proportional to 1/sqrt(lambda) . Given the observed values, this gives lambda a gamma reference posterior with shape parameter sum(x0) + 0.5 and rate parameter length(x0) (equivalently, scale 1/length(x0) ). You could take n samples of lambda with:

lambda <- rgamma(n, sum(x0) + 0.5, length(x0))
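For reference, the gamma form of the posterior follows from multiplying the prior by the Poisson likelihood. The symbols N (for length(x0)) and S (for sum(x0)) are shorthand introduced here, not part of the original answer:

```latex
\begin{aligned}
p(\lambda) &\propto \lambda^{-1/2}, \\
p(x_0 \mid \lambda) &\propto \prod_{i=1}^{N} \lambda^{x_i} e^{-\lambda}
  = \lambda^{S} e^{-N\lambda}, \\
p(\lambda \mid x_0) &\propto \lambda^{(S + 1/2) - 1} e^{-N\lambda}.
\end{aligned}
```

The last line is the kernel of a gamma density with shape S + 1/2 and rate N, which is what the rgamma call passes as its second and third arguments.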

Then sample n missing values ( xm ) with:

xm <- rpois(n, lambda)

Alternatively, since a Gamma-Poisson compound distribution can be formulated as a negative binomial (after integrating out lambda ):

xm <- rnbinom(n, sum(x0) + 0.5, length(x0)/(length(x0) + 1L))
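As a rough sanity check, the two sampling routes can be compared by simulation. The sketch below is illustrative only: the seed, the number of draws, and the variable names ( shape , rate , pred_mean , pred_var , xm_two_step , xm_nbinom ) are assumptions made here, and the data are the vector from the question.

```r
set.seed(1)  # arbitrary seed, only so this sketch is reproducible

x <- c(100,  96,  79, 109, 111,  NA,  93,  95, 119,  90, 121,  96,  NA,
        NA,  85,  95, 110,  97,  87, 104, 101,  87,  87,  NA,  89,  NA,
       113,  NA,  95,  NA, 119, 115,  NA, 105,  NA,  80,  90, 108,  90,
        99, 111,  93,  99,  NA,  87,  89,  87, 126, 101, 106)
x0 <- x[!is.na(x)]

shape <- sum(x0) + 0.5  # posterior shape
rate  <- length(x0)     # posterior rate
n     <- 1e5            # number of simulated draws (assumed, for illustration)

# Route 1: draw lambda from the gamma posterior, then a Poisson count
lambda      <- rgamma(n, shape = shape, rate = rate)
xm_two_step <- rpois(n, lambda)

# Route 2: draw directly from the negative binomial predictive
xm_nbinom <- rnbinom(n, size = shape, prob = rate / (rate + 1))

# Theoretical mean and variance of the posterior predictive
pred_mean <- shape / rate
pred_var  <- pred_mean * (1 + 1 / rate)

# Both routes should be close to theory, up to Monte Carlo error
print(round(c(two_step = mean(xm_two_step), nbinom = mean(xm_nbinom),
              theory = pred_mean), 2))
print(round(c(two_step = var(xm_two_step), nbinom = var(xm_nbinom),
              theory = pred_var), 2))
```

The direct negative binomial draw avoids storing the intermediate lambda values, which is why it is convenient inside a function.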

As a function:

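MI_poisson <- function(x, n) {
  # x: count vector containing NAs; n: number of imputed data sets (columns)
  x0 <- x[!is.na(x)]
  n_mis <- length(x) - length(x0)
  # Posterior predictive draws for the missing counts, n_mis per column
  xm <- rnbinom(n*n_mis, sum(x0) + 0.5, length(x0)/(length(x0) + 1L))
  # Observed values (recycled across columns) stacked above the imputed draws
  rbind(matrix(x0, ncol = n, nrow = length(x0)),
        matrix(xm, ncol = n))
}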

This returns a matrix with n columns, each holding one completed copy of x : the observed values come first, followed by the imputed draws for the NA entries, so the original positions of the missing values are not preserved. Each column can be analyzed separately, and the results then aggregated across columns.
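One possible way to do the aggregation is sketched below, using the sample mean as the quantity of interest and Rubin's rules to pool across imputations. The choice of estimand, the number of imputations m , the seed, and the names est , within , between , pooled_est , and total_var are all assumptions made for illustration; MI_poisson is repeated from the answer so the block runs on its own.

```r
set.seed(42)  # arbitrary seed, only so this sketch is reproducible

x <- c(100,  96,  79, 109, 111,  NA,  93,  95, 119,  90, 121,  96,  NA,
        NA,  85,  95, 110,  97,  87, 104, 101,  87,  87,  NA,  89,  NA,
       113,  NA,  95,  NA, 119, 115,  NA, 105,  NA,  80,  90, 108,  90,
        99, 111,  93,  99,  NA,  87,  89,  87, 126, 101, 106)

# Same function as in the answer
MI_poisson <- function(x, n) {
  x0 <- x[!is.na(x)]
  rbind(matrix(x0, ncol = n, nrow = length(x0)),
        matrix(rnbinom(n*(length(x) - length(x0)), sum(x0) + 0.5,
                       length(x0)/(length(x0) + 1L)), ncol = n))
}

m   <- 20                 # number of imputed data sets (assumed)
imp <- MI_poisson(x, m)

# Analyze each completed data set separately
est    <- colMeans(imp)                     # per-imputation estimate of the mean
within <- apply(imp, 2, var) / nrow(imp)    # per-imputation variance of that mean

# Pool with Rubin's rules
pooled_est <- mean(est)                     # pooled point estimate
between    <- var(est)                      # between-imputation variance
total_var  <- mean(within) + (1 + 1/m) * between

print(c(estimate = pooled_est, se = sqrt(total_var)))
```

The between-imputation term is what carries the extra uncertainty due to the missing values; analyzing a single imputed column alone would understate it.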
