简体   繁体   English

Metropolis Hastings用于线性回归模型

[英]Metropolis Hastings for linear regression model

I am trying to implement the Metropolis-Hastings algorithm for a simple linear regression in C (without use of other libraries (boost, Eigen etc.) and without two-dimensional arrays)*. 我正在尝试实现Metropolis-Hastings算法,以便在C中进行简单的线性回归(不使用其他库(boost,Eigen等),并且不使用二维数组)*。 For better testing of the code/evaluation of the trace plots, I have rewritten the code for R (see below) by keeping as much of the C-code as possible. 为了更好地测试轨迹图/评估代码,我通过保留尽可能多的C代码来重写了R的代码(请参见下文)。

Unfortunately, the chains don't converge. 不幸的是,连锁店无法融合。 I am wondering if 我想知道

  1. there is a mistake in the implementation itself? 实现本身有错误吗?
  2. "just" a bad choice of proposal distributions? “只是”提案分配的错误选择?

Assuming the latter, I am thinking about how to find good parameters of proposal distributions (currently I have picked arbitrary values) so that the algorithm works. 假设是后者,我正在考虑如何找到提案分配的良好参数(当前我已经选择了任意值),以便该算法起作用。 Even with three parameters as in this case, it is quite hard to find suitable parameters. 即使在这种情况下具有三个参数,也很难找到合适的参数。 How does one normally handle this problem if say Gibbs sampling is not an alternative? 如果说不能选择吉布斯抽样,通常如何处理这个问题?

*I want to use this code for Cuda *我想将此代码用于Cuda

#### posterior distribution
logPostDensity <- function(x, y, a, b, s2, N)
{
sumSqError = 0.0
for(i in 1:N)
{
  sumSqError = sumSqError + (y[i] - (a + b*x[i]))^2
}
return(((-(N/2)+1) * log(s2)) + ((-0.5/s2) * sumSqError))

}

# x = x values
# y = actual datapoints
# N = sample size
# m = length of chain
# sigmaProp = uniform proposal for sigma squared
# paramAProp = uniform proposal for intercept
# paramBProp = uniform proposal for slope

mcmcSampling <- function(x,y,N,m,sigmaProp,paramAProp,paramBProp)
{

  paramsA = vector("numeric",length=m) # intercept
  paramsB = vector("numeric",length=m) # slope
  s2 = vector("numeric",length=m) # sigma squared

  paramsA[1] = 0
  paramsB[1] = 0
  s2[1] = 1

  for(i in 2:m)
  {

    paramsA[i] = paramsA[i-1] + runif(1,-paramAProp,paramAProp)

    if((logPostDensity(x,y,paramsA[i],paramsB[i],s2[i-1],N)
        - logPostDensity(x,y,paramsA[i-1],paramsB[i-1],s2[i-1],N))
       < log(runif(1)))
    {
      paramsA[i] = paramsA[i-1]
    }

    paramsB[i] = paramsB[i-1] + runif(1,-paramBProp,paramBProp)

    if((logPostDensity(x,y,paramsA[i],paramsB[i],s2[i-1],N)
        - logPostDensity(x,y,paramsA[i-1],paramsB[i-1],s2[i-1],N))
       < log(runif(1)))
    {
      paramsB[i] = paramsB[i-1]
    }

    s2[i] = s2[i-1] + runif(1,-sigmaProp,sigmaProp)

    if((s2[i] < 0) || (logPostDensity(x,y,paramsA[i],paramsB[i],s2[i],N)
                       - logPostDensity(x,y,paramsA[i],paramsB[i],s2[i-1],N))
       < log(runif(1)))
    {
      s2[i] = s2[i-1]
    }


  }

  res = data.frame(paramsA,paramsB,s2)
  return(res)
}


#########################################

set.seed(321)
x <- runif(100)
y <- 2 + 5*x + rnorm(100)

summary(lm(y~x))


df <- mcmcSampling(x,y,10,5000,0.05,0.05,0.05)


par(mfrow=c(3,1))
plot(df$paramsA[3000:5000],type="l",main="intercept")
plot(df$paramsB[3000:5000],type="l",main="slope")
plot(df$s2[3000:5000],type="l",main="sigma")

There was one mistake in the intercept section (paramsA). 在拦截部分(paramsA)中有一个错误。 Everything else was fine. 其他一切都很好。 I've implemented what Alexey suggested in his comments. 我已经实现了Alexey在他的评论中建议的内容。 Here's the solution: 解决方法如下:

pow <- function(x,y)
{
  return(x^y)
}


#### posterior distribution
posteriorDistribution <- function(x, y, a, b,s2,N)
{
sumSqError <- 0.0
for(i in 1:N)
{
  sumSqError <- sumSqError + pow(y[i] - (a + b*x[i]),2)
}
return((-((N/2)+1) * log(s2)) + ((-0.5/s2) * sumSqError))

}

# x <- x values
# y <- actual datapoints
# N <- sample size
# m <- length of chain
# sigmaProposalWidth <- width of uniform proposal dist for sigma squared
# paramAProposalWidth <- width of uniform proposal dist for intercept
# paramBProposalWidth <- width of uniform proposal dist for slope

mcmcSampling <- function(x,y,N,m,sigmaProposalWidth,paramAProposalWidth,paramBProposalWidth)
{

  desiredAcc <- 0.44

  paramsA <- vector("numeric",length=m) # intercept
  paramsB <- vector("numeric",length=m) # slope
  s2 <- vector("numeric",length=m) # sigma squared

  paramsA[1] <- 0
  paramsB[1] <- 0
  s2[1] <- 1

  accATot <- 0
  accBTot <- 0
  accS2Tot <- 0

  for(i in 2:m)
  {
    paramsA[i] <- paramsA[i-1] + runif(1,-paramAProposalWidth,paramAProposalWidth)
    accA <- 1
    if((posteriorDistribution(x,y,paramsA[i],paramsB[i-1],s2[i-1],N) - 
        posteriorDistribution(x,y,paramsA[i-1],paramsB[i-1],s2[i-1],N)) < log(runif(1)))
    {
      paramsA[i] <- paramsA[i-1]
      accA <- 0
    }


    accATot <- accATot + accA

    paramsB[i] <- paramsB[i-1] + runif(1,-paramBProposalWidth,paramBProposalWidth)
    accB <- 1
    if((posteriorDistribution(x,y,paramsA[i],paramsB[i],s2[i-1],N) - 
        posteriorDistribution(x,y,paramsA[i-1],paramsB[i-1],s2[i-1],N)) < log(runif(1)))
    {
      paramsB[i] <- paramsB[i-1]
      accB <- 0
    }

    accBTot <- accBTot + accB

    s2[i] <- s2[i-1] + runif(1,-sigmaProposalWidth,sigmaProposalWidth)
    accS2 <- 1

    if((s2[i] < 0) || (posteriorDistribution(x,y,paramsA[i],paramsB[i],s2[i],N) - 
                       posteriorDistribution(x,y,paramsA[i],paramsB[i],s2[i-1],N)) < log(runif(1)))
    {
      s2[i] <- s2[i-1]
      accS2 <- 0
    }

    accS2Tot <- accS2Tot + accS2

    if(i%%100==0)
    {

      paramAProposalWidth <- paramAProposalWidth * ((accATot/100)/desiredAcc)
      paramBProposalWidth <- paramBProposalWidth * ((accBTot/100)/desiredAcc)
      sigmaProposalWidth <- sigmaProposalWidth * ((accS2Tot/100)/desiredAcc)

      accATot <-  0
      accBTot <-  0 
      accS2Tot <-  0

    }


  }
    res <- data.frame(paramsA,paramsB,s2)
    return(res)

}

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

相关问题 单线性回归分析模型误差 - Single linear regression analysis model error 如何使用Metropolis-Hastings算法将C或C ++代码合并到我的R代码中以加速MCMC程序 - how to incorporate C or C++ code into my R code to speed up a MCMC program, using a Metropolis-Hastings algorithm 线性回归的神经网络 - Neural network for linear regression 适用于iPhone / iPad的线性回归库或代码段? - Linear regression library or code snippet for iPhone/iPad? 在 C 中实现了简单的线性回归,但使用 p_thread 的速度要慢得多 - Implemented simple linear regression in C, but much slower with p_thread 从GSL库获取C gsl_fit_linear()函数中的线性回归的p值 - Getting p-value for linear regression in C gsl_fit_linear() function from GSL library 在给定协方差矩阵和拟合系数的情况下,如何计算线性回归的p值 - How do I calculate p values of a linear regression given the covariance matrix and fit coefficients C中都会函数的意外输出 - Unexpected output from metropolis function in C 如何并行化 C 中的下一个代码,以便它遍历所有行和列? (线性回归程序) - How do I parallelize the next code in C so it iterates through all rows and columns? (linear regression program) 将Rcpp与C代码链接以进行自适应大都市拒绝采样 - Link Rcpp with C code for Adaptive Metropolis rejection sampling
 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM