
Manually coded Poisson log-likelihood function returns a different result from glm for models with an interaction

I've coded my own Poisson log-likelihood function, but for one specific data set it returns values that are noticeably different from glm when the model includes an interaction. Note that the function returns exactly the same results as glm for all other data I've tried, as well as for the model without the interaction on this data.

> # Log likelihood function
> llpoi = function(X, y){
+   # Ensures X is a matrix
+   if(!is.matrix(X)) X = as.matrix(X)
+   # Ensures there's a constant
+   if(sum(X[, 1]) != nrow(X)) X = cbind(1, X)  
+   # A useful scalar that I'll need below
+   k = ncol(X)
+   ## Function to be maximized
+   FUN = function(par, X, y){
+     # beta hat -- the parameter we're trying to estimate
+     betahat = par[1:k]
+     # mu hat -- the systematic component
+     muhat = X %*% betahat
+     # Log likelihood function
+     sum(muhat * y - exp(muhat))
+   }
+   # Optimizing
+   opt = optim(rep(0, k), fn = FUN, y = y, X = X, control = list(fnscale = -1), method = "BFGS", hessian = T)
+   # Results, including getting the SEs from the hessian
+   cbind(opt$par, sqrt(diag(solve(-1 * opt$hessian))))
+ }
> 
> # Defining inputs 
> y = c(2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 1, 1, 3, 1, 1, 3, 2, 2, 2, 3, 1, 2, 4, 3, 3, 3, 1, 3, 0, 2, 1, 2, 4, 1, 2, 0, 2, 1, 2, 1, 4, 1, 2, 0)
> x1 = c(8, 1, 0, 3, 3, 3, 5, 4, 0.4, 1.5, 2, 1, 1, 7, 2, 3, 0, 2, 1.5, 5, 1, 4, 5.5, 6, 3, 3, 2, 0.5, 5, 10, 3, 22, 20, 3, 20, 10, 15, 25, 15, 6, 3.5, 5, 18, 2, 15.0, 16, 24)
> x2 = c(12, 12, 12, 16, 12, 12, 12, 12, 12, 12, 12, 12, 9, 9, 12, 9, 12, 12, 9, 16, 9, 6, 12, 9, 9, 12, 12, 12, 12, 14, 14, 14, 9, 12, 9, 12, 3, 12, 9, 6, 12, 12, 12, 12, 12, 12, 9)
> 
> # Results
> withmyfun = llpoi(cbind(x1, x2, x1 * x2), y)
> round(withmyfun, 2)
      [,1] [,2]
[1,]  0.96 0.90
[2,] -0.05 0.09
[3,] -0.02 0.08
[4,]  0.00 0.01
> withglm = glm(y ~ x1 + x2 + x1 * x2, family = "poisson")
> round(summary(withglm)$coef[, 1:2], 2)
            Estimate Std. Error
(Intercept)     1.08       0.90
x1             -0.07       0.09
x2             -0.03       0.08
x1:x2           0.00       0.01

Is this something data specific? Is it inherent to the optimization process, which will eventually diverge more significantly from glm, and I just got unlucky with this data? Is it a consequence of using method = "BFGS" for optim?

By rescaling the right-hand side variables, the outcome improves a lot.
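The tmp object is not defined in the original answer; presumably it is just the example data collected into a data frame, something like:

  # Assumed setup (not shown in the original answer)
  tmp = data.frame(y = y, x1 = x1, x2 = x2)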

> library(data.table)
> setDT(tmp)
> tmp[, x1 := scale(x1)][, x2 := scale(x2)]
> 
> 
> withmyfun = with(tmp, llpoi(cbind(x1, x2, x1 * x2), y))
> withmyfun
            [,1]      [,2]
[1,]  0.57076392 0.1124637
[2,] -0.19620040 0.1278070
[3,] -0.01509032 0.1169019
[4,]  0.05636459 0.1380611
> 
> withglm = glm(y ~ x1 + x2 + x1 * x2, family = "poisson", data = tmp)
> summary(withglm)$coef[, 1:2]
               Estimate Std. Error
(Intercept)  0.57075132  0.1124641
x1          -0.19618199  0.1278061
x2          -0.01507467  0.1169034
x1:x2        0.05636934  0.1380621
> 

So, my recommendation is to have llpoi normalize the variables internally before calling optim, and to rescale the estimates back to the original scale before the function returns. Your example data have a very wide range, which results in very small coefficient estimates. The problem is made worse by the relatively flat likelihood surface caused by the insignificant variables.

Note:

This gives outputs very close to glm's, except for the intercept. What I meant by standardizing is something like the following.

llpoi = function(X, y){
  # Ensures X is a matrix
  if(!is.matrix(X)) X = as.matrix(X)
  # Ensures there's a constant
  if(sum(X[, 1]) != nrow(X)) X = cbind(1, X)
  # Column means and SDs of the non-constant columns (constant left untouched)
  avgs = c(0, apply(X[, 2:ncol(X)], 2, mean))
  sds = c(1, apply(X[, 2:ncol(X)], 2, sd))
  # Standardize the predictors
  X = t((t(X) - avgs) / sds)
  # A useful scalar that I'll need below
  k = ncol(X)
  ## Function to be maximized
  FUN = function(par, X, y){
    # beta hat -- the parameter we're trying to estimate
    betahat = par[1:k]
    # mu hat -- the systematic component
    muhat = X %*% betahat
    # Log likelihood function
    sum(muhat * y - exp(muhat))
  }
  # Optimizing
  opt = optim(rep(0, k), fn = FUN, y = y, X = X, control = list(fnscale = -1), method = "BFGS", hessian = TRUE)
  # Results, including getting the SEs from the hessian, rescaled back by the SDs
  cbind(opt$par, sqrt(diag(solve(-1 * opt$hessian)))) / sds
}
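As an aside (not part of the original answer), the intercept can also be mapped back to the original scale by hand: with standardized predictors z_j = (x_j - m_j)/s_j, the linear predictor b0s + sum_j bjs * z_j equals (b0s - sum_j bjs * m_j/s_j) + sum_j (bjs/s_j) * x_j, so only the intercept needs an extra correction term. A minimal sketch, assuming beta_std holds opt$par from the standardized fit and avgs/sds are the vectors computed inside llpoi:

  # Back-transform estimates from the standardized scale to the original scale.
  # beta_std = opt$par from the standardized fit; avgs, sds as computed in llpoi.
  backtransform = function(beta_std, avgs, sds){
    beta = beta_std / sds                                           # slopes
    beta[1] = beta_std[1] - sum(beta_std[-1] * avgs[-1] / sds[-1])  # intercept
    beta
  }

The standard error of the intercept would additionally require the delta method (it depends on the full covariance matrix of the standardized estimates), so only the point estimate is adjusted here.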

After much research, I learned that the two results differ because glm.fit, the workhorse behind glm, optimizes the function through the Newton-Raphson method, while I used BFGS in my llpoi function. BFGS is faster, but less precise. The two results will be very similar in most cases, but may differ more significantly when the likelihood surface is too flat or has too many maxima, as correctly pointed out by amatsuo_net, because the hill-climbing algorithm used by BFGS can get stuck.
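As a further note (not from the original exchange), one way to make BFGS agree more closely with glm without standardizing is to supply optim with the analytic gradient, so it does not rely on finite-difference approximations. For the log-likelihood sum(muhat * y - exp(muhat)) with muhat = X %*% beta, the score is t(X) %*% (y - exp(X %*% beta)). A minimal sketch along the lines of llpoi above (llpoi_gr is a hypothetical name, not part of the original code):

  llpoi_gr = function(X, y){
    # Same setup as llpoi
    if(!is.matrix(X)) X = as.matrix(X)
    if(sum(X[, 1]) != nrow(X)) X = cbind(1, X)
    k = ncol(X)
    # Log likelihood, up to the constant -log(y!) term
    FUN = function(par, X, y){
      muhat = X %*% par
      sum(muhat * y - exp(muhat))
    }
    # Analytic gradient (score): t(X) %*% (y - exp(X %*% beta))
    GRAD = function(par, X, y){
      as.vector(crossprod(X, y - exp(X %*% par)))
    }
    opt = optim(rep(0, k), fn = FUN, gr = GRAD, y = y, X = X,
                control = list(fnscale = -1), method = "BFGS", hessian = TRUE)
    # Estimates and SEs from the hessian
    cbind(opt$par, sqrt(diag(solve(-1 * opt$hessian))))
  }
  # e.g. llpoi_gr(cbind(x1, x2, x1 * x2), y)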
