
Matching lm and optim coefficient estimates for linear model with multiplicative error

This question is a mix of math and programming, but I'm guessing the solution lies on the programming side of things.

Suppose I have a linear model with a multiplicative error term u (with mean 1):

y = (a + b*x) * u

I'd like to estimate my coefficients a and b in R. I've found the solution in the top answer here, and the proof seems to make sense. I've also found out how to do OLS with heteroskedasticity-robust standard errors here. My interpretation of the results between the two resources is that the estimated values of the coefficients in both plain-Jane OLS and heteroskedasticity-robust OLS stay the same, but the t-values, F-values, and standard errors will differ. However, I don't care about those, only the estimates of the coefficients. It seems to follow that if I were to take the log of the original equation,

log(y) = log(a + b*x) + log(u)

and then minimize the following through an optimization function in R,

sum( (log(y) - log(a + b*x))^2 )

then the results for the coefficients should match those of lm(y~x)$coefficients . I'm not seeing that. Here's my code so far.

library(dplyr)
library(wooldridge)

# Get the data ready.

data("saving")

saving <- saving %>% filter(sav > 0, 
                            inc < 20000, 
                            sav < inc)

x = saving$inc
y = saving$sav

# Define LinearLogError and generate coefficient estimates.

LinearLogError = function(coeffs){
  a = coeffs[1]; b = coeffs[2]
  yhat = log(a + b*x)
  return(sum((log(y) - yhat)^2))
}

lmCoeffs = lm(y~x)$coefficients

startCoeffs = c(1, 1)
optimCoeffs = optim(par = startCoeffs, fn = LinearLogError)$par

# Results.

lmCoeffs
optimCoeffs

However, the results are:

> lmCoeffs
(Intercept)           x 
316.1983535   0.1405155 
> optimCoeffs
[1] -237.0579080    0.1437663

So my question is: am I understanding the solution correctly -- i.e., is my math correct? If yes, then what do I need to do in R to see results similar to lmCoeffs ? If not, what don't I understand, and what's the correct way to go about finding the proper coefficient estimates for my problem?

*Edited: Corrected a typo in my code.

You are optimizing different least squares, so there is no reason to assume they should give you the same coefficients.

So, quoting from your first post:

It's easy to verify now that the thing in square brackets, conditional on x, has mean zero and variance (a + b*x)^2 * sigma^2. So, this multiplicative errors model is just a cleverly disguised linear model with heteroskedasticity.
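Spelling the quoted argument out as a quick derivation (assuming, as in the quote, a multiplicative error u with E[u] = 1 and Var(u) = sigma^2):

```latex
y = (a + bx)\,u
  = a + bx + \underbrace{(a + bx)(u - 1)}_{\text{mean } 0,\ \text{variance } (a+bx)^2\sigma^2}
```

So the conditional mean is still a + bx, but the error term's variance grows with (a + bx)^2, which is exactly heteroskedasticity.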

This means a normal linear regression, which assumes homoskedasticity (equal variance), doesn't hold. The second post you have shows another way to test that your coefficients are not zero after running a normal linear regression.

If what you actually need are good estimates of your coefficients, you need to run a linear regression for unequal variances. It is definitely not what you have in the optimized function, as you don't need to divide by yhat, and I am not so sure how you ensure log(a + b*x) is positive.
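To see concretely that the two objectives estimate different things, here is a small simulation sketch in Python (not the saving data; a0, b0, and the lognormal error are made-up values). With a multiplicative error u, E[log u] is not zero in general, so minimizing squared error on the log scale recovers (a, b) scaled by exp(E[log u]) rather than the OLS targets:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 20_000
a0, b0, sigma = 2.0, 0.5, 0.8          # made-up "true" values

x = rng.uniform(1, 10, n)
u = rng.lognormal(mean=-sigma**2 / 2, sigma=sigma, size=n)   # E[u] = 1
y = (a0 + b0 * x) * u

# OLS targets (a0, b0) because E[y | x] = a0 + b0*x.
X = np.column_stack([np.ones(n), x])
ols_a, ols_b = np.linalg.lstsq(X, y, rcond=None)[0]

# The question's objective: least squares on the log scale.
def log_sse(par):
    a, b = par
    mu = a + b * x
    if np.any(mu <= 0):                 # keep log() well defined
        return np.inf
    return np.sum((np.log(y) - np.log(mu)) ** 2)

log_a, log_b = minimize(log_sse, x0=[1.0, 1.0], method="Nelder-Mead",
                        options={"maxiter": 2000}).x

# The log-scale fit targets (a0, b0) * exp(E[log u]) = (a0, b0) * exp(-sigma^2/2),
# a systematically smaller pair than the OLS targets here.
print(ols_a, ols_b, log_a, log_b)
```

This mirrors the mismatch in the question: both fits are internally consistent least-squares problems, they just minimize different criteria.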

You can try the gls function from the nlme package in R, specifying a variance structure as laid out in the quote above (variance growing with (a + b*x)^2):

library(nlme)
vf <- varConstPower(form = ~ inc)
fit <- gls(sav ~ inc, weights = vf, data = saving)
fit

Generalized least squares fit by REML
  Model: sav ~ inc 
  Data: saving 
  Log-restricted-likelihood: -641.6587

Coefficients:
(Intercept)         inc 
177.8608409   0.1557556
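If nlme is not at hand, the same unequal-variance idea can be sketched with iteratively reweighted least squares: since the quote gives Var(y | x) proportional to (a + b*x)^2, weight each observation by one over its squared fitted value and refit until the coefficients stabilize. A minimal numpy illustration on simulated data (the values and the simple reweighting scheme are illustrative assumptions, not the gls/varConstPower algorithm):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
a0, b0, sigma = 2.0, 0.5, 0.6           # made-up "true" values
x = rng.uniform(1, 10, n)
y = (a0 + b0 * x) * rng.lognormal(-sigma**2 / 2, sigma, n)   # error has mean 1

X = np.column_stack([np.ones(n), x])
coef = np.linalg.lstsq(X, y, rcond=None)[0]      # OLS starting values

# Reweight by 1 / fitted^2 (variance proportional to (a + b*x)^2) and refit.
for _ in range(20):
    fitted = np.clip(X @ coef, 1e-8, None)       # guard against non-positive fits
    w = 1.0 / fitted**2
    Xw = X * w[:, None]
    new = np.linalg.solve(X.T @ Xw, Xw.T @ y)    # weighted normal equations
    if np.allclose(new, coef, rtol=1e-10):
        coef = new
        break
    coef = new

print(coef)        # should land close to (a0, b0)
```

Because the weights only rescale the residuals, the reweighted fit still targets the conditional-mean coefficients (a, b); it just weights the observations according to their (estimated) variances, which is what gls does in a more principled way.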
