Weighted Least Squares in R

My dataset is quite big, so I'm using just 10 rows of data as an example. I've worked out the answer in Excel but can't replicate it in R, so I need help with the code:

constant<-c(6.10,5.12,5.04,4.97,4.89,4.89,4.87,4.87,4.88,4.99)
years.star<-c(219.87,153.69,146.19,139.35,127.27,127.27,121.91,121.91,112.28,99.98)
years.sq.star<-c(7915.41,4610.71,4239.78,3901.93,3309.27,3309.27,3047.95,3047.95,2582.58,1999.62)
ln.salary<-c(28.43,23.12,21.59,21.44,22.71,23.33,20.29,21.76,21.48,22.92)

try<-data.frame(constant,years.star,years.sq.star,ln.salary)

ln.salary is the dependent variable. The answer you should get is:

intercept:  6.474922
beta1:      -0.15026
beta2:      0.002769

My problem is that in R, if I use the lm function, it does not know that my intercept has the values above. It just uses 1,1,1,1,1,1,1,1,1,1 instead of 6.10, 5.12, etc.

So this call:

test<-lm(ln.salary~years.star+years.sq.star,data=try,weights=constant)

does not work, because it just generates this answer:

intercept:   207.1706
beta1:       -3.13214
beta2:        0.064416

In essence, I've taken the data and tried to adjust for heteroscedasticity. At the final step I have my transformed constant (constant star) and my transformed x variables. The last step is to regress ln.salary on the constant and the x variables, which should give the answer shown above.

I can do it in Excel but not in R, and I know I'm not getting the code right. I know the problem is that the lm function generates its own intercept (1,1,1,...). Could you please help?

Kind regards D

If you want to "fix" an intercept at a particular constant, you should subtract the value of that constant from the response and then fit a no-intercept model. For example:

test <- lm( ln.salary - 6.474922 ~ years.star + years.sq.star + 0,
    data=try, weights=constant)

Here we subtract off the intercept term, and then we add +0 to the formula to tell lm not to fit an intercept term. With that model I get:

Call:
lm(formula = ln.salary - 6.474922 ~ years.star + years.sq.star + 
    0, data = try, weights = constant)

Coefficients:
   years.star  years.sq.star  
     0.197384      -0.002842  
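
The same fixed-intercept fit can also be written with an offset instead of editing the response. The sketch below only checks that equivalence on the ten example rows. The data frame name `dat` is my own (the post calls it `try`, which is also the name of a base R function), and `b0` holds the intercept value quoted in the question.

```r
# Example rows from the question; `dat` is a hypothetical name for the post's `try`.
dat <- data.frame(
  constant      = c(6.10, 5.12, 5.04, 4.97, 4.89, 4.89, 4.87, 4.87, 4.88, 4.99),
  years.star    = c(219.87, 153.69, 146.19, 139.35, 127.27, 127.27, 121.91, 121.91, 112.28, 99.98),
  years.sq.star = c(7915.41, 4610.71, 4239.78, 3901.93, 3309.27, 3309.27, 3047.95, 3047.95, 2582.58, 1999.62),
  ln.salary     = c(28.43, 23.12, 21.59, 21.44, 22.71, 23.33, 20.29, 21.76, 21.48, 22.92)
)
b0 <- 6.474922  # intercept value taken from the question

# Approach from the answer: subtract the constant from the response, no intercept.
fit_subtract <- lm(ln.salary - b0 ~ years.star + years.sq.star + 0,
                   data = dat, weights = constant)

# Same model with the constant supplied as an offset (its coefficient is fixed at 1).
fit_offset <- lm(ln.salary ~ years.star + years.sq.star + 0,
                 data = dat, weights = constant,
                 offset = rep(b0, nrow(dat)))

# Compare the two sets of slope estimates.
all.equal(coef(fit_subtract), coef(fit_offset))
```

The offset form leaves the response untouched, which can make later calls such as `fitted()` easier to read, but either spelling describes the same model.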

If you want varying "intercepts" for each row, then you need to use an 'offset' rather than a 'weight':

 test<-lm(ln.salary~years.star+years.sq.star+0,data=try,offset=constant)

Call:
lm(formula = ln.salary ~ years.star + years.sq.star + 0, data = try, 
    offset = constant)

Coefficients:
   years.star  years.sq.star  
     0.236355      -0.003881  
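
An offset enters the model with its coefficient fixed at 1. If the coefficient on `constant` is meant to be estimated instead, as the question's "regress ln.salary on the constant and x variables" suggests, the column can be entered as an ordinary regressor in a no-intercept model. A minimal sketch on the ten example rows, assuming the columns are already transformed as the question describes (`dat` is my own name for the post's `try`):

```r
# Example rows from the question; `dat` is a hypothetical name for the post's `try`.
dat <- data.frame(
  constant      = c(6.10, 5.12, 5.04, 4.97, 4.89, 4.89, 4.87, 4.87, 4.88, 4.99),
  years.star    = c(219.87, 153.69, 146.19, 139.35, 127.27, 127.27, 121.91, 121.91, 112.28, 99.98),
  years.sq.star = c(7915.41, 4610.71, 4239.78, 3901.93, 3309.27, 3309.27, 3047.95, 3047.95, 2582.58, 1999.62),
  ln.salary     = c(28.43, 23.12, 21.59, 21.44, 22.71, 23.33, 20.29, 21.76, 21.48, 22.92)
)

# Offset: `constant` is added to the fit with a coefficient fixed at 1.
fit_offset <- lm(ln.salary ~ years.star + years.sq.star + 0,
                 data = dat, offset = constant)

# Regressor: `constant` takes the place of the usual column of ones and gets
# its own estimated coefficient; `+ 0` suppresses lm's automatic intercept.
fit_regressor <- lm(ln.salary ~ constant + years.star + years.sq.star + 0,
                    data = dat)

coef(fit_offset)     # two coefficients
coef(fit_regressor)  # three coefficients; the first one multiplies `constant`
```

Whether the second fit reproduces the Excel figures depends on how those were computed, which the post does not show.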

I'm not too troubled that this doesn't agree with Excel; its linear regression routine is known to be rather flaky. If, on the other hand, you are sure you need to use weights, then you should clarify which of the three possible interpretations of the term you mean (replication, sampling, or inverse variance). lm uses the inverse-variance interpretation (its help page describes the weights as being "inversely proportional to the variances"), so if those "constant" terms are variances, then perhaps you want:

> (test<-lm(ln.salary~years.star+years.sq.star+0, data=try, weights=1/constant) )

Call:
lm(formula = ln.salary ~ years.star + years.sq.star + 0, data = try, 
    weights = 1/constant)

Coefficients:
   years.star  years.sq.star  
     0.309391      -0.005189  
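
For reference, `lm` with `weights = w` minimises the weighted sum of squared residuals, which is the same as ordinary least squares after multiplying every column, including the column of ones, by `sqrt(w)`. The sketch below checks that identity on the example rows. It assumes, purely for illustration, that `constant` plays the role of `sqrt(w)` and that the starred columns and `ln.salary` were produced by that multiplication. The names `dat`, `raw`, `years`, `years.sq`, `ln.salary.raw` and `w` are mine, not the poster's.

```r
# Example rows from the question; `dat` is a hypothetical name for the post's `try`.
dat <- data.frame(
  constant      = c(6.10, 5.12, 5.04, 4.97, 4.89, 4.89, 4.87, 4.87, 4.88, 4.99),
  years.star    = c(219.87, 153.69, 146.19, 139.35, 127.27, 127.27, 121.91, 121.91, 112.28, 99.98),
  years.sq.star = c(7915.41, 4610.71, 4239.78, 3901.93, 3309.27, 3309.27, 3047.95, 3047.95, 2582.58, 1999.62),
  ln.salary     = c(28.43, 23.12, 21.59, 21.44, 22.71, 23.33, 20.29, 21.76, 21.48, 22.92)
)

# Undo the assumed scaling to get hypothetical untransformed variables.
raw <- data.frame(
  years         = dat$years.star / dat$constant,
  years.sq      = dat$years.sq.star / dat$constant,
  ln.salary.raw = dat$ln.salary / dat$constant,
  w             = dat$constant^2  # weight = (assumed scaling factor)^2
)

# Weighted fit on the untransformed variables, with the usual intercept.
fit_wls <- lm(ln.salary.raw ~ years + years.sq, data = raw, weights = w)

# Unweighted fit on the transformed columns; `constant` stands in for the intercept.
fit_ols <- lm(ln.salary ~ constant + years.star + years.sq.star + 0, data = dat)

# The two sets of coefficients should agree up to rounding error.
all.equal(unname(coef(fit_wls)), unname(coef(fit_ols)), tolerance = 1e-6)
```

If the assumption holds, passing already-transformed columns to `lm` together with a `weights` argument would apply the weighting twice, so only one of the two routes should be used.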


 