Linear regression in R: comparing multiple observations vs. a single observation

Based on the answers to my earlier question, I should get the same intercept and regression coefficient from the two models below. But they are not the same. What is going on?

Is something wrong with my code, or is the original answer wrong?

#linear regression average qty per price point vs all quantities

x1=rnorm(30,20,1);y1=rep(3,30)
x2=rnorm(30,17,1.5);y2=rep(4,30)
x3=rnorm(30,12,2);y3=rep(4.5,30)
x4=rnorm(30,6,3);y4=rep(5.5,30)
x=c(x1,x2,x3,x4)
y=c(y1,y2,y3,y4)
plot(y,x)
cor(y,x)
fit=lm(x~y)
attributes(fit)
summary(fit)

xdum=c(20,17,12,6)
ydum=c(3,4,4.5,5.5)
plot(ydum,xdum)
cor(ydum,xdum)
fit1=lm(xdum~ydum)
attributes(fit1)
summary(fit1)


> summary(fit)

Call:
lm(formula = x ~ y)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.3572 -1.6069 -0.1007  2.0222  6.4904 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  40.0952     1.1570   34.65   <2e-16 ***
y            -6.1932     0.2663  -23.25   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.63 on 118 degrees of freedom
Multiple R-squared:  0.8209,    Adjusted R-squared:  0.8194 
F-statistic: 540.8 on 1 and 118 DF,  p-value: < 2.2e-16

> summary(fit1)

Call:
lm(formula = xdum ~ ydum)

Residuals:
      1       2       3       4 
-0.9615  1.8077 -0.3077 -0.5385 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  38.2692     3.6456  10.497  0.00895 **
ydum         -5.7692     0.8391  -6.875  0.02051 * 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.513 on 2 degrees of freedom
Multiple R-squared:  0.9594,    Adjusted R-squared:  0.9391 
F-statistic: 47.27 on 1 and 2 DF,  p-value: 0.02051

You are not calculating xdum and ydum in a comparable fashion, because rnorm only approximates the mean you specify, particularly when you sample only 30 cases. This is easily fixed, however:

coef(fit)
#(Intercept)           y 
#  39.618472   -6.128739 

xdum <- c(mean(x1),mean(x2),mean(x3),mean(x4))
ydum <- c(mean(y1),mean(y2),mean(y3),mean(y4))
coef(lm(xdum~ydum))
#(Intercept)        ydum 
#  39.618472   -6.128739 
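The same check can be made reproducible, and extended to groups of different sizes. The following is a minimal sketch rather than part of the original answer: the seed, the unequal group sizes in n2, and all variable names are assumptions chosen for illustration. With equal group sizes the fit on the group means matches the fit on all observations; with unequal sizes it only matches when the means are weighted by group size.

```r
# Sketch: lm on all observations vs. lm on the group means.
set.seed(1)                      # assumed seed, for reproducibility only

mu    <- c(20, 17, 12, 6)        # nominal means from the question
sigma <- c(1, 1.5, 2, 3)         # standard deviations from the question
yval  <- c(3, 4, 4.5, 5.5)       # one y value per group

# Helper (hypothetical name): simulate groups and return both sets of coefficients.
compare_fits <- function(n) {
  x_all <- unlist(Map(rnorm, n, mu, sigma))   # rnorm(n[i], mu[i], sigma[i]) per group
  y_all <- rep(yval, times = n)
  g     <- rep(seq_along(n), times = n)
  xbar  <- as.numeric(tapply(x_all, g, mean)) # sample mean of each group
  list(
    full       = unname(coef(lm(x_all ~ y_all))),
    unweighted = unname(coef(lm(xbar ~ yval))),
    weighted   = unname(coef(lm(xbar ~ yval, weights = n)))
  )
}

equal_n   <- compare_fits(c(30, 30, 30, 30))  # group sizes as in the question
unequal_n <- compare_fits(c(10, 20, 40, 50))  # assumed sizes, for contrast

# Equal sizes: the plain fit on the means reproduces the full fit.
all.equal(equal_n$full, equal_n$unweighted)

# Unequal sizes: the means must be weighted by group size to reproduce it.
all.equal(unequal_n$full, unequal_n$weighted)
```

Comparing unequal_n$full with unequal_n$unweighted shows how far the unweighted fit drifts when the groups differ in size.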

In theory the two fits should be the same when the mean of each group in the first model equals the corresponding point in the second model (this also relies on the groups being the same size, as they are here).

This is not the case in your models, so the results are slightly different. For example, the mean of x1:

x1=rnorm(30,20,1)
mean(x1)

20.08353

whereas the point version is 20.

There are similarly small differences for your other rnorm samples:

> mean(x2)
[1] 17.0451
> mean(x3)
[1] 11.72307
> mean(x4)
[1] 5.913274
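Gaps of this size are what sampling theory predicts: the mean of n independent draws with standard deviation sigma has a standard error of sigma / sqrt(n). A small sketch, using the parameters from the question and the sample means printed above (the variable names are illustrative only):

```r
# Standard error of each group mean, given n = 30 draws per group.
n     <- 30
sigma <- c(x1 = 1, x2 = 1.5, x3 = 2, x4 = 3)
se    <- sigma / sqrt(n)
round(se, 3)
#    x1    x2    x3    x4 
# 0.183 0.274 0.365 0.548 

# Sample means reported in this answer vs. the nominal means passed to rnorm.
observed <- c(x1 = 20.08353, x2 = 17.0451, x3 = 11.72307, x4 = 5.913274)
nominal  <- c(x1 = 20, x2 = 17, x3 = 12, x4 = 6)

# Deviation of each sample mean, in units of its standard error.
z <- (observed - nominal) / se
round(z, 2)
```

Every deviation is less than one standard error, so these samples are entirely ordinary; a different run of rnorm would give different, similarly sized gaps.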

Not that this really matters, but just FYI: the standard nomenclature is that Y is the dependent variable and X is the independent variable, which you reversed. It makes no difference to the comparison, of course, but just so you know.
