Linear regression in R: comparing multiple observations vs. a single observation
Based on the answers to my earlier question, I should get the same intercept and regression coefficient from the two models below. But they are not the same. What is going on? Is something wrong with my code, or is the original answer wrong?
#linear regression average qty per price point vs all quantities
x1=rnorm(30,20,1);y1=rep(3,30)
x2=rnorm(30,17,1.5);y2=rep(4,30)
x3=rnorm(30,12,2);y3=rep(4.5,30)
x4=rnorm(30,6,3);y4=rep(5.5,30)
x=c(x1,x2,x3,x4)
y=c(y1,y2,y3,y4)
plot(y,x)
cor(y,x)
fit=lm(x~y)
attributes(fit)
summary(fit)
xdum=c(20,17,12,6)
ydum=c(3,4,4.5,5.5)
plot(ydum,xdum)
cor(ydum,xdum)
fit1=lm(xdum~ydum)
attributes(fit1)
summary(fit1)
> summary(fit)
Call:
lm(formula = x ~ y)
Residuals:
Min 1Q Median 3Q Max
-8.3572 -1.6069 -0.1007 2.0222 6.4904
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 40.0952 1.1570 34.65 <2e-16 ***
y -6.1932 0.2663 -23.25 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.63 on 118 degrees of freedom
Multiple R-squared: 0.8209, Adjusted R-squared: 0.8194
F-statistic: 540.8 on 1 and 118 DF, p-value: < 2.2e-16
> summary(fit1)
Call:
lm(formula = xdum ~ ydum)
Residuals:
1 2 3 4
-0.9615 1.8077 -0.3077 -0.5385
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 38.2692 3.6456 10.497 0.00895 **
ydum -5.7692 0.8391 -6.875 0.02051 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.513 on 2 degrees of freedom
Multiple R-squared: 0.9594, Adjusted R-squared: 0.9391
F-statistic: 47.27 on 1 and 2 DF, p-value: 0.02051
You are not calculating xdum and ydum in a comparable fashion, because rnorm will only approximate the mean value you specify, particularly when you are sampling only 30 cases. This is easily fixed, however:
coef(fit)
#(Intercept) y
# 39.618472 -6.128739
xdum <- c(mean(x1),mean(x2),mean(x3),mean(x4))
ydum <- c(mean(y1),mean(y2),mean(y3),mean(y4))
coef(lm(xdum~ydum))
#(Intercept) ydum
# 39.618472 -6.128739
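The same check can be made reproducible. The following is a minimal sketch, not the original poster's code: the seed, the data frame `d`, and the names `fit_all` and `fit_means` are assumptions introduced for illustration. It relies on every group having the same number of observations (30 here) and on the predictor being constant within each group.

```r
# Assumption: the seed is arbitrary; any value should give the same conclusion.
set.seed(1)
n <- 30

# Predictor (y) is constant within each group; response (x) is random,
# using the same means and standard deviations as in the question.
d <- data.frame(
  y = rep(c(3, 4, 4.5, 5.5), each = n),
  x = c(rnorm(n, 20, 1), rnorm(n, 17, 1.5), rnorm(n, 12, 2), rnorm(n, 6, 3))
)

# Fit on all 120 observations.
fit_all <- lm(x ~ y, data = d)

# Fit on the four observed group means (one row per distinct y).
means <- aggregate(x ~ y, data = d, FUN = mean)
fit_means <- lm(x ~ y, data = means)

# The coefficients agree up to numerical tolerance.
print(all.equal(coef(fit_all), coef(fit_means)))
```

If the groups had different sizes, the unweighted fit on the means would generally differ from the full fit. Passing the group sizes through the `weights` argument of `lm` should restore the agreement.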
In theory they should be the same if (and only if) the mean of each group in the former model is equal to the corresponding point in the latter model. This is not the case in your models, so the results are slightly different. For example, the mean of x1:
x1=rnorm(30,20,1)
mean(x1)
20.08353
where the point version is 20.
There are similar tiny differences for your other rnorm samples:
> mean(x2)
[1] 17.0451
> mean(x3)
[1] 11.72307
> mean(x4)
[1] 5.913274
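To see that differences of this size are expected, compare them with the standard error of a mean, sd / sqrt(n). The sketch below uses the sample means printed above and the rnorm parameters from the question; the variable names (`nominal`, `observed`, `se`, `z`) are assumptions made for illustration.

```r
n <- 30
nominal  <- c(20, 17, 12, 6)                        # means passed to rnorm
sds      <- c(1, 1.5, 2, 3)                         # sds passed to rnorm
observed <- c(20.08353, 17.0451, 11.72307, 5.913274) # sample means shown above

# Standard error of each sample mean, and each deviation in SE units.
se <- sds / sqrt(n)
z  <- (observed - nominal) / se

print(round(data.frame(nominal, observed, se, z), 3))
```

Each observed mean lies within one standard error of its nominal value, so these deviations are ordinary sampling noise rather than a sign of a coding error.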
Not that this really matters, but just FYI: the standard nomenclature is that Y is the dependent variable and X is the independent variable, which you reversed. Since this is only a matter of naming, it makes no difference to the results, but just so you know.