Dealing with nested variables in R linear regression
I have a dataset which includes some nested variables. For example, I have the following variables: the speed of a car (speed), the existence of another car following it (other_car) and, if there is another car, the distance between the two cars (distance). Dummy dataset:
speed <- c(30,50,60,30,33,54,65,33,33,54,65,34,45,32)
other_car <- c(0,1,0,0,0,1,1,1,1,0,1,0,1,0)
distance <- c(NA,20,NA,NA,NA,21,5,15,17,NA,34,NA,13,NA)
dft <- data.frame(speed,other_car,distance)
I would like to include the variables other_car and distance in a model as nested variables, i.e. if another car is present, also consider the distance. Following an approach mentioned here: https://stats.stackexchange.com/questions/372257/how-do-you-deal-with-nested-variables-in-a-regression-model , I tried the following:
dft <- data.frame(speed,other_car,distance)
dft$other_car<-factor(dft$other_car)
lm_speed <- lm(speed ~ dft$other_car + dft$other_car:dft$distance)
summary(lm_speed)
Which gives the following error:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels
Any ideas?
This happens because distance is NA whenever other_car==0. lm() drops the rows with missing values and then drops unused factor levels, so other_car is left with a single level and no contrast can be built for it. See:
dft$distance[dft$other_car==0]
[1] NA NA NA NA NA NA NA
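To see what is left once the incomplete rows are gone, the following minimal sketch rebuilds the dummy data and applies na.omit() by hand. It assumes the default na.action (na.omit) is in effect; the name kept is introduced here only for illustration.

```r
# Rebuild the dummy dataset from the question
speed <- c(30,50,60,30,33,54,65,33,33,54,65,34,45,32)
other_car <- c(0,1,0,0,0,1,1,1,1,0,1,0,1,0)
distance <- c(NA,20,NA,NA,NA,21,5,15,17,NA,34,NA,13,NA)
dft <- data.frame(speed, other_car, distance)
dft$other_car <- factor(dft$other_car)

# Rows that survive na.omit(), the default missing-value handling in lm()
kept <- na.omit(dft)
nrow(kept)

# Counts per level among the surviving rows: level "0" is empty
table(kept$other_car)

# Once unused levels are dropped, a single level remains
nlevels(droplevels(kept$other_car))
```

All rows with other_car==0 disappear, which is why the factor no longer has two levels by the time the contrasts are set up.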
You could replace NA with a constant distance for other_car==0, so that these rows are kept and the model still uses the level other_car==0. The model then finds that distance has no impact for this subset (the other_car0:distance coefficient is NA because that column is constant):
dft$distance[dft$other_car==0]<-0
dft$other_car<- factor(dft$other_car)
lm_speed <- lm(speed ~ other_car + other_car:distance, data = dft)
summary(lm_speed)
Call:
lm(formula = speed ~ other_car + other_car:distance, data = dft)
Residuals:
    Min      1Q  Median      3Q     Max
-16.015  -8.500  -3.876   8.894  21.000
Coefficients: (1 not defined because of singularities)
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)          39.0000     5.0405   7.737 8.96e-06 ***
other_car1            4.6480    13.0670   0.356    0.729
other_car0:distance       NA         NA      NA       NA
other_car1:distance   0.3157     0.6133   0.515    0.617
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 13.34 on 11 degrees of freedom
Multiple R-squared:  0.1758, Adjusted R-squared:  0.026
F-statistic: 1.174 on 2 and 11 DF,  p-value: 0.3452
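A possible variant, sketched here as an assumption rather than as part of the original answer, keeps other_car as a numeric 0/1 indicator after filling the missing distances with 0. The interaction is then simply other_car * distance, so the all-zero column never enters the design matrix and no coefficient should be reported as NA. The name fit_num is hypothetical.

```r
# Rebuild the dummy dataset, keeping other_car as a numeric 0/1 indicator
speed <- c(30,50,60,30,33,54,65,33,33,54,65,34,45,32)
other_car <- c(0,1,0,0,0,1,1,1,1,0,1,0,1,0)
distance <- c(NA,20,NA,NA,NA,21,5,15,17,NA,34,NA,13,NA)
dft <- data.frame(speed, other_car, distance)

# Fill the structurally missing distances so that no row is dropped
dft$distance[dft$other_car == 0] <- 0

# With a numeric indicator, other_car:distance is the product of the two columns
fit_num <- lm(speed ~ other_car + other_car:distance, data = dft)
coef(fit_num)
```

The design matrix spans the same columns as the factor version, so the estimates should match the non-NA coefficients printed above.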
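Another workaround could be to convert the factor to numeric, but this isn't the same model: as.numeric() on a factor returns the level codes 1 and 2 rather than 0 and 1, and the rows with a missing distance are dropped, so only the cars that are being followed are fitted: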
speed <- c(30,50,60,30,33,54,65,33,33,54,65,34,45,32)
other_car <- c(0,1,0,0,0,1,1,1,1,0,1,0,1,0)
distance <- c(NA,20,NA,NA,NA,21,5,15,17,NA,34,NA,13,NA)
dft <- data.frame(speed,other_car,distance)
dft$other_car<- as.numeric(factor(dft$other_car))
lm_speed <- lm(speed ~ other_car + other_car:distance, data = dft)
summary(lm_speed)
Call:
lm(formula = speed ~ other_car + other_car:distance, data = dft)
Residuals:
        2         6         7         8         9        11        13
  0.03776   3.72205  19.77341 -15.38369 -16.01511  10.61782  -2.75227
Coefficients: (1 not defined because of singularities)
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)         43.6480    12.9010   3.383   0.0196 *
other_car                NA         NA      NA       NA
other_car:distance   0.1579     0.3281   0.481   0.6508
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 14.27 on 5 degrees of freedom
  (7 observations deleted due to missingness)
Multiple R-squared:  0.04424, Adjusted R-squared:  -0.1469
F-statistic: 0.2314 on 1 and 5 DF,  p-value: 0.6508
This suggests that speed increases with the distance to the other car (or, put the other way round, drivers tend to slow down when the other car is too near), although the coefficient is not statistically significant on this small dummy dataset (p = 0.65).
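As a rough cross-check of the numeric workaround, the sketch below looks at the codes produced by as.numeric(factor(...)) and fits the plain regression on the followed cars only. The names codes and fit_sub are introduced here for illustration and are not part of the original answer.

```r
# Rebuild the dummy dataset from the question
speed <- c(30,50,60,30,33,54,65,33,33,54,65,34,45,32)
other_car <- c(0,1,0,0,0,1,1,1,1,0,1,0,1,0)
distance <- c(NA,20,NA,NA,NA,21,5,15,17,NA,34,NA,13,NA)
dft <- data.frame(speed, other_car, distance)

# as.numeric() on a factor returns the internal level codes, not the labels
codes <- as.numeric(factor(dft$other_car))
sort(unique(codes))

# Simple regression restricted to the cars that have a follower
fit_sub <- lm(speed ~ distance, data = subset(dft, other_car == 1))
coef(fit_sub)
```

Because every remaining row has the code 2, the interaction column in the numeric workaround is 2 * distance, which would explain why its slope (0.1579) is about half of the per-unit slope obtained from the subset fit.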