
Turning coefficients on and off in R linear regression

I want to create a linear regression model to predict an output that uses two different coefficients based on some threshold within the data. For example, with a data frame df:

Value   Temperature
 8.2     70
 3.2     51
 5.8     54
 7.2     61

and so on. For this data, I want to figure out how to make the following model:

Value = B0 + B1(HighTemp) + B2(LowTemp)

Where B1 is 0 if the temperature is below 55, and B2 is 0 if the temperature is above 55. I tried the following:

fit = lm(Value ~ I(Temperature > 55), data = df)
fit2 = lm(Value ~ Temperature * I(Temperature > 55), data = df)

fit only gives me a coefficient for when the temperature is above 55, and fit2 gives output that I don't fully understand. I was also thinking of creating a third column, HighorLow, with an indicator variable (1 or 0) for whether the temperature is high or low. Then I would have:

fit = lm(Value ~ Temperature:HighorLow, data = df)

Does anyone have any input? I would appreciate any help.

You have two continuous variables, so why do you want to use a threshold? Your linear regression could just be:

df <- data.frame(Value = c(8.2, 3.2, 5.8, 7.2), Temperature = c(70, 51, 54, 61))
lm(Value ~ Temperature, data = df)

But if you really want to split into groups based on a threshold:

df$Temp_threshold <- df$Temperature > 55
lm(Value ~ Temp_threshold, data = df)
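
To help read the output of the threshold model: with a logical predictor, the intercept is the mean of Value for rows at or below the threshold, and the Temp_thresholdTRUE coefficient is the difference between the two group means. A minimal sketch using the four rows from the question (low_mean and high_mean are names introduced here only for illustration):

```r
# Four rows taken from the question
df <- data.frame(Value = c(8.2, 3.2, 5.8, 7.2),
                 Temperature = c(70, 51, 54, 61))
df$Temp_threshold <- df$Temperature > 55

fit <- lm(Value ~ Temp_threshold, data = df)

# Group means computed directly, for comparison with the coefficients
low_mean  <- mean(df$Value[!df$Temp_threshold])
high_mean <- mean(df$Value[df$Temp_threshold])

coef(fit)[["(Intercept)"]]         # matches low_mean
coef(fit)[["Temp_thresholdTRUE"]]  # matches high_mean - low_mean
```

So this model estimates one mean per group; it does not estimate any slope with respect to Temperature.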

Here is an example of your third idea, which is the statistically appropriate one. You were correct to factor it.

> df <- data.frame(Value = runif(100, min = 0, max = 10), Temperature = runif(100, min = 50, max = 90))
> df$Threshold <- with(df, factor(ifelse(Temperature > 55, 1, 0)))
> m <- lm(Value ~ Threshold, data = df)
> summary(m)

Call:
lm(formula = Value ~ Threshold, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.9916 -2.1260  0.1069  2.4733  4.8550 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   5.4835     0.8155   6.724 1.19e-09 ***
Threshold1   -0.7074     0.8645  -0.818    0.415    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.705 on 98 degrees of freedom
Multiple R-squared:  0.006787,  Adjusted R-squared:  -0.003347 
F-statistic: 0.6697 on 1 and 98 DF,  p-value: 0.4151    
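
For comparison, the model written in the question, Value = B0 + B1(HighTemp) + B2(LowTemp), has one shared intercept and a separate Temperature slope on each side of the threshold, which is not the same as a model with only a threshold indicator. A minimal sketch of that form, using the four rows from the question rather than simulated data. The columns HighTemp and LowTemp and the factor coding of HighorLow are assumptions about what the asker intended, and rows at exactly 55 are assigned to the low group here:

```r
# Four rows taken from the question
df <- data.frame(Value = c(8.2, 3.2, 5.8, 7.2),
                 Temperature = c(70, 51, 54, 61))

# Each column holds Temperature on one side of the threshold and 0 on the other
df$HighTemp <- ifelse(df$Temperature > 55, df$Temperature, 0)
df$LowTemp  <- ifelse(df$Temperature > 55, 0, df$Temperature)
fit_cols <- lm(Value ~ HighTemp + LowTemp, data = df)

# Same model written as an interaction with a factor
# (no main effect for Temperature, so each level gets its own slope)
df$HighorLow <- factor(ifelse(df$Temperature > 55, "High", "Low"),
                       levels = c("Low", "High"))
fit_int <- lm(Value ~ Temperature:HighorLow, data = df)

coef(fit_cols)  # (Intercept), HighTemp, LowTemp
coef(fit_int)   # (Intercept), Temperature:HighorLowLow, Temperature:HighorLowHigh
```

HighorLow needs to be a factor for the second form to give two slopes; with a numeric 0/1 column, Temperature:HighorLow yields a single slope that applies only to the high rows. Note also that a shared intercept does not make the two fitted segments meet at 55.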


 