I want to fit a linear regression model that uses two different coefficients depending on which side of a threshold the data falls. For example, given the data frame df:
Value Temperature
8.2 70
3.2 51
5.8 54
7.2 61
and so on. For this data, I want to figure out how to make the following model:
Value = B0 + B1(HighTemp) + B2(LowTemp)
where B1 is 0 if the temperature is below 55, and B2 is 0 if the temperature is above 55. I tried the following:
fit = lm(Value ~ I(Temperature > 55), data = df)
fit2 = lm(Value ~ Temperature * I(Temperature > 55), data = df)
fit only gives me a coefficient for when the temperature is above 55, and fit2 gives output that I don't fully understand. I was also thinking of creating a third column, HighorLow, with an indicator variable (1 or 0) for whether the temperature is high or low. Then I would have:
fit = lm(Value ~ Temperature:HighorLow, data = df)
Does anyone have any input? I would appreciate any help.
You have two continuous variables, so why do you want to use a threshold? Your linear regression could just be:
df<-data.frame(Value=c(8.2,3.2,5.8,7.2),Temperature=c(70,51,54,61))
lm(Value~Temperature,data=df)
But if you really want to split into groups based on a threshold,
df$Temp_threshold<-df$Temperature>55
lm(Value ~ Temp_threshold,data=df)
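To make the output of that second model easier to read, here is a small sketch using the same four rows. With a logical predictor, the intercept is the mean Value of the low group (Temperature not above 55) and the Temp_thresholdTRUE coefficient is the difference between the two group means. The names low_mean and high_mean are only illustrative helpers, not part of the original code.

```r
# The four example rows from the question
df <- data.frame(Value = c(8.2, 3.2, 5.8, 7.2),
                 Temperature = c(70, 51, 54, 61))

# TRUE when the temperature is above the threshold
df$Temp_threshold <- df$Temperature > 55

fit <- lm(Value ~ Temp_threshold, data = df)
print(coef(fit))

# Group means computed by hand, for comparison with the coefficients
low_mean  <- mean(df$Value[!df$Temp_threshold])   # mean of 3.2 and 5.8
high_mean <- mean(df$Value[df$Temp_threshold])    # mean of 8.2 and 7.2

# Intercept should equal low_mean; the slope should equal high_mean - low_mean
print(c(low_mean = low_mean, difference = high_mean - low_mean))
```

This is why fit in the question reports only one threshold coefficient: the low group is absorbed into the intercept as the reference level.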
Here is an example of your third idea, which is the statistically appropriate one. You were right to code the indicator as a factor.
> df <- data.frame(Value = runif(100, min = 0, max = 10), Temperature = runif(100, min = 50, max = 90))
> df$Threshold <- with(df, factor(ifelse(Temperature > 55, 1, 0)))
> m <- lm(Value ~ Threshold, data = df)
> summary(m)
Call:
lm(formula = Value ~ Threshold, data = df)

Residuals:
    Min      1Q  Median      3Q     Max
-4.9916 -2.1260  0.1069  2.4733  4.8550

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.4835     0.8155   6.724 1.19e-09 ***
Threshold1   -0.7074     0.8645  -0.818    0.415
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.705 on 98 degrees of freedom
Multiple R-squared:  0.006787, Adjusted R-squared:  -0.003347
F-statistic: 0.6697 on 1 and 98 DF,  p-value: 0.4151
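The model above compares the mean Value between the two groups. If the goal is the form written in the question, Value = B0 + B1(HighTemp) + B2(LowTemp), with one shared intercept and a separate Temperature slope on each side of 55, one possible sketch is to interact Temperature with the factor and leave out the main effects. The seed, the "High"/"Low" labels, and the choice to put a temperature of exactly 55 in the low group are assumptions made for illustration.

```r
set.seed(1)  # assumed seed, only so the simulated data is reproducible
df <- data.frame(Value = runif(100, min = 0, max = 10),
                 Temperature = runif(100, min = 50, max = 90))

# Factor marking which side of the threshold each row falls on
df$Threshold <- factor(ifelse(df$Temperature > 55, "High", "Low"))

# Temperature:Threshold with no main effects gives one slope per level
# plus a single common intercept
m2 <- lm(Value ~ Temperature:Threshold, data = df)
print(coef(m2))

# The same model built by hand: each column is Temperature on one side
# of the threshold and 0 on the other
df$HighTemp <- df$Temperature * (df$Temperature > 55)
df$LowTemp  <- df$Temperature * (df$Temperature <= 55)
m3 <- lm(Value ~ HighTemp + LowTemp, data = df)
print(coef(m3))
```

The two fits should return the same three estimates. Because the simulated Value is independent of Temperature, neither slope is expected to be meaningful here; the point is only the structure of the formula.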