I'm totally new to R, and to statistics as well.
Using the crabs.csv dataset, I fitted a linear regression model with this code:
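library(dplyr)  # provides %>% and mutate()
# labels are assigned to the sorted unique values of each column, in that order
facdF = dF %>% mutate(sex = factor(sex, labels = c("F", "M")))
fac2dF = facdF %>% mutate(sp = factor(sp, labels = c("O", "B")))  # build on facdF so the sex factor is kept
# CL is stored with decimal commas, so convert it to numeric before fitting
dfS <- summary(with(fac2dF, lm(as.numeric(gsub(",", ".", CL)) ~ sex * sp)))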
Here dF is the original data frame. Running the code gives the following model summary:
Call:
lm(formula = as.numeric(gsub(",", ".", CL)) ~ sex * sp)
Residuals:
    Min      1Q  Median      3Q     Max
-16.430  -4.423  -0.065   5.378  14.570
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   27.965      1.100  25.421  < 2e-16 ***
sexM           4.565      1.587   2.876  0.00462 **
spO            6.507      1.598   4.071 7.61e-05 ***
sexM:spO      -5.963      2.266  -2.631  0.00942 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.957 on 147 degrees of freedom
Multiple R-squared: 0.115, Adjusted R-squared: 0.09692
F-statistic: 6.366 on 3 and 147 DF, p-value: 0.0004364
I have two questions:
1. Why is the estimate for sexM:spO negative?
2. I need to extract a prediction for sexF:spO, but that interaction term is not present in the output. How can I do it?
EDIT: As requested, the file crabs.csv is available on GitHub.
I resolved it by passing only one label to each factor() call:
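# note: if a column has two distinct values, a single label is expanded to "F1"/"F2" (and "O1"/"O2")
facdF = dF %>% mutate(sex = factor(sex, labels = c("F")))
fac2dF = dF %>% mutate(sp = factor(sp, labels = c("O")))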
instead of:
facdF = dF %>% mutate(sex = factor(sex, labels = c("M", "F")))
fac2dF = dF %>% mutate(sp = factor(sp, labels = c("B", "O")))
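The reason sexM:spO is negative is simply that this is the direction of the effect in the data. An interaction coefficient is a difference of differences: in the fit below, the male-minus-female difference in CL is about 4.8 units smaller in species O than in species B (equivalently, the O-minus-B difference is about 4.8 units smaller for males than for females).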
library("MASS")
data(crabs)
fit <- lm(CL ~ sex * sp, data = crabs)
summary(fit)
#>
#> Call:
#> lm(formula = CL ~ sex * sp, data = crabs)
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -16.988  -4.636   0.184   5.130  15.086
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  28.1020     0.9499  29.584  < 2e-16 ***
#> sexM          3.9120     1.3434   2.912  0.00401 **
#> spO           6.5160     1.3434   4.851  2.5e-06 ***
#> sexM:spO     -4.8420     1.8998  -2.549  0.01158 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 6.717 on 196 degrees of freedom
#> Multiple R-squared: 0.1232, Adjusted R-squared: 0.1098
#> F-statistic: 9.181 on 3 and 196 DF, p-value: 1.031e-05
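The difference-of-differences reading can be checked against the raw group means. This is a minimal sketch, assuming the MASS version of crabs used in this answer (not the poster's crabs.csv); the names gap_B, gap_O and cell_means are illustrative. It computes the male-minus-female gap in each species from the coefficients and confirms that the gap between them is exactly the sexM:spO estimate.

```r
library(MASS)
data(crabs)

fit <- lm(CL ~ sex * sp, data = crabs)
b <- coef(fit)

# Male-minus-female gap in mean CL within each species, from the coefficients
gap_B <- unname(b["sexM"])                  # species B (reference level of sp)
gap_O <- unname(b["sexM"] + b["sexM:spO"])  # species O

# The interaction coefficient is the difference between the two gaps
stopifnot(isTRUE(all.equal(gap_O - gap_B, unname(b["sexM:spO"]))))

# With two binary factors and their interaction, the model reproduces
# the four observed sex-by-species cell means exactly
cell_means <- with(crabs, tapply(CL, list(sex, sp), mean))
stopifnot(isTRUE(all.equal(gap_B, cell_means["M", "B"] - cell_means["F", "B"])))
stopifnot(isTRUE(all.equal(gap_O, cell_means["M", "O"] - cell_means["F", "O"])))

print(c(gap_B = gap_B, gap_O = gap_O))
```

With the estimates printed above, gap_B is 3.912 and gap_O is 3.912 - 4.842 = -0.930, so in species O the males are on average slightly smaller than the females, while in species B they are larger.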
sexM (rather than sexF) shows up because categorical variables are treated as factors in R models, factor levels are sorted alphabetically by default, and the first level (here F) becomes the reference category that is absorbed into the intercept. To get the interaction term sexF:spO, you need the variable passed to lm() to have M as the first level and F as the second, something like:
crabs$sex <- factor(crabs$sex, levels = c("M", "F"))
fit <- lm(CL ~ sex * sp, data = crabs)
summary(fit)
#>
#> Call:
#> lm(formula = CL ~ sex * sp, data = crabs)
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -16.988  -4.636   0.184   5.130  15.086
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  32.0140     0.9499  33.702  < 2e-16 ***
#> sexF         -3.9120     1.3434  -2.912  0.00401 **
#> spO           1.6740     1.3434   1.246  0.21420
#> sexF:spO      4.8420     1.8998   2.549  0.01158 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 6.717 on 196 degrees of freedom
#> Multiple R-squared: 0.1232, Adjusted R-squared: 0.1098
#> F-statistic: 9.181 on 3 and 196 DF, p-value: 1.031e-05
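This flips the sign of the interaction term (4.842 instead of -4.842) but leaves its standard error and p-value unchanged; the remaining coefficients are now expressed relative to males, while the fitted model itself (residuals, R-squared, F-statistic) is identical.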
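For the second question, a sexF:spO term is not actually needed to get a prediction for females of species O. A minimal sketch, again assuming the MASS crabs data and using illustrative names (fit_F, fit_M, cells, pred_FO): it fits the model with each reference level (using relevel() as an alternative to factor(levels = ...)), predicts every sex by species cell with predict(), and checks that both orderings give the same predictions.

```r
library(MASS)
data(crabs)

# Default coding: F is the reference level of sex
fit_F <- lm(CL ~ sex * sp, data = crabs)

# Same model with M as the reference level of sex
crabs_M <- crabs
crabs_M$sex <- relevel(crabs_M$sex, ref = "M")
fit_M <- lm(CL ~ sex * sp, data = crabs_M)

# One row per sex by species combination
cells <- expand.grid(sex = c("F", "M"), sp = c("B", "O"))
pred_F <- predict(fit_F, newdata = cells)
pred_M <- predict(fit_M, newdata = cells)
print(cbind(cells, CL_hat = pred_F))

# Both parameterisations give the same predictions
stopifnot(isTRUE(all.equal(pred_F, pred_M)))

# Under the default coding, female / species O is simply the intercept
# plus the spO coefficient; no sexF:spO term is involved
b <- coef(fit_F)
pred_FO <- unname(pred_F[cells$sex == "F" & cells$sp == "O"])
stopifnot(isTRUE(all.equal(pred_FO, unname(b["(Intercept)"] + b["spO"]))))
```

From the first summary, the female / species O prediction is 28.102 + 6.516 = 34.618; the reordered model gives the same value as 32.014 - 3.912 + 1.674 + 4.842.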