R numeric and categorical variables in multiple linear regression

Question

I have a data frame that looks similar to this:

BMI<-c(13.4,14,15.6,16,13.4,12.9,17.7,18.3,17,16.5)
sport<-c(1,2,2,3,2,1,1,3,1,2)
social<-c("low","middle","middle","low","high","low","middle","middle","high","middle")
smoker<-c(1,0,0,1,2,3,2,2,2,1)

status<-c("low","high","low","middle","low","middle","middle","middle","high","low")
social<-as.factor(social)
status<-as.factor(status)
sport<-as.integer(sport)
smoker<-as.integer(smoker)

df<-data.frame(BMI,sport,social,status,smoker)

I want to perform a multiple linear regression on the variable "BMI", but I don't know how to deal with the categorical variables, or more generally with the different variable formats.

How would I need to transform these variables to be able to get a meaningful result?

Answer 1

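You don't need to transform anything by hand: R dummy-codes factor variables automatically when they appear in a model formula. You can fit the model with glm() (with the gaussian family this is the same as ordinary linear regression) and mark categorical variables with factor(), like: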

glm(data=iris,formula=Sepal.Width~Sepal.Length+Petal.Length+factor(Species))

Using your data:

glm(data=df,BMI~sport+social+status+smoker,family="gaussian")
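
As a quick check of how the gaussian glm() call relates to plain linear regression, the sketch below fits both models to the data from the question and compares the coefficients. The names fit_glm and fit_lm are only illustrative and do not appear in the original post.

```r
# Data from the question
BMI    <- c(13.4, 14, 15.6, 16, 13.4, 12.9, 17.7, 18.3, 17, 16.5)
sport  <- as.integer(c(1, 2, 2, 3, 2, 1, 1, 3, 1, 2))
social <- factor(c("low", "middle", "middle", "low", "high",
                   "low", "middle", "middle", "high", "middle"))
smoker <- as.integer(c(1, 0, 0, 1, 2, 3, 2, 2, 2, 1))
status <- factor(c("low", "high", "low", "middle", "low",
                   "middle", "middle", "middle", "high", "low"))
df <- data.frame(BMI, sport, social, status, smoker)

# Same formula, fitted once with glm() and once with lm()
fit_glm <- glm(BMI ~ sport + social + status + smoker,
               data = df, family = gaussian())
fit_lm  <- lm(BMI ~ sport + social + status + smoker, data = df)

# Compare the estimated coefficients of the two fits
all.equal(coef(fit_glm), coef(fit_lm))
```

With only 10 observations and 7 estimated coefficients, the fit itself should be read with caution; the point here is only that both calls describe the same model.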

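The equivalent fit with lm(), first converting any character columns to factors:

library(dplyr)
# Convert character columns to factors (a no-op here: social and status are already factors)
df1 <- df %>%
  mutate(across(where(is.character), as.factor))
lm(BMI ~ sport + social + status + smoker, data = df1)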
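
To see what R does with the factor columns, it can help to inspect the design matrix that lm() builds. The following is a minimal sketch using the data from the question; the choice of "low" as the reference level is just an example, not something the original post asks for, and the names X and fit are illustrative.

```r
# Data from the question
BMI    <- c(13.4, 14, 15.6, 16, 13.4, 12.9, 17.7, 18.3, 17, 16.5)
sport  <- as.integer(c(1, 2, 2, 3, 2, 1, 1, 3, 1, 2))
social <- factor(c("low", "middle", "middle", "low", "high",
                   "low", "middle", "middle", "high", "middle"))
smoker <- as.integer(c(1, 0, 0, 1, 2, 3, 2, 2, 2, 1))
status <- factor(c("low", "high", "low", "middle", "low",
                   "middle", "middle", "middle", "high", "low"))
df <- data.frame(BMI, sport, social, status, smoker)

# Factor levels are sorted alphabetically by default,
# so "high" is the reference (baseline) level of social
levels(df$social)

# The design matrix shows the dummy (0/1) columns created for each factor;
# sport and smoker stay as single numeric columns
X <- model.matrix(BMI ~ sport + social + status + smoker, data = df)
colnames(X)

# Optionally pick a different reference level before fitting
df$social <- relevel(df$social, ref = "low")
fit <- lm(BMI ~ sport + social + status + smoker, data = df)
names(coef(fit))
```

Each factor coefficient is then the difference from that factor's reference level. Note that sport and smoker enter as numeric predictors, which assumes a linear effect per unit; if they are really category codes, wrapping them in factor() in the formula would treat them as categorical instead.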

Answer 1: accepted, 2019-01-08 18:45:42