R change categorical data to dummy variables

I have a multivariate data frame and want to convert its categorical columns to dummy variables. I used model.matrix, but it does not work as expected. Please refer to the example below:

age = 1:15                                                                   # numeric
sex = factor(c(rep(0, 7), rep(1, 8)))                                        # factor
bloodtype = factor(c(rep('A', 2), rep('B', 8), rep('O', 1), rep('AB', 4)))   # factor
bodyweight = 11:25                                                           # numeric

wholedata = data.frame(cbind(age,sex,bloodtype,bodyweight))

model.matrix(~.,data=wholedata)[,-1]

I did not use model.matrix(~age+sex+bloodtype+bodyweight)[,-1] because this is just a toy example. The real data could have tens or hundreds more columns, so typing out every variable name is not practical.

Thanks

It's the cbind that's messing things up. It converts your factors to their integer codes, which model.matrix then treats as ordinary numeric columns rather than as categories.

If you just do wholedata = data.frame(age,sex,bloodtype,bodyweight) there should be no problem.

cbind returns a matrix, and every element of a matrix must have the same type. In this example the factors are converted to their integer codes (the underlying representation of a factor), so the whole matrix ends up with type integer.

Try

wholedata = cbind(age,sex,bloodtype,bodyweight)
is.integer(wholedata) ## TRUE
is.factor(wholedata[,2]) ## FALSE

wholedata = data.frame(age,sex,bloodtype,bodyweight)
is.integer(wholedata) ## FALSE
is.factor(wholedata[,2]) ## TRUE
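
As a minimal sketch of the full fix, reusing the toy data from the question: building the data frame directly keeps sex and bloodtype as factors, so model.matrix can expand each of them into indicator columns. The name dummies is only illustrative and does not come from the original post.

```r
# Toy data from the question
age = 1:15
sex = factor(c(rep(0, 7), rep(1, 8)))
bloodtype = factor(c(rep('A', 2), rep('B', 8), rep('O', 1), rep('AB', 4)))
bodyweight = 11:25

# Build the data frame directly, without cbind, so factors stay factors
wholedata = data.frame(age, sex, bloodtype, bodyweight)
sapply(wholedata, class)   # check the column classes before modelling

# "~ ." uses every column of wholedata; [, -1] drops the intercept column
dummies = model.matrix(~ ., data = wholedata)[, -1]   # 'dummies' is an illustrative name
colnames(dummies)
```

With the default treatment contrasts, each factor contributes one indicator column per level other than its first (reference) level, while numeric columns pass through unchanged.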
