
Converting multiple data.table columns to factors in R

I ran into an unexpected problem when trying to convert multiple columns of a data table into factor columns. I've reproduced it as follows:

library(data.table)
tst <- data.table('a' = c('b','b','c','c'))
class(tst[,a])
tst[,as.factor(a)]  # Returns the expected result
tst[,as.factor('a'),with=FALSE]  # Returns an error

The latter command returns 'Error in Math.factor(j) : abs not meaningful for factors'. I found this while trying to run tst[,lapply(cols, as.factor),with=FALSE], where cols was a collection of columns I wanted to convert to factors. Is there a solution or workaround for this?

I found one solution:

library(data.table)
tst <- data.table('a' = c('b','b','c','c'))
class(tst[,a])
cols <- 'a'
tst[,(cols):=lapply(.SD, as.factor),.SDcols=cols]
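
The same pattern should extend to several columns at once. A minimal sketch, assuming a made-up table dt with columns a, b and n (these names and values are only for illustration):

```r
library(data.table)

# Hypothetical example data: two character columns and one integer column
dt <- data.table(a = c("b", "b", "c", "c"),
                 b = c("x", "y", "x", "y"),
                 n = 1:4)

# Names of the columns to convert (assumed for this sketch)
cols <- c("a", "b")

# .SD contains only the columns named in .SDcols;
# (cols) := ... assigns the converted columns back by reference
dt[, (cols) := lapply(.SD, as.factor), .SDcols = cols]

# Inspect the resulting column classes
sapply(dt, class)
```

Columns not listed in cols (here n) are left untouched.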

Still, the earlier-mentioned behavior seems buggy.

This is now fixed in v1.8.11, but probably not in the way you'd hoped for. From NEWS:

FR #4867 is now implemented. DT[, as.factor('x'), with=FALSE] where x is a column in DT, is now equivalent to DT[, "x", with=FALSE] instead of ending up with an error. Thanks to tresbot for reporting on SO: Converting multiple data.table columns to factors in R


Some explanation: The difference, when with=FALSE is used, is that the columns of the data.table aren't seen as variables anymore. That is:

tst[, as.factor(a), with=FALSE] # would give "a" not found!

would result in the error "a" not found. But what you did instead is:

tst[, as.factor('a'), with=FALSE]

You're in fact creating a factor "a" with level="a" and asking to subset the column with that name. This doesn't really make much sense. Take the case of data.frames:

DF <- data.frame(x=1:5, y=6:10)
DF[, c("x", "y")] # gives back DF

DF[, factor(c("x", "y"))] # gives back DF again, not factor columns
DF[, factor(c("x", "x"))] # gives back two columns of "x", still integer, not factor!
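
One way to check this, as a rough sketch: inspect the column classes after subsetting, and compare with applying factor() to the columns themselves. The names sel and DF2 are introduced here only for illustration.

```r
DF <- data.frame(x = 1:5, y = 6:10)

# A factor used as a column index only selects columns; it does not convert them
sel <- DF[, factor(c("x", "y"))]
sapply(sel, class)

# To convert the values, factor() has to be applied to the columns themselves
DF2 <- DF
DF2[] <- lapply(DF2, factor)
sapply(DF2, class)
```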

So, basically, when you use with=FALSE you are applying factor not to the elements of that column, but only to the column name. I hope I've managed to convey the difference well. Feel free to edit or comment if anything is unclear.
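
For data.table, the difference can be sketched as follows. This assumes the tst table from the question with an extra made-up column n, and the names sub and fac are hypothetical; it is meant as an illustration rather than the only way to do it.

```r
library(data.table)

# tst as in the question, plus a hypothetical integer column n
tst <- data.table(a = c("b", "b", "c", "c"), n = 1:4)
cols <- "a"

# with = FALSE: j is treated as a vector of column names,
# so this only selects column "a" and leaves its values alone
sub <- tst[, cols, with = FALSE]
class(sub$a)   # still character

# Without with = FALSE, j is evaluated inside the table,
# so the selected columns (.SD) can actually be transformed
fac <- tst[, lapply(.SD, as.factor), .SDcols = cols]
class(fac$a)

# tst itself is unchanged here, because := was not used
class(tst$a)
```

Using := with the same lapply(.SD, ...) call, as in the question's workaround, modifies tst in place instead of returning a new table.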
