
Converting multiple data.table columns to factors in R

I ran into an unexpected problem when trying to convert multiple columns of a data.table into factor columns. I've reproduced it as follows:

library(data.table)
tst <- data.table('a' = c('b','b','c','c'))
class(tst[,a])
tst[,as.factor(a)]  #Returns expected result
tst[,as.factor('a'),with=FALSE] #Returns error

The latter command returns 'Error in Math.factor(j) : abs not meaningful for factors'. I found this when attempting to get tst[,lapply(cols, as.factor),with=FALSE], where cols was a collection of columns I was attempting to convert to factors. Is there any solution or workaround for this?

I found one solution:

library(data.table)
tst <- data.table('a' = c('b','b','c','c'))
class(tst[,a])
cols <- 'a'
tst[,(cols):=lapply(.SD, as.factor),.SDcols=cols]
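The same pattern extends to several columns at once. The following is a minimal sketch, assuming a hypothetical table DT with two character columns (a, d) and one integer column (n) that are not part of the original example; only the columns named in cols are converted, by reference.

```r
library(data.table)

# Hypothetical example table (not from the original post)
DT <- data.table(a = c("b", "b", "c", "c"),
                 d = c("x", "y", "x", "y"),
                 n = 1:4)

# Names of the columns to convert
cols <- c("a", "d")

# .SD is restricted to `cols` via .SDcols; (cols) := assigns the
# converted columns back by reference, leaving `n` untouched
DT[, (cols) := lapply(.SD, as.factor), .SDcols = cols]

# Inspect the resulting column classes
sapply(DT, class)
```

Wrapping cols in parentheses on the left of := matters: without them, data.table would create a column literally named "cols".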

Still, the earlier-mentioned behavior seems buggy.

This is now fixed in v1.8.11, but probably not in the way you'd hoped for. From NEWS:

FR #4867 is now implemented. DT[, as.factor('x'), with=FALSE] where x is a column in DT, is now equivalent to DT[, "x", with=FALSE] instead of ending up with an error. Thanks to tresbot for reporting on SO: Converting multiple data.table columns to factors in R


Some explanation: The difference, when with=FALSE is used, is that the columns of the data.table aren't seen as variables anymore. That is:

tst[, as.factor(a), with=FALSE] # would give "a" not found!

would result in an error: "a" not found. But what you do instead is:

tst[, as.factor('a'), with=FALSE]

You're in fact creating a one-element factor whose only level is "a" and then asking data.table to select columns using it. This doesn't really make much sense. Take the case of data.frames:

DF <- data.frame(x=1:5, y=6:10)
DF[, c("x", "y")] # gives back DF

DF[, factor(c("x", "y"))] # gives back DF again, not factor columns
DF[, factor(c("x", "x"))] # gives back two columns of "x", still integer, not factor!

So, basically, when you use with=FALSE, the factor is applied not to the elements of that column but only to the column name. I hope I've managed to convey the difference well. Feel free to edit/comment if anything is unclear.
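To make the distinction concrete, here is a short sketch contrasting the two operations on the tst table from the question. It assumes the goal was a new data.table with factor columns rather than an in-place change; the names sub and fac are introduced here only for illustration.

```r
library(data.table)

tst <- data.table(a = c("b", "b", "c", "c"))
cols <- "a"

# with = FALSE treats j as column names: this only selects column "a",
# so its contents stay character
sub <- tst[, cols, with = FALSE]
class(sub$a)   # "character"

# Applying as.factor to the column contents goes through .SD instead;
# this returns a new data.table and leaves tst unchanged
fac <- tst[, lapply(.SD, as.factor), .SDcols = cols]
class(fac$a)   # "factor"
class(tst$a)   # "character"
```

Use the := form from the question when the original table should be modified by reference, and the lapply(.SD, ...) form when a converted copy is wanted.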
