read.csv2.ffdf is importing a numeric (float) variable as factor

I have been working with the ff package for a couple of weeks and it has worked great so far, but today I realized that a variable that should be numeric is being read as a factor. The data has about 900k rows and 800 columns, so it is not easy to check that every column gets the class it should.

matff <- read.csv2.ffdf(file = name,encoding = "UTF-8",next.rows=150000,colClasses=NA)

I would like to know why this may be happening and how to fix it.

Thanks.

Your data has some columns that clearly contain text rather than the numeric data you expect.
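To see which values keep a column from being numeric, it can help to look at the factor levels that do not convert. The following is only a sketch in base R on a made-up column; the vector `x` and the value "n/a" are illustrative assumptions, not taken from the original data.

```r
# Hypothetical column, as it might look after being imported as a factor
x <- factor(c("1,5", "2,25", "n/a", "3"))

# Try to convert each distinct level, treating "," as the decimal mark
labels <- levels(x)
converted <- suppressWarnings(as.numeric(sub(",", ".", labels, fixed = TRUE)))

# Levels that could not be converted are the likely culprits
bad <- labels[is.na(converted)]
print(bad)
```

Checking the levels rather than every row keeps this cheap even for long columns, since a factor stores each distinct value only once.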

You can use the transFUN argument of read.csv2.ffdf to solve your decimal problem, as in:

transFUN=function(x){
  x$mycolumn <- as.numeric(gsub(",", ".", as.character(x$mycolumn)))
  x
}
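As far as I understand the ff documentation, transFUN is called on each chunk (a regular data.frame) after it is read and before it is stored. The sketch below applies the same kind of function to a made-up chunk in base R; `fix_decimal`, `chunk` and its contents are hypothetical names and data used only for illustration.

```r
# Same idea as the function above, wrapped under a hypothetical name
fix_decimal <- function(x) {
  x$mycolumn <- as.numeric(gsub(",", ".", as.character(x$mycolumn), fixed = TRUE))
  x
}

# Made-up chunk in which the decimal-comma column arrived as a factor
chunk <- data.frame(id = 1:3,
                    mycolumn = c("1,5", "2,25", "3"),
                    stringsAsFactors = TRUE)

fixed <- fix_decimal(chunk)
str(fixed)

# With ff installed, the function would be passed along these lines (not run here):
# matff <- read.csv2.ffdf(file = name, next.rows = 150000, transFUN = fix_decimal)
```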

Or use the appropriate read.table arguments, such as dec or colClasses.
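One possible cause is a mismatch between the decimal mark in the file and the one the reader expects: read.csv2 defaults to sep = ";" and dec = ",". The base R sketch below uses made-up inline data (`txt` is an assumption for illustration) to show the effect of the dec argument. Extra arguments given to read.csv2.ffdf should be forwarded to the underlying reader, but that part is not run here.

```r
# Made-up contents: semicolon-separated, but with "." as the decimal mark
txt <- "id;value\n1;1.5\n2;2.25\n3;3.0"

default  <- read.csv2(text = txt)             # assumes dec = ","
with_dec <- read.csv2(text = txt, dec = ".")  # state the real decimal mark

# The first is not numeric (character or factor, depending on the R version);
# the second is numeric
print(class(default$value))
print(class(with_dec$value))
```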

Now it should work:

# matff <- data.frame(Col=c('a','b','c'),Mix1=c('a','1.2','c'),Mix2=c(1.1,2.1,3),Num1=c('1.2','2.3','3.4'),Num2=c('1,2','2,3','3,4')) # Data example

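func <- function(x) {
  # Leave columns that are already numeric (double or integer) untouched
  if (!is.numeric(x)) {
    x <- as.character(x)  # works for factors and for character vectors
    if (!any(grepl('[a-zA-Z]', x))) {
      # No letters anywhere: convert, using "." as the decimal mark
      # (as.real() is defunct in current R, so use as.numeric())
      x <- as.numeric(gsub(',', '.', x, fixed = TRUE))
    } else {
      x <- factor(x)
    }
  }
  x
}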

for (i in seq_len(ncol(matff))) {
  matff[,i] <- func(matff[,i])
}
