Current status of colClasses argument in function ff::read.csv.ffdf (ff - R package)

The error vmode 'character' not implemented occurs because of the argument colClasses=c("id"="character") in the code below:

df <- read.csv.ffdf('TenGBsample.csv',
      colClasses=c("id"="character"), VERBOSE=TRUE)

read.table.ffdf 1..1000 (1000) csv-read=0.02secError in ff(initdata = initdata, length = length, levels = levels, ordered = ordered, :
vmode 'character' not implemented

The first column in TenGBsample.csv is 'id' and consists of 30-digit numbers, which exceed the largest number my 64-bit system (Windows) can represent, so I would like to handle them as character. The second column contains small numbers, so it needs no adjustment.

I've checked, and the vmode documentation does list a 'character' mode: http://127.0.0.1:16624/library/ff/html/vmode.html

Note the following from help(read.csv.ffdf):

... read.table.ffdf has been designed to behave as much like read.table as possible. However, note the following differences:

  1. character vectors are not supported, character data must be read as one of the following colClasses: 'Date', 'POSIXct', 'factor', 'ordered'. By default character columns are read as factors. Accordingly arguments 'as.is' and 'stringsAsFactors' are not allowed.

So you cannot read the value in as character. But if you already have numeric values for the id column in the file, then you could read them in as doubles and re-format them afterward. format(x, scientific = FALSE) will print x in standard notation.
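Before relying on this, it may be worth checking whether the ids survive the conversion to double at all. A double has a 53-bit significand, so integers are only exact up to 2^53 (16 digits). The base R sketch below uses two made-up 30-digit ids (id_a and id_b are hypothetical, not taken from the question's file) to show what can happen beyond that limit.

```r
# Hypothetical 30-digit ids that differ only in the last digit
id_a <- "123456789012345678901234567890"
id_b <- "123456789012345678901234567891"

# Integers are represented exactly only up to 2^53
.Machine$double.digits            # 53
format(2^53, scientific = FALSE)  # "9007199254740992"

# Both strings parse to the same double, so the two ids become indistinguishable
same_double <- as.numeric(id_a) == as.numeric(id_b)

# Formatting the double back does not reproduce the original string
round_trip <- format(as.numeric(id_a), scientific = FALSE)
survives <- identical(round_trip, id_a)

same_double  # TRUE
survives     # FALSE
```

If the ids only need to be displayed, this may not matter. If they are used as unique keys, a check like this on a sample of the real data seems prudent.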

Here's an example data set x where id is numeric and has 30 digits. 这是一个数据集x的示例,其中id是数字,具有30位数字。

library(ff)

x <- data.frame(
    id = (267^12 + (102:106)^12),  
    other = paste0(LETTERS[1:5],letters[1:5])
)
## create a csv file with 'x'
csvfile <- tempPathFile(path = getOption("fftempdir"), extension = "csv")
write.csv(
    format(x, scientific = FALSE), 
    file = csvfile, row.names = FALSE, quote = 2
)    
## read in the data without colClasses
ffx <- read.csv.ffdf(file = csvfile)
vmode(ffx)
#       id     other 
# "double" "integer" 

Now we can coerce ffx to class data.frame with ffx[,] and re-format the id column.

df <- within(ffx[,], id <- format(id, scientific = FALSE))
class(df$id)
# [1] "character"
df
#                               id other
# 1 131262095302921040298042720256    Aa
# 2 131262252822013319483345600512    Bb
# 3 131262428093345052649582493696    Cc
# 4 131262622917452503293152460800    Dd
# 5 131262839257598318815163187200    Ee
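Since the quoted help text lists 'factor' among the supported colClasses, another option might be to read id as a factor and convert it to character once it is in RAM. The following is only a sketch under assumptions: the file contents, the ids and the 'value' column are made up as a small stand-in for TenGBsample.csv.

```r
library(ff)

# Small stand-in for TenGBsample.csv; the ids and the 'value' column are made up
csvfile <- tempfile(fileext = ".csv")
writeLines(c("id,value",
             "123456789012345678901234567890,1",
             "123456789012345678901234567891,2"), csvfile)

# 'factor' is one of the colClasses the help page lists as supported
ffx <- read.csv.ffdf(file = csvfile, colClasses = c(id = "factor"))
vmode(ffx)   # the factor is stored as integer codes

# Pull the column into RAM as a factor, then convert it to character
ids <- as.character(ffx$id[])
```

This keeps every digit of the ids, but a column of mostly unique ids means one factor level per row, so for a file of around 10 GB the level set may itself be large. Pulling the whole column with ffx$id[] is shown only for the small example.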
