How do I get all the data from separate large files in Revolution R Enterprise?

Question

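I'm using Revolution R Enterprise to import large data files. The example in the documentation imports 10 files (1,000,000 rows each) into a single dataset with an rxImport loop, as follows: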

setwd("C:/Users/Fsociety/Bigdatasamples")
Data.Directory <- "C:/Users/Fsociety/Bigdatasamples"
Data.File <- file.path(Data.Directory,"mortDefault")
mortXdfFileName <- "mortDefault.xdf"

append <- "none"
for(i in 2000:2009){
  importFile <- paste(Data.File,i,".csv",sep="")
  mortxdf <- rxImport(importFile, mortXdfFileName, append = append, overwrite = TRUE, maxRowsByCols = NULL)
  append <- "rows"
}
mortxdfData <- RxXdfData(mortXdfFileName)
knime.out <- rxXdfToDataFrame(mortxdfData)

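The problem is that I only get 500,000 rows in the dataset, apparently because of the maxRowsByCols argument. The default is 1e+06; I changed it to a higher value and then to NULL, but the data from the files is still truncated.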

Answer 1

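Since you are importing into an XDF file, maxRowsByCols doesn't matter. Also, in the last line you read everything into a data.frame, which rather defeats the purpose of using an XDF in the first place.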

This code does work for me on what I assume is the data you are using: http://packages.revolutionanalytics.com/datasets/mortDefault.zip

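The 500K rows are due to the rowsPerRead argument, but that only determines the block size. All of the data is read in, just in 500K-row increments, and you can change that to suit your needs.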

setwd("C:/Users/Fsociety/Bigdatasamples")
Data.Directory <- "C:/Users/Fsociety/Bigdatasamples"
Data.File <- file.path(Data.Directory, "mortDefault")
mortXdfFileName <- "mortDefault.xdf"

append <- "none"
overwrite <- TRUE
for(i in 2000:2009){
  importFile <- paste(Data.File, i, ".csv", sep="")
  rxImport(importFile, mortXdfFileName, append = append, overwrite = overwrite)
  append <- "rows"
  overwrite <- FALSE
}

mortxdfData <- RxXdfData(mortXdfFileName)
rxGetInfo(mortxdfData, getBlockSizes = TRUE)

# File name: C:\Users\dnorton\OneDrive\R\MarchMadness2016\mortDefault.xdf 
# Number of observations: 1e+07 
# Number of variables: 6 
# Number of blocks: 20 
# Rows per block (first 10): 5e+05 5e+05 5e+05 5e+05 5e+05 5e+05 5e+05 5e+05 5e+05 5e+05
# Compression type: zlib
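
To make the block-size point concrete, here is a minimal sketch of the arithmetic behind the 20 blocks reported above. It assumes that the rxImport default for rowsPerRead is 500,000 and that each chunk read becomes one block. The optional part only runs if RevoScaleR and the question's sample CSV are present; demo_xdf is a hypothetical output file used for illustration, not something from the original thread.

```r
# Base R arithmetic: how rowsPerRead maps to the block count reported above.
rows_per_file <- 1e6                  # each yearly CSV holds 1,000,000 rows (from the question)
n_files       <- length(2000:2009)    # ten yearly files
rows_per_read <- 5e5                  # ASSUMPTION: rxImport's default rowsPerRead

blocks_per_file <- ceiling(rows_per_file / rows_per_read)
total_blocks    <- blocks_per_file * n_files
total_rows      <- rows_per_file * n_files
cat("blocks:", total_blocks, "rows:", total_rows, "\n")

# Optional: only runs where RevoScaleR and the sample CSV are available.
csv_2000 <- "C:/Users/Fsociety/Bigdatasamples/mortDefault2000.csv"
if (requireNamespace("RevoScaleR", quietly = TRUE) && file.exists(csv_2000)) {
  library(RevoScaleR)
  demo_xdf <- file.path(tempdir(), "mortDefault_demo.xdf")  # hypothetical output file
  # A larger rowsPerRead should yield fewer, larger blocks; the row count is unchanged.
  rxImport(csv_2000, demo_xdf, rowsPerRead = 1e6, overwrite = TRUE)
  print(rxGetInfo(demo_xdf, getBlockSizes = TRUE))
}
```

Changing rowsPerRead changes how many blocks the XDF contains, not how many rows are imported, so it should not be the cause of missing rows in the XDF itself.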

Answer 2

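Fixed. The problem was the maxRowsByCols limit that applies when the RxXdfData object is converted to a data.frame (the rxXdfToDataFrame() call in my code). Setting it to NULL there converts the whole XDF into a data.frame object for KNIME.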
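
A rough sketch of why the cut-off lands at exactly 500,000 rows, and of the fix described here. It assumes the default maxRowsByCols for rxXdfToDataFrame() is 3e6 cells, which is my recollection of the RevoScaleR documentation rather than something stated in this thread. The optional part only runs if RevoScaleR and the XDF file from the question exist.

```r
# Base R arithmetic: rows that fit under a rows-times-columns cap.
n_cols    <- 6      # variables in mortDefault.xdf (from the rxGetInfo output)
max_cells <- 3e6    # ASSUMPTION: default maxRowsByCols of rxXdfToDataFrame()
rows_kept <- max_cells / n_cols
cat("rows kept under the assumed default:", rows_kept, "\n")

# Optional: only runs where RevoScaleR and the XDF file are available.
xdf_path <- "C:/Users/Fsociety/Bigdatasamples/mortDefault.xdf"
if (requireNamespace("RevoScaleR", quietly = TRUE) && file.exists(xdf_path)) {
  library(RevoScaleR)
  mortxdfData <- RxXdfData(xdf_path)
  # Lift the cap on the conversion step, not on rxImport.
  full_df <- rxXdfToDataFrame(mortxdfData, maxRowsByCols = NULL)
  print(nrow(full_df))
}
```

Note that lifting the cap loads all 10 million rows into memory as a data.frame, so this is only sensible when the machine has enough RAM for it.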

Answer 1
1 2016-03-17 19:56:56

Answer 2
1 Accepted 2016-03-18 15:45:07