How can I get all the data from separate large files in Revolution R Enterprise?

Question

I'm using Revolution R Enterprise to import large data files. The example given in the documentation states that 10 files (1000000 rows each) will be imported as a single dataset using an rxImport loop like this:

setwd("C:/Users/Fsociety/Bigdatasamples")
Data.Directory <- "C:/Users/Fsociety/Bigdatasamples"
Data.File <- file.path(Data.Directory,"mortDefault")
mortXdfFileName <- "mortDefault.xdf"

append <- "none"
for(i in 2000:2009){
  importFile <- paste(Data.File,i,".csv",sep="")
  mortxdf <- rxImport(importFile, mortXdfFileName, append = append, overwrite = TRUE, maxRowsByCols = NULL)
  append <- "rows"
}
mortxdfData <- RxXdfData(mortXdfFileName)
knime.out <- rxXdfToDataFrame(mortxdfData)

The issue is that I only get 500000 rows in the dataset. I assumed this was due to the maxRowsByCols argument, whose default is 1e+06. I changed it to a higher value and then to NULL, but the data from the file is still truncated.

Answer 1

Since you are importing to an XDF, maxRowsByCols doesn't matter. Also, on the last line you read the result into a data.frame, which somewhat defeats the purpose of using an XDF in the first place.

This code works for me on this data (http://packages.revolutionanalytics.com/datasets/mortDefault.zip), which I assume is what you are using.

The 500K figure comes from the rowsPerRead argument, which only determines the block size. All of the data is read in, just in 500K-row increments, and that value can be changed to suit your needs.

setwd("C:/Users/Fsociety/Bigdatasamples")
Data.Directory <- "C:/Users/Fsociety/Bigdatasamples"
Data.File <- file.path(Data.Directory, "mortDefault")
mortXdfFileName <- "mortDefault.xdf"

append <- "none"
overwrite <- TRUE
for(i in 2000:2009){
  importFile <- paste(Data.File, i, ".csv", sep="")
  rxImport(importFile, mortXdfFileName, append = append, overwrite = overwrite)
  append <- "rows"
  overwrite <- FALSE
}

rxGetInfo(mortXdfFileName, getBlockSizes = TRUE)

# File name: C:\Users\dnorton\OneDrive\R\MarchMadness2016\mortDefault.xdf 
# Number of observations: 1e+07 
# Number of variables: 6 
# Number of blocks: 20 
# Rows per block (first 10): 5e+05 5e+05 5e+05 5e+05 5e+05 5e+05 5e+05 5e+05 5e+05 5e+05
# Compression type: zlib
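
The block count in that output follows from simple arithmetic, sketched below in base R. The helper name expected_blocks and the output file name mortDefault_small_blocks.xdf are made up for illustration, the rowsPerRead value of 250000 is an arbitrary example, and the one-block-per-read behaviour is an assumption that matches the 20 blocks reported above. The RevoScaleR part is guarded so it only runs when the package and the CSV are available.

```r
# Hypothetical helper: number of blocks written when each CSV is imported
# in chunks of rows_per_read rows (assumes one block per chunk, per file).
expected_blocks <- function(n_files, rows_per_file, rows_per_read) {
  n_files * ceiling(rows_per_file / rows_per_read)
}

# 10 files of 1,000,000 rows each, read 500,000 rows at a time
blocks_default <- expected_blocks(10, 1e6, 5e5)
print(blocks_default)

# Optional: re-import one file with a different block size.
# Runs only if RevoScaleR is installed and the CSV is in the working directory.
if (requireNamespace("RevoScaleR", quietly = TRUE) &&
    file.exists("mortDefault2000.csv")) {
  RevoScaleR::rxImport("mortDefault2000.csv", "mortDefault_small_blocks.xdf",
                       rowsPerRead = 250000, overwrite = TRUE)
  print(RevoScaleR::rxGetInfo("mortDefault_small_blocks.xdf",
                              getBlockSizes = TRUE))
}
```

A smaller rowsPerRead produces more, smaller blocks; the total number of observations should stay the same.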

Answer 2

Fixed. The problem was that RxXdfData() has a maxRowsByCols limit; changing it to NULL converts the whole rxXdfData to a data.frame object for Knime.
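
A minimal sketch of that fix, assuming the limit takes effect when the XDF is converted with rxXdfToDataFrame (the call on the last line of the question's code), which accepts a maxRowsByCols argument. The row and column counts are the ones reported by rxGetInfo in Answer 1. The RevoScaleR calls are guarded so the conversion is only attempted when the package and the XDF file exist.

```r
# Size of the full data set in cells (rows x columns), using the figures
# reported by rxGetInfo: 1e7 observations and 6 variables.
n_rows <- 1e7
n_cols <- 6
n_cells <- n_rows * n_cols
print(n_cells)

# Convert the whole XDF to a data.frame without the rows-by-columns cap.
# This loads everything into memory, so it only suits data that fits in RAM.
if (requireNamespace("RevoScaleR", quietly = TRUE) &&
    file.exists("mortDefault.xdf")) {
  mortxdfData <- RevoScaleR::RxXdfData("mortDefault.xdf")
  knime.out <- RevoScaleR::rxXdfToDataFrame(mortxdfData, maxRowsByCols = NULL)
  print(nrow(knime.out))
}
```

Any finite maxRowsByCols below the cell count computed here would be expected to truncate the result, which is consistent with the behaviour described in the question.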

2 answers

solution1
1 2016-03-17 19:56:56

solution2
1 ACCEPTED 2016-03-18 15:45:07