
How can I get all the data from separated large files in R Revolution Enterprise?

I'm using RevoR Enterprise to handle importing large data files. The example given in the documentation states that 10 files (1,000,000 rows each) will be imported as a dataset using an rxImport loop like this:

setwd("C:/Users/Fsociety/Bigdatasamples")
Data.Directory <- "C:/Users/Fsociety/Bigdatasamples"
Data.File <- file.path(Data.Directory,"mortDefault")
mortXdfFileName <- "mortDefault.xdf"

append <- "none"
for(i in 2000:2009){
  importFile <- paste(Data.File, i, ".csv", sep = "")
  mortxdf <- rxImport(importFile, mortXdfFileName, append = append, overwrite = TRUE, maxRowsByCols = NULL)
  append <- "rows"
}
mortxdfData <- RxXdfData(mortXdfFileName)
knime.out <- rxXdfToDataFrame(mortxdfData)

The issue here is that I only get 500,000 rows in the dataset. I suspected the maxRowsByCols argument (its default is 1e+06), so I changed it to a higher value and then to NULL, but the data from the file is still truncated.

Since you are importing to an XDF, maxRowsByCols doesn't matter. Also, on the last line you read the result into a data.frame, which rather defeats the purpose of using an XDF in the first place.

This code does work for me on this data, http://packages.revolutionanalytics.com/datasets/mortDefault.zip , which is what I assume you are using.

The 500K rows is due to the rowsPerRead argument, but that only determines the block size. All of the data is read in, just in 500K-row increments; the block size can be changed to match your needs.

setwd("C:/Users/Fsociety/Bigdatasamples")
Data.Directory <- "C:/Users/Fsociety/Bigdatasamples"
Data.File <- file.path(Data.Directory, "mortDefault")
mortXdfFileName <- "mortDefault.xdf"

append <- "none"
overwrite <- TRUE
for(i in 2000:2009){
  importFile <- paste(Data.File, i, ".csv", sep="")
  rxImport(importFile, mortXdfFileName, append = append, overwrite = overwrite)
  append <- "rows"
  overwrite <- FALSE
}

mortxdfData <- RxXdfData(mortXdfFileName)
rxGetInfo(mortxdfData, getBlockSizes = TRUE)

# File name: C:\Users\dnorton\OneDrive\R\MarchMadness2016\mortDefault.xdf 
# Number of observations: 1e+07 
# Number of variables: 6 
# Number of blocks: 20 
# Rows per block (first 10): 5e+05 5e+05 5e+05 5e+05 5e+05 5e+05 5e+05 5e+05 5e+05 5e+05
# Compression type: zlib 
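If a different block size is preferred, rowsPerRead can be passed to rxImport directly. A minimal sketch, assuming the same paths and file names as above (this requires Revolution R Enterprise / RevoScaleR to run):

```r
# Sketch: the same import loop, but with an explicit block size.
# Assumes Data.File and mortXdfFileName are defined as above.
append <- "none"
overwrite <- TRUE
for (i in 2000:2009) {
  importFile <- paste(Data.File, i, ".csv", sep = "")
  rxImport(importFile, mortXdfFileName,
           append = append, overwrite = overwrite,
           rowsPerRead = 1000000)  # 1M-row blocks instead of 500K
  append <- "rows"
  overwrite <- FALSE
}
```

Larger blocks mean fewer chunk reads per pass; the right value depends on available memory.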

Fixed. The problem was that the conversion has a maxRowsByCols limit; changing it to NULL converts the entire XDF into a data.frame object for KNIME.
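For reference, the conversion step that finally returned all rows can be sketched as follows (assuming the XDF built above; requires RevoScaleR):

```r
# Sketch: read the full XDF into a data.frame for KNIME.
mortxdfData <- RxXdfData(mortXdfFileName)
# maxRowsByCols = NULL lifts the default 1e+06 rows-times-columns cap,
# so the whole file is converted rather than a truncated slice.
knime.out <- rxXdfToDataFrame(mortxdfData, maxRowsByCols = NULL)
```

Note that materializing 1e+07 rows as a data.frame needs enough RAM to hold the whole dataset; the XDF format exists precisely to avoid this when it isn't required.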


