Read a large CSV file in R and export it to multiple RData files using nrows and skip

Question

I'm attempting to import and export, in pieces, a single 10GB CSV file with roughly 10 million observations. I want about 10 manageable RData files in the end (data_1.RData, data_2.RData, etc.), but I'm having trouble making skip and nrows dynamic. My nrows will never change, as I need almost 1 million rows per dataset, but I think I'll need some equation for skip= so that it increases on every loop iteration to catch the next 1 million rows. Also, having header=T might mess up anything over ii=1, since only the first row includes variable names. The following is the bulk of the code I'm working with:

for (ii in 1:10){
      data <- read.csv("myfolder/file.csv", 
                         row.names=NULL, header=T, sep=",", stringsAsFactors=F,
                         skip=0, nrows=1000000)
      outName <- paste("data",ii,sep="_")
      save(data,file=file.path(outPath,paste(outName,".RData",sep="")))

    }

Answer 1

(Untested, but...) You can try something like this:

nrows <- 1000000
nchunks <- 10
# Lines to skip before each chunk: the header line plus all earlier chunks
ind <- 1 + (seq_len(nchunks) - 1) * nrows
# Read the column names once from the first line of the file
header <- names(read.csv("myfolder/file.csv", header = TRUE, nrows = 1))

for (i in seq_along(ind)) {
  data <- read.csv("myfolder/file.csv",
                   row.names = NULL, header = FALSE,
                   sep = ",", stringsAsFactors = FALSE,
                   skip = ind[i], nrows = nrows)
  # Chunks are read without a header, so restore the column names
  names(data) <- header
  outName <- paste("data", i, sep = "_")
  save(data, file = file.path(outPath, paste(outName, ".RData", sep = "")))
}
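
Each read.csv call with skip= opens the file again and reads through the skipped lines before reaching the chunk, so later chunks take progressively longer to reach. A minimal sketch of one possible alternative, not part of the original answer: open a connection once and let successive read.csv calls continue from where the previous one stopped. The helper name split_csv, its arguments, and the small temporary file are hypothetical and used only for illustration.

```r
# Hypothetical helper (not from the original answer): split a CSV into
# RData files by reading successive chunks from a single open connection.
split_csv <- function(path, out_dir, chunk_rows) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # Consume the header line once; later reads resume after it.
  header <- scan(text = readLines(con, n = 1), what = "", sep = ",",
                 quiet = TRUE)

  out_files <- character(0)
  repeat {
    # An error or an empty result is treated here as the end of the file.
    # Assumption: any other read error is silently treated the same way.
    data <- tryCatch(
      read.csv(con, header = FALSE, col.names = header,
               nrows = chunk_rows, stringsAsFactors = FALSE),
      error = function(e) NULL
    )
    if (is.null(data) || nrow(data) == 0) break

    out_file <- file.path(out_dir,
                          paste0("data_", length(out_files) + 1, ".RData"))
    save(data, file = out_file)
    out_files <- c(out_files, out_file)
  }
  out_files
}

# Small demonstration: a temporary 25-row file split into chunks of 10 rows.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:25, value = (1:25) * 2), csv_path,
          row.names = FALSE)
out_dir <- tempfile("chunks")
dir.create(out_dir)
files <- split_csv(csv_path, out_dir, chunk_rows = 10)
```

For the file in the question, the call would presumably look like split_csv("myfolder/file.csv", outPath, 1000000). The last file simply holds whatever rows remain, so the number of chunks does not need to be known in advance.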

Answer 1
1 2014-12-15 18:19:17
