
Optimization help (R cannot allocate vector)

I have 16 GB of RAM and am running 64-bit R on 64-bit Windows 10. I'm trying to merge a bunch of CSVs from this link ( http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml ), specifically the yellow taxi files. Edit: only for one year at the moment, but I would want to import more data once this works.

Here's the code I'm running:

library(readr)
FList <- list.files(pattern = "*.csv")
for (i in 1:length(FList))
  {
  print(i)
  assign(FList[i], read_csv(FList[i]))
  if (i==2) {
    DF<-rbind(get(FList[1]),get(FList[2]))
    rm(list = c(FList[1],FList[2]))
  }
  if (i>2)
    {
    DF<-rbind(DF,get(FList[i]))
    rm(list = FList[i])
  }
  gc()
}

I get the error on the 6th iteration. Task Manager shows memory usage in the 90% range during the rbind operation, but it drops to 60% once that is done.

Running gc() after the error gives the following:

> gc()
             used    (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells    3821676   204.1   10314672   550.9   13394998   715.4
Vcells 1363034028 10399.2 3007585511 22946.1 2058636792 15706.2
> 

I do not have a lot of experience with this, so any help in optimizing the code would be appreciated. P.S. Would running it with read.csv help? I'm assuming the datetime format in a few of the columns might be resource hungry. I haven't tried it yet because I need those columns in datetime format.

You can try it with lapply instead of a loop:

files <- list.files(pattern = glob2rx("*.csv"))

df <- lapply(files, function(x) read.csv(x))
df <- do.call(rbind, df)
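As a rough sketch of the same read-then-bind-once idea with the datetime concern from the question handled in a single pass: the helper name `read_all_csv`, the demo files, and the `pickup` column are all made up for illustration and are not the actual taxi column names. Keep in mind that do.call(rbind, ...) still needs the list of pieces and the combined result in memory at the same time, so this mainly avoids the repeated copying of a growing DF rather than reducing the total footprint.

```r
# Hypothetical helper: read every CSV in `dir` and bind them in one step.
# `col_classes` is handed to read.csv() so column types can be fixed up front.
read_all_csv <- function(dir = ".", col_classes = NA) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  pieces <- lapply(files, read.csv, colClasses = col_classes,
                   stringsAsFactors = FALSE)
  do.call(rbind, pieces)
}

# Tiny demo with two made-up files in a temporary directory.
demo_dir <- tempfile("csv_demo")
dir.create(demo_dir)
write.csv(data.frame(id = 1:2,
                     pickup = c("2016-01-01 00:10:00", "2016-01-01 00:25:00")),
          file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4,
                     pickup = c("2016-02-01 08:00:00", "2016-02-01 09:30:00")),
          file.path(demo_dir, "b.csv"), row.names = FALSE)

# Read the timestamps as plain text, then convert once after binding
# (the time zone here is an assumption for the demo).
combined <- read_all_csv(demo_dir, col_classes = c("integer", "character"))
combined$pickup <- as.POSIXct(combined$pickup, tz = "UTC")
```

Converting the datetime column once at the end, rather than letting each file be parsed separately, is one way to keep the columns in datetime format without paying for it on every read.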

Another way is to append them on the command line instead of in R. This should be less memory intensive. Just search for "append csv" along with the command-line tool appropriate for your OS.
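If you would rather not leave R, the same on-disk concatenation can be sketched with plain file connections, so that no file is ever parsed into a data frame. This is only an illustration: `append_csv_files` and `chunk_lines` are hypothetical names, and it assumes every file has the same header and column order and no quoted fields containing line breaks.

```r
# Hypothetical helper: concatenate CSV files on disk, keeping only the first header.
# Lines are copied in chunks, so memory use is bounded by `chunk_lines`.
append_csv_files <- function(files, out, chunk_lines = 100000L) {
  out_con <- file(out, open = "w")
  on.exit(close(out_con), add = TRUE)
  for (k in seq_along(files)) {
    in_con <- file(files[k], open = "r")
    header <- readLines(in_con, n = 1L)
    if (k == 1L) writeLines(header, out_con)  # write the header only once
    repeat {
      chunk <- readLines(in_con, n = chunk_lines)
      if (length(chunk) == 0L) break          # end of this input file
      writeLines(chunk, out_con)
    }
    close(in_con)
  }
  invisible(out)
}

# Tiny demo with two made-up files.
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:2, fare = c(5.5, 7.0)), f1, row.names = FALSE)
write.csv(data.frame(id = 3:4, fare = c(3.0, 9.5)), f2, row.names = FALSE)

merged_path <- tempfile(fileext = ".csv")
append_csv_files(c(f1, f2), merged_path, chunk_lines = 1L)
merged_lines <- readLines(merged_path)
```

The merged file can then be read once, which avoids the repeated rbind copies from the loop in the question.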
