
Large csv file fails to fully read in to R data.frame

I am trying to load a fairly large CSV file into R. It has about 50 columns and 2 million rows.

My code is pretty basic, and I have used it to open files before, but none this large.

mydata <- read.csv('file.csv', header = FALSE, sep=",", stringsAsFactors = FALSE)

The result is that it reads in the data but stops after 1080000 rows or so. This is roughly where Excel stops as well. Is there a way to get R to read the whole file in? Why is it stopping around halfway?

Update (11/30/14): After speaking with the provider of the data, it was discovered that there may have been a corruption issue with the file. A new file was provided, which is also smaller and loads into R easily.

Since "read.csv()" read up to 1080000 rows, "fread" from library(data.table) should read the file with ease. If not, there are two other options: either try library(h2o), or use the select option of "fread" to read only the required columns (or read the file in two halves, do some cleaning, and merge them back).
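A minimal sketch of the fread options mentioned above, assuming the data.table package is installed. The temporary file and its contents are hypothetical stand-ins for the real 'file.csv'; the column positions and the split point are chosen only for illustration.

```r
library(data.table)  # assumption: data.table is installed

# Hypothetical stand-in for the real file: a small CSV without a header row.
path <- tempfile(fileext = ".csv")
writeLines(c("1,a,2.5", "2,b,3.5", "3,c,4.5", "4,d,5.5"), path)

# Read the whole file; header = FALSE matches the read.csv call in the question.
full <- fread(path, header = FALSE, sep = ",")

# Read only the required columns (here the first and third, by position).
subset_cols <- fread(path, header = FALSE, sep = ",", select = c(1, 3))

# Read the file in two halves, then stack them back together.
first_half  <- fread(path, header = FALSE, sep = ",", nrows = 2)
second_half <- fread(path, header = FALSE, sep = ",", skip = 2)
rejoined <- rbind(first_half, second_half)
```

For the real file, the split point would be a row count near the middle of the file rather than 2, and any cleaning would happen on each half before the rbind call.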

You can try using read.table and include the parameter colClasses to specify the type of the individual columns.

With your current code, R will first read all data as strings and then check for each column whether it is convertible, e.g. to a numeric type, which needs more memory than reading it as numeric right away. colClasses also allows you to ignore columns you might not need.
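As a rough illustration of the colClasses suggestion, here is a small sketch using base R only. The temporary file, its three columns, and the chosen types are assumptions made for the example; the real file would need one entry per column.

```r
# Hypothetical stand-in for the real file: a small CSV without a header row.
path <- tempfile(fileext = ".csv")
writeLines(c("1,a,2.5", "2,b,3.5", "3,c,4.5"), path)

# One colClasses entry per column; "NULL" (as a string) skips that column.
mydata <- read.table(
  path,
  header = FALSE,
  sep = ",",
  colClasses = c("integer", "NULL", "numeric"),
  comment.char = ""  # do not treat '#' inside fields as a comment marker
)

str(mydata)  # shows the two remaining columns and their types
```

Declaring the types up front means R does not have to hold every column as character data before converting it, and skipped columns are never stored at all.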
