
Large csv file fails to fully read in to R data.frame

I am trying to load a fairly large CSV file into R. It has about 50 columns and 2 million rows.

My code is pretty basic, and I have used it to open files before, but never one this large.

mydata <- read.csv('file.csv', header = FALSE, sep=",", stringsAsFactors = FALSE)

The result is that it reads in the data but stops after about 1,080,000 rows, which is roughly where Excel stops as well. Is there a way to get R to read the whole file in? Why is it stopping about halfway through?

Update (11/30/14): After speaking with the provider of the data, it was discovered that there may have been a corruption issue with the file. A new file was provided, which is also smaller and loads into R easily.

As, "read.csv()" read up to 1080000 rows, "fread" from library(data.table) should read it with ease. If not, there exists two other options, either try with library(h20) or with "fread" you can use select option to read required columns (or read in two halves, do some cleaning and can merge them back).

You can try using read.table and include the colClasses parameter to specify the types of the individual columns.

With your current code, R will first read all data as strings and then check, for each column, whether it is convertible to e.g. a numeric type, which needs more memory than reading it as numeric right away. colClasses also allows you to skip columns you do not need.
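A sketch of that idea; the type vector below is only an example and must be adjusted to match the actual 50 columns (using "NULL" drops a column entirely):

mydata <- read.table(
  "file.csv",
  header = FALSE,
  sep = ",",
  stringsAsFactors = FALSE,
  colClasses = c("integer", "character", "numeric",   # first three columns (example types)
                 rep("NULL", 2),                      # skip two columns you don't need
                 rep("numeric", 45))                  # remaining columns read as numeric
)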

