
What is the best way to load a 20 GB csv file into R?

I have a 20 GB data set that I have to work with in R. I have read several articles about how to handle this, but I still have no idea what the best and most efficient way is to read 20 GB of data into R.

It is important to mention that I do not need all the data, so I have to filter/clean the data before I proceed with building my model.

Would it make sense to read the data set into R in chunks? And what is the best way to read data into R in chunks?

I hope that someone can help me out.

Kind regards,

Matthijs

You could load the data in parts. As you suggest in your comment, you could read 10 000 rows, then the next 10 000, and so on.

Since you are working with .csv files, I suggest you use the read.csv() function.

Example:

data <- read.csv(file = "C:\\Path\\To\\YourFile.csv", nrows = 10000, skip = 10000)

nrows = the number of rows you want R to read.

skip = the number of rows you want R to skip.
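For completeness, here is a minimal sketch of how such a chunked read could look in a loop. The file path, the chunk size and the filter condition (some_column > 0) are placeholders, not part of the original answer; adjust them to your own data.

path       <- "C:\\Path\\To\\YourFile.csv"   # placeholder path
chunk_size <- 10000

# Read the header once so every chunk gets the same column names.
header <- names(read.csv(path, nrows = 1))

chunks <- list()
skip   <- 1   # skip the header line on the first chunk

repeat {
  chunk <- tryCatch(
    read.csv(path, header = FALSE, col.names = header,
             nrows = chunk_size, skip = skip),
    error = function(e) NULL)          # no lines left to read
  if (is.null(chunk) || nrow(chunk) == 0) break

  # Filter/clean each chunk before keeping it (hypothetical condition):
  chunks[[length(chunks) + 1]] <- chunk[chunk$some_column > 0, ]

  if (nrow(chunk) < chunk_size) break  # last (partial) chunk reached
  skip <- skip + chunk_size
}

data <- do.call(rbind, chunks)

Because only the filtered rows of each chunk are kept, the combined result can be much smaller than the original 20 GB file.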

The fread() function in the data.table package is probably your best bet for an off-the-shelf function in terms of speed and efficiency. As was mentioned previously, you can still use the nrows and skip arguments to read the data in pieces.
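As a short sketch (the file path and column names below are placeholders), fread() accepts nrows and skip just like read.csv(), and its select argument lets you load only the columns you actually need, which helps a lot with a 20 GB file:

library(data.table)

# Load only the columns you need (column names here are hypothetical):
dt <- fread("C:\\Path\\To\\YourFile.csv", select = c("col_a", "col_b"))

# Or read the file in pieces, just like with read.csv().
first_chunk <- fread("C:\\Path\\To\\YourFile.csv", nrows = 10000)

# skip = 10001 skips the header line plus the first 10 000 data rows;
# header = FALSE keeps fread from treating the first row of the chunk
# as column names (they can be restored afterwards with setnames()).
next_chunk <- fread("C:\\Path\\To\\YourFile.csv", nrows = 10000,
                    skip = 10001, header = FALSE)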
