Let's say I have multiple files, each with 15,000 rows and 40,000 columns. I have determined in advance that I only need the last 5 rows from each file (e.g., rows 14996, 14997, 14998, 14999, and 15000).
In R, I have been looping over each file with read.table(), using the "skip" and "nrows" arguments to extract the rows I need without reading the entire file into R. Unfortunately, with the skip argument it takes a long time for R to reach the last five rows of a 15,000 x 40,000 table. Is there an easy, quicker way to extract the rows I need? Should I try MySQL?
This will likely be much faster than read.table():

lapply(files, data.table::fread, skip = 14995L, nrows = 5L)

where files is your list of file names. (Note that fread's argument is nrows, not nrow.)
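Since each call returns a five-row table, you will usually want to stack the per-file chunks into one table afterwards. A minimal, self-contained sketch (the temporary file and the skip = 96L value are stand-ins for your real paths and skip = 14995L):

```r
library(data.table)

# Stand-in for your real files: one temp file with 100 data rows.
tmp <- tempfile(fileext = ".txt")
fwrite(data.table(a = 1:100, b = 101:200), tmp)
files <- c(tmp)

# Skip all but the last 5 rows of each file, then stack the chunks.
# With your real files this would be skip = 14995L.
last5 <- rbindlist(lapply(files, fread, skip = 96L, nrows = 5L))
```

rbindlist() is data.table's fast row-binder, so the combined result stays a single data.table ready for further work.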
Update: According to your comments, I think you will want to try gzfile() with read.table(). You didn't mention whether you used it in your previous attempt.
dflist <- lapply(files, function(x) {
  zz <- gzfile(x)                                  # open a connection to the gzipped file
  df <- read.table(zz, skip = 14995L, nrows = 5L)  # read only the last 5 rows
  close(zz)                                        # always close the connection
  df
})
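If you are on a Unix-like system, another option is to let the shell's tail command extract the last five lines and hand only those to R, so R never parses the first 14,995 rows at all. A sketch under that assumption, again using a small temporary file as a stand-in for a real one:

```r
library(data.table)

# Stand-in for a real 15,000-row file.
f <- tempfile(fileext = ".txt")
writeLines(as.character(1:100), f)

# 'tail' (Unix) passes only the last 5 lines to fread. For a gzipped
# file you could use e.g. paste("zcat", shQuote(f), "| tail -n 5").
last5 <- fread(cmd = paste("tail -n 5", shQuote(f)), header = FALSE)
```

This avoids both the skip-scanning cost in R and decompressing-then-parsing the whole table, at the price of requiring the external tail (and zcat) utilities.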