R: Reading specific rows from a file

Question

Let's say I have multiple files, each with 15,000 rows and 40,000 columns. I have determined in advance that I only need the last 5 rows from each file (e.g. rows 14996, 14997, 14998, 14999 and 15000).

In R, I have been looping over each file with read.table(), using the "skip" and "nrows" arguments to extract the rows I need without reading the entire file into R. Unfortunately, even with the skip argument, it takes R a long time to reach the last five rows of a 15,000 x 40,000 table. Is there an easy, quicker way to extract the rows I need? Should I try out MySQL?

Answer 1

This will likely be much faster than read.table():

lapply(files, data.table::fread, skip = 14995L, nrows = 5L)

where files is your list of file names.
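
To make the skip/nrows arithmetic concrete, here is a minimal sketch on a small throwaway file. The file, its size, and the names tmp, n_total and n_keep are made up for illustration, and it assumes the data.table package is installed and that the file has no header line.

```r
library(data.table)

# Hypothetical demo file: 10 rows, 3 columns, no header line.
tmp <- tempfile(fileext = ".txt")
write.table(matrix(1:30, nrow = 10), tmp,
            row.names = FALSE, col.names = FALSE)

n_total <- 10L  # total number of rows, assumed known in advance
n_keep  <- 3L   # number of trailing rows wanted

# Skip every line before the last n_keep lines, then read only those.
last_rows <- fread(tmp, skip = n_total - n_keep, nrows = n_keep,
                   header = FALSE)
print(last_rows)

unlink(tmp)
```

For the files in the question the same arithmetic gives skip = 15000 - 5 = 14995. If the files do start with a header line, skip would need to be one larger.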

Update: Based on your comments, I think you will want to try gzfile() with read.table(). You didn't mention whether you used it in your previous attempt.

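dflist <- lapply(files, function(x) {
    # read.table() opens an unopened connection itself and closes
    # (and destroys) it on exit, so no explicit close() is needed.
    read.table(gzfile(x), skip = 14995L, nrows = 5L)
})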
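
The gzfile() approach can be tried on a small compressed file first. This is only a sketch in base R: the temporary file and its 10 x 3 contents are invented for the demo, and the file is assumed to have no header line.

```r
# Hypothetical demo: write a small gzip-compressed table, then read
# only its last 3 rows through a gzfile() connection.
tmp <- tempfile(fileext = ".gz")
con <- gzfile(tmp, "w")
write.table(matrix(1:30, nrow = 10), con,
            row.names = FALSE, col.names = FALSE)
close(con)  # we opened this connection ourselves, so we close it

# read.table() opens the unopened connection and closes it when done.
last_rows <- read.table(gzfile(tmp), skip = 7L, nrows = 3L)
print(last_rows)

unlink(tmp)
```

Note that a gzip stream cannot be seeked, so the skipped lines still have to be decompressed and scanned. skip only avoids parsing them into columns.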

Answer 1
1 accepted 2015-08-21 03:10:54
