
Read selective data from a CSV file in R

I have 1,584,823 records in total, from 157 meters (dataid). My dataset has three columns, as shown below.

    localminute,dataid,meter_value
    2015-10-01 05:00:10,739,88858
    2015-10-01 05:00:13,8890,197164
    2015-10-01 05:00:20,6910,179118
    2015-10-01 05:00:22,3635,151318
    2015-10-01 05:00:22,1507,390354
    2015-10-01 05:00:29,5810,97506
    2015-10-01 05:01:00,484,99298
    2015-10-01 05:01:18,6910,179118

How should I read the data in R and filter meter_value for a specific dataid? For example, to read and export the data for dataid = 739, how should I use read.csv and write.csv to extract all meter_value rows for dataid = 739, the way I would with a filter in Excel? The data is too large for me to filter in Excel.

You should be able to just read the entire file into R and then filter within R:

df <- read.csv(file = "path/to/file.txt")   # read the whole file into memory
df_sub <- df[df$dataid == 739, ]            # or: subset(df, dataid == 739)
write.csv(df_sub, file = "path/to/file_out.txt", row.names = FALSE)  # no extra row-number column
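As a self-contained check of the approach above, the sketch below writes the eight sample rows from the question to a temporary file (a stand-in for the real file; `in_path`, `out_path` and `check` are hypothetical names), filters them for meter 6910, which has two readings, and reads the exported file back. Passing `row.names = FALSE` stops `write.csv` from adding an unnamed column of row numbers to the output.

```r
# Write the sample rows from the question to a temporary file
# (a stand-in for the real 1.5-million-row file).
in_path <- tempfile(fileext = ".csv")
writeLines(c(
  "localminute,dataid,meter_value",
  "2015-10-01 05:00:10,739,88858",
  "2015-10-01 05:00:13,8890,197164",
  "2015-10-01 05:00:20,6910,179118",
  "2015-10-01 05:00:22,3635,151318",
  "2015-10-01 05:00:22,1507,390354",
  "2015-10-01 05:00:29,5810,97506",
  "2015-10-01 05:01:00,484,99298",
  "2015-10-01 05:01:18,6910,179118"
), in_path)

df <- read.csv(in_path)
df_sub <- df[df$dataid == 6910, ]   # meter 6910 has two readings in the sample

# row.names = FALSE keeps write.csv from adding an extra index column
out_path <- tempfile(fileext = ".csv")
write.csv(df_sub, out_path, row.names = FALSE)

# Read the exported file back to confirm it holds only the filtered rows
check <- read.csv(out_path)
print(check)
```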

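There is nothing wrong with reading the entire file into memory, provided that it fits comfortably. 1.5 million rows with only three columns should need at most on the order of 100 MB of RAM: each integer column takes about 6 MB (1,584,823 × 4 bytes), and the timestamp column, stored as text, takes considerably more. That fits easily in the memory of a typical modern computer.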
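To check an estimate like this one, `object.size()` reports how much memory an R object uses. A minimal sketch, assuming the timestamps are converted to `POSIXct` (8 bytes per value) and using made-up values with the same number of rows and column types as the question's data (`est` is a hypothetical name):

```r
n <- 1584823L  # number of rows in the question

# Synthetic columns with the same types as the real data; the values are made up
est <- data.frame(
  localminute = as.POSIXct("2015-10-01 05:00:00", tz = "UTC") + seq_len(n),  # doubles
  dataid      = rep_len(c(739L, 8890L, 6910L), n),                           # integers
  meter_value = seq_len(n)                                                   # integers
)

# Report the in-memory size in a human-readable unit
print(format(object.size(est), units = "auto"))
```

Keeping `localminute` as character strings instead, which is what `read.csv` does by default in R 4.0 and later, makes that column larger than the numeric ones.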

The point here is that R is very powerful for manipulating data once it has been loaded, but read.csv itself cannot filter rows by value while reading, so the filtering is done after the file is in memory.
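If the file ever did not fit in memory, one base-R workaround is to read it through an open connection in chunks and filter each chunk before keeping it. A sketch under these assumptions: the file has exactly the three columns shown in the question, the header is the first line, and `read_filtered()` is a hypothetical helper, not a built-in function.

```r
# Hypothetical helper (not part of base R): stream a large CSV through a
# connection in chunks and keep only the rows for one meter.
read_filtered <- function(path, id, chunk_size = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  # Consume the header line once; later chunks are read with header = FALSE
  header <- strsplit(readLines(con, n = 1L), ",", fixed = TRUE)[[1]]
  pieces <- list()
  repeat {
    # read.csv continues from the open connection's current position
    chunk <- tryCatch(
      read.csv(con, header = FALSE, nrows = chunk_size, col.names = header,
               colClasses = c("character", "integer", "integer")),
      error = function(e) NULL  # any read error is treated as end of input (a simplification)
    )
    if (is.null(chunk) || nrow(chunk) == 0L) break
    pieces[[length(pieces) + 1L]] <- chunk[chunk$dataid == id, ]
    if (nrow(chunk) < chunk_size) break  # a partial chunk means the file is exhausted
  }
  if (length(pieces) == 0L) return(NULL)
  result <- do.call(rbind, pieces)
  rownames(result) <- NULL
  result
}

# Usage on the sample rows from the question, with a tiny chunk size
sample_path <- tempfile(fileext = ".csv")
writeLines(c(
  "localminute,dataid,meter_value",
  "2015-10-01 05:00:10,739,88858",
  "2015-10-01 05:00:13,8890,197164",
  "2015-10-01 05:00:20,6910,179118",
  "2015-10-01 05:00:22,3635,151318",
  "2015-10-01 05:00:22,1507,390354",
  "2015-10-01 05:00:29,5810,97506",
  "2015-10-01 05:01:00,484,99298",
  "2015-10-01 05:01:18,6910,179118"
), sample_path)

meter_6910 <- read_filtered(sample_path, 6910L, chunk_size = 3L)
print(meter_6910)
```

Only the matching rows of each chunk are kept, so peak memory depends on `chunk_size` plus the size of the result rather than on the size of the whole file.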

R's built-in function subset() handles this:

# replace with actual path to actual filename
data <- read.csv('data.csv')

# subset the values
sub_data <- subset(data, dataid == 739)

# write out the data
write.csv(sub_data, 'subset_filename.csv', row.names = FALSE)
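`subset()` can also drop columns through its `select` argument, and `%in%` selects several meters at once. A short sketch on a three-row stand-in for the question's data (the data frame `readings` is built inline rather than read from a file; the names are hypothetical):

```r
# Three rows taken from the question's sample, built inline
readings <- data.frame(
  localminute = c("2015-10-01 05:00:10", "2015-10-01 05:00:20", "2015-10-01 05:01:18"),
  dataid      = c(739L, 6910L, 6910L),
  meter_value = c(88858L, 179118L, 179118L)
)

# Rows for meter 6910, keeping only the timestamp and reading columns
meter_6910 <- subset(readings, dataid == 6910, select = c(localminute, meter_value))

# Several meters at once: %in% instead of ==
two_meters <- subset(readings, dataid %in% c(739L, 6910L))

print(meter_6910)
print(two_meters)
```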

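Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP registration No. 18138465 © 2020-2024 STACKOOM.COM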