
Read selective data from a CSV file in R

I have 1,584,823 records in total, covering 157 meters (dataid). My dataset has three columns, as below.

    localminute,dataid,meter_value
    2015-10-01 05:00:10,739,88858
    2015-10-01 05:00:13,8890,197164
    2015-10-01 05:00:20,6910,179118
    2015-10-01 05:00:22,3635,151318
    2015-10-01 05:00:22,1507,390354
    2015-10-01 05:00:29,5810,97506
    2015-10-01 05:01:00,484,99298
    2015-10-01 05:01:18,6910,179118

How should I read and filter the meter_value rows for a specific dataid in R? Say I want to read and export the data for dataid = 739: how should I apply read.csv and write.csv to extract all meter_value rows with dataid = 739, the same way I would filter in Excel? Because the data is so large, I cannot do the filtering in Excel.

You should be able to just read the entire file into R and then filter within R:

df <- read.csv(file = "path/to/file.txt")
df_sub <- df[df$dataid == 739, ]  # or subset(df, dataid == 739)
write.csv(df_sub, file = "path/to/file_out.txt", row.names = FALSE)  # row.names = FALSE avoids a spurious index column

There is nothing wrong with reading the entire file into memory, provided it can reasonably fit. 1.5 million rows with only a handful of columns should take no more than a few tens of MB of RAM.
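If read speed ever becomes a concern at this scale, a common alternative is the data.table package (assuming it is installed); its fread() is typically much faster than read.csv, and filtering looks almost the same. The file paths here are placeholders:

```r
# Sketch using data.table instead of read.csv (assumes the data.table
# package is installed; paths are placeholders)
library(data.table)

dt <- fread("path/to/file.txt")          # fast CSV reader, returns a data.table
dt_sub <- dt[dataid == 739]              # filter rows for one meter
fwrite(dt_sub, "path/to/file_out.txt")   # fast CSV writer, no row names by default
```

fwrite() does not write row names, so the output matches what an Excel filter-and-save would produce.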

The issue here is that R is very powerful for manipulating data once it is loaded, but read.csv itself offers little support for filtering while reading.

The built-in R function 'subset' is used for this:

# replace with actual path to actual filename
data <- read.csv('data.csv')

# subset the values
sub_data <- subset(data, dataid == 739)

# write out the data
write.csv(sub_data, 'subset_filename.csv', row.names = FALSE)
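The same pattern extends to exporting every meter at once. This is a hypothetical extension of the answer above, not something the question asked for; the file-name scheme 'meter_<id>.csv' is just an example:

```r
# Sketch: write one CSV per dataid instead of a single subset
# (file names like "meter_739.csv" are an assumed naming scheme)
data <- read.csv('data.csv')

for (id in unique(data$dataid)) {
  sub_data <- subset(data, dataid == id)
  write.csv(sub_data,
            paste0('meter_', id, '.csv'),
            row.names = FALSE)
}
```

With 157 meters this produces 157 files, one per dataid, each ready to open in Excel.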
