Aggregating output from multiple input files in R

Right now I have the R code below. It reads in data that looks like this:

track_id    day hour    month   year    rate    gate_id pres_inter  vmax_inter
9   10  0   7   1   9.6451E-06  2   97809   23.545
9   10  0   7   1   9.6451E-06  17  100170  13.843
10  3   6   7   1   9.6451E-06  2   96662   31.568
13  22  12  8   1   9.6451E-06  1   94449   48.466
13  22  12  8   1   9.6451E-06  17  96749   30.55
16  13  0   8   1   9.6451E-06  4   98702   19.205
16  13  0   8   1   9.6451E-06  16  98585   18.143
19  27  6   9   1   9.6451E-06  9   98838   20.053
19  27  6   9   1   9.6451E-06  17  99221   17.677
30  13  12  6   2   9.6451E-06  2   97876   27.687
30  13  12  6   2   9.6451E-06  16  99842   18.163
32  20  18  6   2   9.6451E-06  1   99307   17.527


##################################################################
# Input / Output variables
##################################################################
library(plyr)  # provides ddply() and count(), used below

for (N in 59:96) {
  # Zero-pad the track-set number to four digits, e.g. 59 -> "0059"
  TrackID <- sprintf("%04d", N)
  print(TrackID)

# For 2010_08_24 trackset
#  fname_in <- paste('input/2010_08_24/intersections_track_calibrated_jma_from1951_',TrackID,'.csv', sep="")
#  fname_out <- paste('output/2010_08_24/tracks_crossing_regional_polygon_',TrackID,'.csv', sep="")
# For 2012_05_01 trackset
  fname_in <- paste('input/2012_05_01/intersections_track_param_',TrackID,'.csv', sep="")
  fname_out <- paste('output/2012_05_01/tracks_crossing_regional_polygon_',TrackID,'.csv', sep="")
  fname_out2 <- paste('output/2012_05_01/GateID_',TrackID,'.csv', sep="")

#######################################################################
# Read the gate-crossing track data
  cat('reading the crosstat output file', fname_in, '\n')
  # Skip the header line and name the columns in the order they appear in the file
  track <- read.table(fname_in, sep=',', skip=1)
  colnames(track) <- c("ID", "day", "hour", "month", "year", "rate", "gate_id", "pres_inter", "vmax_inter")

#  track_id=track[,1]
#  pres_inter=track[,15]

# For each storm (ID), keep the row with the maximum vmax_inter
  ByTrack <- ddply(track, "ID", function(x) x[which.max(x$vmax_inter),])
# Count how many rows fall on each gate
  ByGate <- count(track, vars="gate_id")

# Write the output file with a single record per storm
  cat('Writing the full output file', fname_out, '\n')
  write.table(ByTrack, fname_out, col.names=TRUE, row.names=FALSE, sep = ',')

# Write the output file with the frequency of each gate_id
  cat('Writing the gate frequency file', fname_out2, '\n')
  write.table(ByGate, fname_out2, col.names=TRUE, row.names=FALSE, sep = ',')
}

The final section of the code writes a file that groups by GateID and gives the frequency of occurrence. It looks like this:

gate_id freq
1   935
2   2096
3   1363
4   963
5   167
6   17
7   43
8   62
9   208
10  267
11  64
12  162
13  178
14  632
15  807
16  2003
17  838
18  293

The thing is that I output a file that looks just like this for each of 96 different input files. Instead of writing 96 separate files, I'd like to calculate these aggregations per input file, sum the frequencies across all 96 inputs, and write one single output file. Can anyone help?

Thanks, K

You are going to need to do something like the function below. It grabs every .csv file in the working directory, so that directory must contain only the files you want to analyze.

myFun <- function(out.file = "mydata") {
  files <- list.files(pattern = "\\.(csv|CSV)$")
  # Use this next line if you are going to use the file name as a variable/output etc.
  files.noext <- substr(basename(files), 1, nchar(basename(files)) - 4)

  # Placeholder for the combined result; fill it inside the loop
  yourDataFrame <- data.frame()

  for (i in seq_along(files)) {
    # header = FALSE treats the first line as data; adjust if your files have a header row
    temp <- read.csv(files[i], header = FALSE)
    # YOUR CODE HERE
    # Use the code you have already written but operate on files[i] or temp
    # Save the important stuff into one data frame that grows
    # Think carefully ahead of time what structure makes the most sense
  }

  datafile <- paste(out.file, ".csv", sep = "")
  write.csv(yourDataFrame, file = datafile)
}
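
To make the "YOUR CODE HERE" step more concrete, here is a minimal sketch of one way to fill it in using base R only. It assumes the files are comma-separated with a single header line and that gate_id is the 7th column, as in the sample data in the question. The names `sum_gate_counts` and `gate_col` are made up for illustration and do not come from the original code.

```r
# Sketch: count gate_id per file, then sum the counts across all files.
# Assumptions (not from the original post): gate_id is column 7 and each
# file has exactly one header line, matching the question's sample data.
sum_gate_counts <- function(files, gate_col = 7) {
  per_file <- lapply(files, function(f) {
    # Same reading convention as the question: comma-separated, skip the header
    track <- read.table(f, sep = ",", skip = 1)
    # One row per crossing with freq = 1, then sum within each gate_id
    one_per_row <- data.frame(gate_id = track[[gate_col]], freq = 1L)
    aggregate(freq ~ gate_id, data = one_per_row, FUN = sum)
  })
  # Stack the per-file tables and sum freq again within each gate_id
  stacked <- do.call(rbind, per_file)
  aggregate(freq ~ gate_id, data = stacked, FUN = sum)
}
```

You could then call it with something like `files <- list.files("input/2012_05_01", pattern = "^intersections_track_param_.*\\.csv$", full.names = TRUE)` followed by `write.csv(sum_gate_counts(files), "GateID_all.csv", row.names = FALSE)`, where `GateID_all.csv` is just a placeholder output name. Selecting files by pattern like this also means the directory does not have to contain only the files you want to analyze.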
