Aggregating output from multiple input files in R
Right now I have the R code below. It reads in data that looks like this:
track_id day hour month year rate gate_id pres_inter vmax_inter
9 10 0 7 1 9.6451E-06 2 97809 23.545
9 10 0 7 1 9.6451E-06 17 100170 13.843
10 3 6 7 1 9.6451E-06 2 96662 31.568
13 22 12 8 1 9.6451E-06 1 94449 48.466
13 22 12 8 1 9.6451E-06 17 96749 30.55
16 13 0 8 1 9.6451E-06 4 98702 19.205
16 13 0 8 1 9.6451E-06 16 98585 18.143
19 27 6 9 1 9.6451E-06 9 98838 20.053
19 27 6 9 1 9.6451E-06 17 99221 17.677
30 13 12 6 2 9.6451E-06 2 97876 27.687
30 13 12 6 2 9.6451E-06 16 99842 18.163
32 20 18 6 2 9.6451E-06 1 99307 17.527
##################################################################
# Input / Output variables
##################################################################
library(plyr)  # provides ddply() and count(), used below

for (N in (59:96)){
  # Zero-pad the track number to four digits (equivalently: sprintf("%04d", N))
  if (N < 10){
    TrackID <- paste("000", N, sep="")
  } else {
    TrackID <- paste("00", N, sep="")
  }
  print(TrackID)
  # For 2010_08_24 trackset
  # fname_in <- paste('input/2010_08_24/intersections_track_calibrated_jma_from1951_',TrackID,'.csv', sep="")
  # fname_out <- paste('output/2010_08_24/tracks_crossing_regional_polygon_',TrackID,'.csv', sep="")
  # For 2012_05_01 trackset
  fname_in   <- paste('input/2012_05_01/intersections_track_param_', TrackID, '.csv', sep="")
  fname_out  <- paste('output/2012_05_01/tracks_crossing_regional_polygon_', TrackID, '.csv', sep="")
  fname_out2 <- paste('output/2012_05_01/GateID_', TrackID, '.csv', sep="")
  #######################################################################
  # Read the gate-crossing track data
  cat('reading the crosstat output file', fname_in, '\n')
  header <- read.table(fname_in, nrows=1)
  track  <- read.table(fname_in, sep=',', skip=1)
  # Column names in the order they appear in the input header
  colnames(track) <- c("ID", "day", "hour", "month", "year",
                       "rate", "gate_id", "pres_inter", "vmax_inter")
  # Select the row with the maximum vmax_inter for each storm ID
  ByTrack <- ddply(track, "ID", function(x) x[which.max(x$vmax_inter), ])
  # Count the number of crossings per gate
  ByGate <- count(track, vars="gate_id")
  # Write the output file with a single record per storm
  cat('Writing the full output file', fname_out, '\n')
  write.table(ByTrack, fname_out, col.names=TRUE, row.names=FALSE, sep=',')
  # Write the output file with the crossing frequency per gate
  cat('Writing the gate-count output file', fname_out2, '\n')
  write.table(ByGate, fname_out2, col.names=TRUE, row.names=FALSE, sep=',')
}
My output for the final section of code is a file that groups by gate_id and reports the frequency of occurrence. It looks like this:
gate_id freq
1 935
2 2096
3 1363
4 963
5 167
6 17
7 43
8 62
9 208
10 267
11 64
12 162
13 178
14 632
15 807
16 2003
17 838
18 293
The thing is that I output a file that looks just like this for 96 different input files. Instead of outputting 96 separate files, I'd like to calculate these aggregations per input file, then sum the frequencies across all 96 inputs and print out one SINGLE output file. Can anyone help?

Thanks, K
You are going to need to do something like the function below. This grabs all the .csv files in one directory, so that directory must contain only the files you want to analyze.
myFun <- function(out.file = "mydata") {
  files <- list.files(pattern = "\\.(csv|CSV)$")
  # Use this next line if you are going to use the file name as a variable/output etc.
  files.noext <- substr(basename(files), 1, nchar(basename(files)) - 4)
  for (i in seq_along(files)) {
    temp <- read.csv(files[i], header = FALSE)
    # YOUR CODE HERE
    # Use the code you have already written, but operate on files[i] or temp
    # Save the important stuff into one data frame that grows
    # Think carefully ahead of time what structure makes the most sense
  }
  datafile <- paste(out.file, ".csv", sep = "")
  write.csv(yourDataFrame, file = datafile)
}
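Applied to the question's loop, one minimal sketch (assuming the same input naming scheme as the question, and a hypothetical combined output name `GateID_all.csv`) is to append each per-file gate count to one growing data frame, then sum the `freq` column per `gate_id` at the end and write a single file:

```r
# Sketch: sum gate-crossing frequencies across all inputs into ONE output file.
# Assumes plyr is installed and the input paths match the question's layout.
library(plyr)

all.counts <- NULL
for (N in 59:96) {
  TrackID <- sprintf("%04d", N)  # zero-pad, e.g. "0059"
  fname_in <- paste('input/2012_05_01/intersections_track_param_',
                    TrackID, '.csv', sep = "")
  track <- read.table(fname_in, sep = ',', skip = 1)
  colnames(track) <- c("ID", "day", "hour", "month", "year",
                       "rate", "gate_id", "pres_inter", "vmax_inter")
  ByGate <- count(track, vars = "gate_id")  # per-file frequencies
  all.counts <- rbind(all.counts, ByGate)   # accumulate across files
}
# One row per gate_id, with freq summed over all input files
total <- aggregate(freq ~ gate_id, data = all.counts, FUN = sum)
write.table(total, 'output/2012_05_01/GateID_all.csv',
            col.names = TRUE, row.names = FALSE, sep = ',')
```

The `rbind`-then-`aggregate` step is the key change: the per-file `ByGate` tables all share the columns `gate_id` and `freq`, so stacking them and summing `freq` by `gate_id` gives the combined frequency table the question asks for.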