R: Loop to create new data frames in R
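I am trying to write a loop that creates many files for VCS stations, each named after its station. Below is the code that does this for one station; I am trying to turn it into a loop so that I can do it for 68 stations (i.e., if I were copying and pasting, I would replace P205187 with a different station name, e.g. P205200). I have the individual station names (e.g. P205187) in a data frame called VCS.Sites. Can anyone point me in the right direction? New R user here, and I'm stuck!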
P205187 <- VCSrawdata[VCSrawdata$Network_ID=="P205187",] #create a file for VCS station P205187
#clean up after subset
P205187$Network_ID <- factor(P205187$Network_ID)
# create annual file for VCS station P205187
P205187_annual <- group_by(P205187,Year,DESCRIPTION)
P205187_annual <- summarise(P205187_annual,Sum_Annual = sum(Value), Mean_Annual = mean(Value), CountDays=n())
# create monthly file for VCS station P205187
P205187_monthly <- group_by(P205187,Year, Month,DESCRIPTION)
P205187_monthly <- summarise(P205187_monthly,Sum_Monthly = sum(Value),Mean_monthly = mean(Value),CountDays=n())
You can do this nicely with an lapply loop. Something like this:
library(dplyr) # provides group_by(), summarise() and n()
list_of_ids <- c("List be here") # placeholder: replace with the vector of station IDs
monthly <- function(id){
  # subset the rows for the station whose ID was passed in
  station <- VCSrawdata[VCSrawdata$Network_ID==id,]
  # clean up after subset
  station$Network_ID <- factor(station$Network_ID)
  # annual summary for this station (computed here, but not returned)
  station_annual <- group_by(station,Year,DESCRIPTION)
  station_annual <- summarise(station_annual,Sum_Annual = sum(Value), Mean_Annual = mean(Value), CountDays=n())
  # monthly summary for this station
  station_monthly <- group_by(station,Year, Month,DESCRIPTION)
  station_monthly <- summarise(station_monthly,Sum_Monthly = sum(Value),Mean_monthly = mean(Value),CountDays=n())
  return(station_monthly)
}
monthlies <- lapply(list_of_ids, monthly)
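The list returned by lapply above has no names, so a station can only be reached by position. A minimal sketch of attaching the IDs as names follows. It uses toy data, and `monthly_sum` is a hypothetical, simplified stand-in for `monthly` that relies on base R `aggregate` so the example runs without dplyr:

```r
# Toy stand-in for VCSrawdata (made-up values, for illustration only)
VCSrawdata <- data.frame(
  Network_ID  = rep(c("P205187", "P205200"), each = 4),
  Year        = 2020,
  Month       = rep(c(1, 1, 2, 2), times = 2),
  DESCRIPTION = "Precip",
  Value       = c(1, 2, 3, 4, 10, 20, 30, 40)
)

# Hypothetical simplified version of monthly(): monthly sums for one station
monthly_sum <- function(id) {
  station <- VCSrawdata[VCSrawdata$Network_ID == id, ]
  aggregate(Value ~ Year + Month + DESCRIPTION, data = station, FUN = sum)
}

list_of_ids <- unique(VCSrawdata$Network_ID)

# lapply() keeps only names that the input already has, so add them explicitly
monthlies <- setNames(lapply(list_of_ids, monthly_sum), list_of_ids)

monthlies$P205200  # look up one station by its ID
```

With names in place, each station's summary can be retrieved as `monthlies$<ID>` or `monthlies[["<ID>"]]` instead of by index.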
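It sounds like the goal is to write CSVs. We can use group_walk from dplyr (the variant of group_map that is called for its side effects) to loop over all the stations and write a CSV for each.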
library(dplyr)
VCSrawdata %>%
  group_by(Network_ID) %>%
  group_walk(~ {
    # annual summary for the current station, written to <Network_ID>_annual.csv
    .x %>%
      group_by(Year, DESCRIPTION) %>%
      summarize(sum_annual = sum(Value),
                mean_annual = mean(Value),
                countDays = n()) %>%
      write.csv(file = paste0(.y$Network_ID, "_annual.csv"))
    # monthly summary for the current station, written to <Network_ID>_month.csv
    .x %>%
      group_by(Year, Month, DESCRIPTION) %>%
      summarize(sum_month = sum(Value),
                mean_month = mean(Value),
                countDays = n()) %>%
      write.csv(file = paste0(.y$Network_ID, "_month.csv"))
  })
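To check what the callback in group_walk receives, here is a minimal sketch on toy data. The data frame `toy`, the `_toy.csv` suffix and the use of a temporary directory are assumptions for illustration, not part of the answer:

```r
library(dplyr)

# Toy data standing in for VCSrawdata (made-up values)
toy <- data.frame(
  Network_ID = c("A1", "A1", "B2"),
  Year       = c(2020, 2021, 2020),
  Value      = c(1, 2, 5)
)

out_dir <- tempdir()  # write to a temp folder instead of the working directory

toy %>%
  group_by(Network_ID) %>%
  group_walk(~ {
    # .x: rows of the current station, without the Network_ID column
    # .y: one-row tibble holding the current Network_ID
    write.csv(.x,
              file.path(out_dir, paste0(.y$Network_ID, "_toy.csv")),
              row.names = FALSE)
  })

# One file per station should now exist in out_dir
list.files(out_dir, pattern = "_toy\\.csv$")
```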
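Notes:
.x refers to the rows of the current group, i.e. the piece of the grouped tibble split off by Network_ID.
.y refers to the grouping key. In this case it holds only Network_ID.
Simply generalize your process in a defined function, then pass the station names as a parameter in a loop, or use an apply function to iterate over the stations. With this approach you avoid flooding the global environment with many separate objects, and instead keep a single named list of many underlying elements, which is easier to serialize and organize.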
summarize_stations <- function(station_name) {
  tmp_df <- VCSrawdata[VCSrawdata$Network_ID==station_name,]
  tmp_df$Network_ID <- factor(tmp_df$Network_ID)
  # create annual file for VCS station
  tmp_annual <- summarise(group_by(tmp_df, Year, DESCRIPTION),
                          Sum_Annual = sum(Value),
                          Mean_Annual = mean(Value),
                          CountDays = n())
  # create monthly file for VCS station
  tmp_monthly <- summarise(group_by(tmp_df, Year, Month, DESCRIPTION),
                           Sum_Monthly = sum(Value),
                           Mean_Monthly = mean(Value),
                           CountDays = n())
  # RETURN NAMED LIST OF BOTH AGGREGATIONS
  return(list(annual=tmp_annual, monthly=tmp_monthly))
}
station_list <- sapply(VCS.Sites$station_names, summarize_stations, simplify=FALSE)
# ACCESS UNDERLYING ELEMENTS
station_list$P205187$annual
station_list$P205187$monthly
...
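One caveat with the sapply call above: it names the result automatically only when its input is a character vector. If VCS.Sites$station_names happens to be a factor (an assumption; the question does not say), the list comes back unnamed and lookups such as station_list$P205187 return NULL. A minimal sketch with made-up IDs, where `echo_id` is a trivial hypothetical stand-in for summarize_stations:

```r
ids_chr <- c("P205187", "P205200")
ids_fct <- factor(ids_chr)

# Trivial stand-in for summarize_stations(): just returns the ID as text
echo_id <- function(id) as.character(id)

from_chr <- sapply(ids_chr, echo_id, simplify = FALSE)
from_fct <- sapply(ids_fct, echo_id, simplify = FALSE)
from_fix <- sapply(as.character(ids_fct), echo_id, simplify = FALSE)

names(from_chr)  # character input: names are the IDs
names(from_fct)  # factor input: no names are attached
names(from_fix)  # converting to character first restores the names
```

Wrapping the input in as.character() is harmless when the column is already character, so it may be a reasonable default.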
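You can even use by (the object-oriented wrapper to tapply) to subset VCSrawdata by Network_ID (assuming it includes all the stations you need). To do so, slightly adjust the function to receive a data frame as its parameter, which lets you skip the subset line.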
summarize_stations <- function(tmp_df) {
  # REMOVE SUBSET LINE
  # tmp_df <- VCSrawdata[VCSrawdata$Network_ID=="P205187",]
  # ...keep same code as above
}
station_list <- by(VCSrawdata, VCSrawdata$Network_ID, FUN=summarize_stations)
# ACCESS UNDERLYING ELEMENTS
station_list$P205187$annual
station_list$P205187$monthly
...
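As a quick illustration of the by pattern, here is a minimal sketch on toy data. `annual_sum` is a hypothetical, simplified stand-in for summarize_stations, using base R `aggregate` in place of the dplyr summaries so that it runs without extra packages:

```r
# Toy stand-in for VCSrawdata (made-up values)
VCSrawdata <- data.frame(
  Network_ID = c("P205187", "P205187", "P205187", "P205200"),
  Year       = c(2020, 2020, 2021, 2020),
  Value      = c(1, 2, 3, 10)
)

# Receives one station's rows as a data frame; no subsetting needed inside
annual_sum <- function(station_df) {
  aggregate(Value ~ Year, data = station_df, FUN = sum)
}

# by() splits the data frame on Network_ID and calls annual_sum on each piece
station_list <- by(VCSrawdata, VCSrawdata$Network_ID, FUN = annual_sum)

names(station_list)        # one element per Network_ID
station_list[["P205187"]]  # annual sums for that station
```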