Apply a custom function to multiple files and create a unique csv output in R

Question

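I am a beginner in R and have been putting together code for a custom function that performs a specific task on some data I have. The function identifies missing values in a csv file and patches them with the mean. After that, I want to summarize the data by year and month and export the result as a csv file. I have multiple csv files in one folder and want to run this task on each of them. So far I have code that performs the task, but I do not know how to write a unique output for each processed csv file and save them to a new folder. I would also like the processed output to keep the original file name with "_processed" appended. Any suggestions on how to improve this code are also welcome. Thanks in advance.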

# Load all packages required by the script
library(tidyverse) # data science package
library(lubridate) # work with dates
library(dplyr)     # data manipulation (filter, summarize, mutate)
library(ggplot2)   # graphics
library(gridExtra) # tile several plots next to each other
library(scales)

# Set the working directory #
setwd("H:/Shaeden_Post_Doc/Genus_Exchange/GEE_Data/MODIS_Product_Data_Raw/Cold_Temperate_Moist")


#create a function to summarize data by year and month
#patch missing values using the average

summarize_by_month = function(df){

  # count unique values, missing values and the mean of the ET column
  # (inside a function this result is neither printed nor stored)
  df %>% summarise(n = n_distinct(ET),
                   na = sum(is.na(ET)),
                   med = mean(ET, na.rm = TRUE))

  # replace missing ET values with the mean and update the data frame
  df = df %>%
    mutate(ET = replace(ET, is.na(ET), mean(ET, na.rm = TRUE)))

  # convert the date column to Date class
  df$date = as.Date(df$date, format = "%Y/%m/%d")

  # summarize by year and month
  df %>%
    mutate(year = format(date, "%Y"), month = format(date, "%m")) %>%
    group_by(year, month) %>%
    summarise(mean_monthly = mean(ET))

}

#import all files and execute custom function for each
file_list = list.files(pattern="AET", full.names=TRUE)
file_list

my_AET_files = lapply(file_list, read_csv)
monthly_AET = lapply(my_AET_files, summarize_by_month)
monthly_AET

A sample dataset is available at the following link: https://drive.google.com/drive/folders/1pLHt-vT87lxzW2We-AS1PwVcne3ALP2d?usp=sharing
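
The naming requirement described in the question (original file name with "_processed" appended, saved in a new folder) can be sketched in base R as follows. The helper name `make_output_path`, the folder name "processed", and the example file names are assumptions made for illustration; they do not come from the original post.

```r
# Hypothetical helper: builds "<out_dir>/<original name>_processed.csv"
make_output_path <- function(file, out_dir) {
  # drop the directory part and the extension of the input path
  stem <- tools::file_path_sans_ext(basename(file))
  file.path(out_dir, paste0(stem, "_processed.csv"))
}

# Made-up input paths, shaped like the result of list.files(full.names = TRUE)
files <- c("./site1_AET.csv", "./site2_AET.csv")

out_paths <- vapply(files, make_output_path, character(1),
                    out_dir = "processed", USE.NAMES = FALSE)
print(out_paths)
```

Because each output name is derived from its own input name, every processed file gets a unique path as long as the input names are unique.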

Answer 1

library(readr)   # write_excel_csv()
library(stringr) # str_sub()

path <- "your_preferred_path/" # folder where you want to save the files

x <- list.files(pattern = "your_pattern") # character vector of your file names

name <- str_sub(x, start = xL, end = yL) # xL & yL are placeholders for the positions of the part of the name you want to keep

for (i in seq_along(monthly_AET)) {
  # [[i]] extracts the i-th data frame; name[i] picks the matching file name
  # paste0 builds custom names from variables and static strings
  write_excel_csv(monthly_AET[[i]], paste0(path, name[i], "_processed.csv"))
}

Note: this is only a sketch and may need to be adjusted to your needs.
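
When looping over a list of data frames such as `monthly_AET`, single brackets return a one-element list, while double brackets return the data frame itself, which is what a csv writer expects. The sketch below illustrates this with made-up data and writes to a temporary folder; it uses base `write.csv` in place of `write_excel_csv` to avoid package dependencies, and the file names and values are assumptions for illustration only.

```r
# Toy stand-ins for the question's objects (made-up names and values)
file_list <- c("./siteA_AET.csv", "./siteB_AET.csv")
monthly_AET <- list(
  data.frame(year = "2001", month = "01", mean_monthly = 1.5),
  data.frame(year = "2001", month = "01", mean_monthly = 2.5)
)

class(monthly_AET[1])    # a list holding one data frame
class(monthly_AET[[1]])  # the data frame itself

# Assumed output folder inside the session's temporary directory
out_dir <- file.path(tempdir(), "loop_demo")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# One name stem per input file, in the same order as monthly_AET
stems <- tools::file_path_sans_ext(basename(file_list))

for (i in seq_along(monthly_AET)) {
  out_file <- file.path(out_dir, paste0(stems[i], "_processed.csv"))
  write.csv(monthly_AET[[i]], out_file, row.names = FALSE)
}

written <- list.files(out_dir)
print(written)
```

Indexing both the list and the name vector with the same `i` keeps each result paired with the file it came from.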

Answer 2

You can read the file, manipulate the data, and write the csv all in the same function:

library(dplyr)

summarize_by_month = function(file) {
  df <- readr::read_csv(file)

  # assign mean values to the missing data and modify the dataframe
  df = df %>% mutate(ET = replace(ET, is.na(ET), mean(ET, na.rm = TRUE)))

  # convert the date column to Date class
  df$date = as.Date(df$date, format = "%Y/%m/%d")

  # summarize by year and month
  new_df <- df %>%
    mutate(year = format(date, "%Y"), month = format(date, "%m")) %>%
    group_by(year, month) %>%
    summarise(mean_monthly = mean(ET))

  # 'output_folder' must already exist; write.csv() does not create it
  write.csv(new_df, sprintf('output_folder/%s_processed.csv',
                            tools::file_path_sans_ext(basename(file))),
            row.names = FALSE)

  # return the summary so that lapply() collects it instead of NULL
  new_df
}

# file_list is the vector of file paths created in the question
monthly_AET = lapply(file_list, summarize_by_month)
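
To check the read, patch, summarize, and write steps end to end without the real data, the following sketch runs the same workflow on a tiny made-up csv. It is written in base R (`read.csv`, `aggregate`) rather than dplyr and readr so that it has no package dependencies; the function name `process_file`, its `out_dir` argument, the folder names, and the toy values are all assumptions for illustration.

```r
# Hypothetical base-R version of the workflow: read, patch, summarize, write
process_file <- function(file, out_dir) {
  df <- read.csv(file)

  # patch missing ET values with the mean of the observed values
  df$ET[is.na(df$ET)] <- mean(df$ET, na.rm = TRUE)

  # derive year and month from the date column
  df$date <- as.Date(df$date, format = "%Y/%m/%d")
  df$year <- format(df$date, "%Y")
  df$month <- format(df$date, "%m")

  # monthly mean of ET per year
  monthly <- aggregate(ET ~ year + month, data = df, FUN = mean)
  names(monthly)[names(monthly) == "ET"] <- "mean_monthly"

  # create the output folder if needed, then write "<name>_processed.csv"
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  stem <- tools::file_path_sans_ext(basename(file))
  write.csv(monthly, file.path(out_dir, paste0(stem, "_processed.csv")),
            row.names = FALSE)

  invisible(monthly)
}

# Toy input file with one missing ET value (made-up data)
in_dir <- file.path(tempdir(), "raw_demo")
dir.create(in_dir, showWarnings = FALSE, recursive = TRUE)
toy <- data.frame(date = c("2001/01/01", "2001/01/02", "2001/02/01"),
                  ET = c(1, NA, 3))
write.csv(toy, file.path(in_dir, "toy_AET.csv"), row.names = FALSE)

out_dir <- file.path(tempdir(), "processed_demo")
file_list <- list.files(in_dir, pattern = "AET", full.names = TRUE)
results <- lapply(file_list, process_file, out_dir = out_dir)

print(results[[1]])
print(list.files(out_dir))
```

In this toy case the missing value is replaced by 2 (the mean of 1 and 3), so the monthly means are 1.5 for January and 3 for February. Note that the mean used for patching is computed over the whole file, as in the answer above, not per month.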

Answer 1
0 2020-10-07 12:11:04

Answer 2
0 accepted 2020-10-07 12:13:45
