[英]Applying a custom function to multiple files and creating unique csv output in R
我是 R 的初學者,一直在編譯代碼以創建自定義 function 以對我擁有的某些數據執行特定任務。 自定義 function 的結構可識別 csv 文件中缺失的數據,並使用平均值對其進行修補。 此后,我想按年和月匯總數據並將其導出為 csv 文件。 我有多個 csv 文件位於一個文件夾中,我想對每個文件執行此任務。 到目前為止,我能夠獲得執行手頭任務的代碼,但不知道如何為每個已處理的 csv 文件編寫一個唯一的 output 並將它們保存到新文件夾中。 我還想在處理后的 output 中保留原始文件名,但附加了“_processed”字樣。 此外,歡迎就如何改進此代碼提出任何建議。 提前致謝。
# Load all packages required by the script
library(tidyverse) # data science package
library(lubridate) # work with dates
library(dplyr) # data manipulation (filter, summarize, mutate)
library(ggplot2) # graphics
library(gridExtra) # tile several plots next to each other
library(scales)
# Set the working directory #
setwd("H:/Shaeden_Post_Doc/Genus_Exchange/GEE_Data/MODIS_Product_Data_Raw/Cold_Temperate_Moist")
#create a function to summarize data by year and month
#patch missing values using the average
summarize_by_month = function(df){
# counting unique, missing and mean values in the ET column
df %>% summarise(n = n_distinct(ET),
na = sum(is.na(ET)),
med = mean(ET, na.rm = TRUE))
# assign mean values to the missing data and modify the dataframe
df = df %>%
mutate(ET = replace(ET,is.na(ET),mean(ET, na.rm = TRUE)))
df
#separate data into year, month and day
df$date = as.Date(df$date,format="%Y/%m/%d")
#summarize by year and month
df %>%
mutate(year = format(date, "%Y"), month = format(date, "%m")) %>%
group_by(year, month) %>%
summarise(mean_monthly = mean(ET))
}
#import all files and execute custom function for each
file_list = list.files(pattern="AET", full.names=TRUE)
file_list
my_AET_files = lapply(file_list, read_csv)
monthly_AET = lapply(my_AET_files, summarize_by_month)
monthly_AET
下面提供了示例數據集的鏈接https://drive.google.com/drive/folders/1pLHt-vT87lxzW2We-AS1PwVcne3ALP2d?usp=sharing
path<-"your_peferred_path/" #set a path to were you want to save the files
x<-list.files(pattern= "your_pattern") # create a list of your file names
name<-str_sub(x, start=xL, end=yL) #x & y being the part of the name you want to keep
for (i in 1:length(monthly_AET)){
write_excel_csv(monthly_AET[i], paste0(path, name, "_processed.csv")) # paste0 allows to create custom names from variables and static strings
}
注意:這只是一個假設,可能需要根據您的需要進行調整
您可以在同一個 function 中讀取、操作數據和寫入 csv:
library(dplyr)
summarize_by_month = function(file) {
df <- readr::read_csv(file)
# assign mean values to the missing data and modify the dataframe
df = df %>% mutate(ET = replace(ET,is.na(ET),mean(ET, na.rm = TRUE)))
#separate data into year, month and day
df$date = as.Date(df$date,format="%Y/%m/%d")
#summarize by year and month
new_df <- df %>%
mutate(year = format(date, "%Y"), month = format(date, "%m")) %>%
group_by(year, month) %>%
summarise(mean_monthly = mean(ET))
write.csv(new_df, sprintf('output_folder/%s_processed.csv',
tools::file_path_sans_ext(basename(file))), row.names = FALSE)
}
monthly_AET = lapply(file_list, summarize_by_month)
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.