
In R, how to loop through CSV files and save the outputs of a linear regression in a new data frame?

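My script and one of the first 3 CSV files can be found in my GitHub folder.

I have split a list of NDVI and climate data into small CSV files. Each file contains 34 years of data.

Each 34-year series should then be divided into two parts around the conflict year, which is stored in the same table, and restricted to a specific time range. This part of the code already works.

Now I want to control the second part of the list with the climate data of the first part by using multiple linear regression. That is also done.

What I basically need is a loop that stores all coefficients from each run of the lm function, one run per CSV file, in a new list.

I know I can use lapply to loop and get the output as a list. But I am missing the parts that actually loop through the CSV files.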

#load libraries
library(ggplot2)
library(readr)
library(tidyr)
library(dplyr)
library(ggpubr)
library(plyr)
library(tidyverse)
library(fs)


file_paths <- fs::dir_ls("E:\\PYTHON_ST\\breakCSV_PYTHON\\AIM_2_regions\\Afghanistan")
file_paths

#create empty list and fill with file paths and loop through them
file_contents <- list()
for (i in seq_along(file_paths)) { #seq_along for vectors (list of file paths is a vector)
  file_contents[[i]] <- read_csv(file = file_paths[[i]])
                  
                for (i in seq_len(file_contents[[i]])){ # redundant?
                  
                 # do all the following steps in every file                                        
                 
                 # Step 1) 
                 # Define years to divide table
                 
                 #select conflict year in df 
                 ConflictYear = file_contents[[i]][1,9]
                 ConflictYear
                 
                 # select Start year of regression in df
                 SlopeYears = file_contents[[i]][1,7] #to get slope years (e.g.17)
                 BCStartYear = ConflictYear-SlopeYears #to get start year for regression
                 BCStartYear
                 
                 #End year of regression
                 ACEndYear = ConflictYear+(SlopeYears-1) # -1 because the conflict year is included
                 ACEndYear
                 
                 
                 # Step 2
                 
                 #select needed rows from df
                 #no headers but row numbers. NDVI.Year = [r1-r34,c2]
                 NDVI.Year <- file_contents[[i]][1:34,2]
                 NDVI <- file_contents[[i]][1:34,21]
                 T.annual.max <- file_contents[[i]][1:34,19]
                 Prec.annual.max <- file_contents[[i]][1:34,20]
                 soilM.annual.max <- file_contents[[i]][1:34,18]
                 
                 #Define BeforeConf and AfterConf depending on Slope Year number and Conflict Years
                 #Go through NDVI.Year till Conflict.Year (-1 year) since the conflict year is not included in bc
                 BeforeConf1 <- file_contents[[i]][ which(file_contents[[i]]$NDVI.Year >= BCStartYear & file_contents[[i]]$NDVI.Year < ConflictYear),] #eg. 1982 to 1999
                 BeforeConf2 <-  c(NDVI.Year, NDVI, T.annual.max, Prec.annual.max, soilM.annual.max) #which columns to include
                 BeforeConf <- BeforeConf1[BeforeConf2] #create table
                 
                 AfterConf1 <- myFiles[ which(file_contents[[i]]$NDVI.Year >= ConflictYear & file_contents[[i]]$NDVI.Year <= ACEndYear),] #eg. 1999 to 2015
                 AfterConf2 <-  c(NDVI.Year, NDVI, T.annual.max, Prec.annual.max, soilM.annual.max)
                 AfterConf <- AfterConf1[AfterConf2]
                 
                 #Step 3)a)
                 #create empty list, to fill with coefficient results from each model results for each csv file and safe in new list
                 
                 #Create an empty df for the output coefficients
                 names <- c("(Intercept)","BeforeConf$T.annual.max","BeforeConf$Prec.annual.max","BeforeConf$soilM.annual.max")
                 coef_df <- data.frame()
                 for (k in names) coef_df[[k]] <- as.character() 
                 
                 #Apply Multiple Linear Regression
                 plyrFunc <- function(x){
                   model <- lm(NDVI ~ T.annual.max + Prec.annual.max + soilM.annual.max, data = BeforeConf)
                   return(summary(model)$coefficients[1,1:4])
                 }
                 
                 coef_df <- ddply(BeforeConf, .(), x)
                 coef_DF
    }}

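Since you already have code that works for a single CSV, consider separating the processing from the looping. Specifically:

  1. Create a function that receives a single CSV path as its input parameter and does everything you need for a single file.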

     get_coeffs <- function(csv_path) {
       df <- read.csv(csv_path)

       ### Step 1
       # select conflict year, start year, and end year in df
       ConflictYear <- df[1, 9]
       SlopeYears <- df[1, 7]                         # to get slope years (e.g. 17)
       BCStartYear <- ConflictYear - SlopeYears       # to get start year for regression
       ACEndYear <- ConflictYear + (SlopeYears - 1)   # -1 because the conflict year is included

       ### Step 2
       # select needed columns from df by position (rows 1-34),
       # stored under the names used in the model formula
       dat <- data.frame(
         NDVI.Year        = df[1:34, 2],
         NDVI             = df[1:34, 21],
         T.annual.max     = df[1:34, 19],
         Prec.annual.max  = df[1:34, 20],
         soilM.annual.max = df[1:34, 18]
       )

       # Define BeforeConf and AfterConf depending on Slope Year number and Conflict Years
       # BeforeConf stops one year before the conflict year; AfterConf starts with it
       BeforeConf <- dat[which(dat$NDVI.Year >= BCStartYear & dat$NDVI.Year < ConflictYear), ]
       AfterConf  <- dat[which(dat$NDVI.Year >= ConflictYear & dat$NDVI.Year <= ACEndYear), ]

       ### Step 3
       tryCatch({
         # Run model and return coefficients
         model <- lm(NDVI ~ T.annual.max + Prec.annual.max + soilM.annual.max, data = BeforeConf)
         # [1, 1:4] is the intercept row (estimate, std. error, t value, p value);
         # coef(model) would give the four coefficient estimates instead
         return(summary(model)$coefficients[1, 1:4])
       }, error = function(e) {
         print(e)
         return(rep(NA_real_, 4))
       })
     }
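  2. Loop over the CSV paths, passing each file into your function, and build up a collection of results. Use lapply for a list return, or sapply (or vapply, which specifies the length and type of each result) to simplify the return to a vector or matrix/array where applicable.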

     mypath <- "E:\\PYTHON_ST\\breakCSV_PYTHON\\AIM_2_regions\\Afghanistan"
     # full.names = TRUE returns complete paths, so read.csv() can find the files
     file_paths <- list.files(path = mypath, pattern = "\\.csv$", full.names = TRUE)

     # LIST RETURN
     result_list <- lapply(file_paths, get_coeffs)

     # MATRIX RETURN (one column per file)
     results_matrix <- sapply(file_paths, get_coeffs)
     results_matrix <- vapply(file_paths, get_coeffs, numeric(4))
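Since the question asks for a data frame rather than a list, here is a minimal sketch of one way to turn the per-file results into a table with one row per file. Everything in it is an assumption for illustration: `fit_one` is a hypothetical stand-in for `get_coeffs`, the columns `y`, `x1`, and `x2` are made up, and the CSV files are synthetic ones written to a temporary directory so the example runs on its own.

```r
# Hypothetical stand-in for get_coeffs(): fits one model per file and
# returns the named coefficient estimates (column names are made up).
fit_one <- function(csv_path) {
  df <- read.csv(csv_path)
  model <- lm(y ~ x1 + x2, data = df)
  coef(model)
}

# Write three small synthetic CSV files to a fresh temporary directory.
set.seed(1)
tmp_dir <- tempfile("csv_demo")
dir.create(tmp_dir)
for (i in 1:3) {
  demo <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
  demo$y <- 1 + 2 * demo$x1 - demo$x2 + rnorm(20, sd = 0.1)
  write.csv(demo, file.path(tmp_dir, paste0("region_", i, ".csv")),
            row.names = FALSE)
}

file_paths <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# lapply() gives one named numeric vector per file;
# rbind() stacks them so that each file becomes one row.
result_list <- lapply(file_paths, fit_one)
coef_df <- as.data.frame(do.call(rbind, result_list))
coef_df <- cbind(file = basename(file_paths), coef_df)
print(coef_df)
```

The same `do.call(rbind, ...)` step should work on the list returned by `lapply(file_paths, get_coeffs)`, as long as every call returns a vector of the same length, which the `NA` fallback in the error handler is there to guarantee.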
