Applying an R script to multiple .txt files in a folder

Question

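I am very new to writing functions and loops. I have looked at previous questions similar to mine, but I can't find a solution to my problem. My goal is to extract climate data from a web page like this one: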

https://mesonet.agron.iastate.edu/cgi-bin/request/coop.py?network=NECLIMATE&stations=NE3065&year1=2020&month1=1&day1=1&year2=2020&month2=12&day2=31&vars%5B%5D=gdd_50_86&model=apsim&what=view&delim=comma&gis=no&scenario_year=2019

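I will use this data to calculate growing degree days (GDD) for a crop growth model. I have already managed to download the data with a for loop: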

uticaNE <- "https://mesonet.agron.iastate.edu/cgi-bin/request/coop.py?network=NECLIMATE&stations=NE8745&year1=2020&month1=1&day1=1&year2=2020&month2=12&day2=31&vars%5B%5D=gdd_50_86&model=apsim&what=view&delim=comma&gis=no&scenario_year=2019"

friendNE <- "https://mesonet.agron.iastate.edu/cgi-bin/request/coop.py?network=NECLIMATE&stations=NE3065&year1=2020&month1=1&day1=1&year2=2020&month2=12&day2=31&vars%5B%5D=gdd_50_86&model=apsim&what=view&delim=comma&gis=no&scenario_year=2019"

location.urls <- c(uticaNE, friendNE)
location.meso.files <- c("uticaNe.txt", "friendNE.txt")

for(i in seq_along(location.urls)){
  download.file(location.urls[i], location.meso.files[i], method="libcurl")
}

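I will be pulling data for about 20 locations every day. What I want to do is apply the same calculations (conversion to Fahrenheit, GDD, etc.) to each file and save the output for each file separately.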

Here is the code I have so far:

files <- list.files(pattern="*.txt", full.names=TRUE, recursive=FALSE)

func <- for (i in 1:length(files)){
  df <- read.table(files[i], skip=10, stringsAsFactors = FALSE)
  colnames(df) <- c("year", "day", "solrad", "maxC", "minC", "precipmm")
  df$year <- as.f(df$year)
  df$day <- as.factor(df$day)
  df$maxF <- (df$maxC * (9/5) + 32)
  df$minF <- (df$minC * (9/5) + 32)
  df$GDD <- (((df$maxF + df$minF)/2)-50)
  df$GDD[df$GDD <= 0] <- 0
  df$GDD.cumulateive <- cumsum(df$GDD)
  df$precipmm.cumulative <- cumsum(df$precipmm)
  return(df)
  write.table(df, path="./output", quote=FALSE, row.names=FALSE, col.names=TRUE)
}

data <- apply(files, func)

Any help would be greatly appreciated.

-ML

Answer 1

Instead of base R, you can install the tidyverse packages (https://www.tidyverse.org/), which include the read_tsv function for loading a link as TSV (tab-separated values) into a data frame.

library(readr)  # read_tsv() and write_csv() are provided by readr, part of the tidyverse
dataframe <- read_tsv(url("http://some.where.net/"))

Then create a loop in R and do the calculations:

something <- c('link1', 'link2')  # vector in R
for (i in something) {
  # do the calculations for each link here (indentation is only for readability; R does not require it)
}
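The placeholder loop above can be filled in along these lines. This is a minimal base-R sketch rather than tidyverse code: `read_met` and `n_skip` are hypothetical names, and it assumes the files have already been downloaded as .txt files under ./data with the column layout used in the question. Storing each data frame in a pre-allocated list keeps every file's result instead of overwriting `df` on each pass.

```r
# Hypothetical sketch: a for loop that reads each file and keeps every result.
# Assumes whitespace-separated data rows (year day solrad maxC minC precipmm)
# after `n_skip` header lines; 11 is the count reported in Answer 2.
read_met <- function(path, n_skip = 11) {
  df <- read.table(path, skip = n_skip, stringsAsFactors = FALSE,
                   col.names = c("year", "day", "solrad", "maxC", "minC", "precipmm"))
  df$maxF <- df$maxC * 9 / 5 + 32   # Celsius -> Fahrenheit
  df$minF <- df$minC * 9 / 5 + 32
  df
}

# list.files() treats `pattern` as a regular expression, so "\\.txt$"
# matches file names ending in ".txt".
files <- list.files("./data", pattern = "\\.txt$", full.names = TRUE)
results <- vector("list", length(files))    # one slot per file
for (i in seq_along(files)) {               # seq_along() is safe when files is empty
  results[[i]] <- read_met(files[i])
}
names(results) <- basename(files)
```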

Finally, save the data frame to a file with the following command:

write_csv(dataframe, file = "c:\\myname\\yourfile.csv")
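Because the question asks for each file's output to be saved separately, a base-R variant of this last step is to loop over a named list of data frames and write one CSV per element with write.csv(). This is a sketch: `write_each` and the default `output` directory are hypothetical choices, and the list names are assumed to be usable as file names (for example, station IDs).

```r
# Sketch: write every data frame in a named list to its own CSV file.
# `results` is assumed to be a named list of data frames (e.g. one per station).
write_each <- function(results, out_dir = "output") {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)  # no-op if it exists
  paths <- file.path(out_dir, paste0(names(results), ".csv"))
  for (i in seq_along(results)) {
    write.csv(results[[i]], paths[i], row.names = FALSE)
  }
  invisible(paths)  # return the written paths without printing them
}
```

In Windows paths written as R strings, backslashes must be doubled (as in the write_csv() command above) or replaced by forward slashes, e.g. "c:/myname/yourfile.csv".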

Answer 2

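Here is an approach using base R and lapply() with an anonymous function. It downloads the data, reads it into a data frame, adds the Fahrenheit conversions, GDD and cumulative precipitation, and then writes an output file for each station.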
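Written out, the per-day quantities that the code computes are as follows, with temperatures taken on day d, P_k the precipitation (mm) on day k, and the 50 °F base used in the code (as written, no upper temperature cap is applied). As a check against the NE3065 output shown further down: on day 1, maxC = 2.2 and minC = -5 give 35.96 °F and 23 °F, a mean of 29.48 °F, so GDD = 0.

```latex
\begin{aligned}
T^{F}_{\max} &= \tfrac{9}{5}\,T^{C}_{\max} + 32, \qquad T^{F}_{\min} = \tfrac{9}{5}\,T^{C}_{\min} + 32 \\
\mathrm{GDD}_d &= \max\!\left(0,\; \frac{T^{F}_{\max} + T^{F}_{\min}}{2} - 50\right) \\
\mathrm{GDD}^{\mathrm{cum}}_d &= \sum_{k=1}^{d} \mathrm{GDD}_k, \qquad P^{\mathrm{cum}}_d = \sum_{k=1}^{d} P_k
\end{aligned}
```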

First, we create the list of weather stations whose data we will download:

# list of 10 stations
stationList <- c("NE3065","NE8745","NE0030","NE0050","NE0130",
                 "NE0245","NE0320","NE0355","NE0375","NE0420")

Here we create two URL fragments: one for the part of the URL before the station identifier and one for the part after it.

urlFragment1 <- "https://mesonet.agron.iastate.edu/cgi-bin/request/coop.py?network=NECLIMATE&stations="
urlFragment2 <- "&year1=2020&month1=1&day1=1&year2=2020&month2=12&day2=31&vars%5B%5D=gdd_50_86&model=apsim&what=view&delim=comma&gis=no&scenario_year"
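If more than the station needs to vary (for example the year), the same paste0() idea extends to a small function. This is a sketch: `build_station_url` is a hypothetical helper, and apart from `station` and `year` every query parameter is copied from the URLs in the question, including `scenario_year=2019` (urlFragment2 above ends at `&scenario_year` with no value).

```r
# Hypothetical helper: build the IEM request URL for one station and year.
# All other query parameters are copied from the question's URLs.
build_station_url <- function(station, year = 2020) {
  paste0(
    "https://mesonet.agron.iastate.edu/cgi-bin/request/coop.py?",
    "network=NECLIMATE&stations=", station,
    "&year1=", year, "&month1=1&day1=1",
    "&year2=", year, "&month2=12&day2=31",
    "&vars%5B%5D=gdd_50_86&model=apsim&what=view&delim=comma",
    "&gis=no&scenario_year=2019"
  )
}

# One URL per station; vapply() names the result after the station IDs.
urls <- vapply(c("NE3065", "NE8745"), build_station_url, character(1))
```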

Next, we create two directories: one to store the downloaded climate input files and one for the output files.

# create input and output file directories if they do not already exist 
if(!dir.exists("./data")) dir.create("./data")
if(!dir.exists("./data/output")) dir.create("./data/output")
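A slightly shorter alternative, offered as a sketch, is a single dir.create() call: with recursive = TRUE it also creates the missing parent ./data, and showWarnings = FALSE keeps it quiet when the directories already exist.

```r
# Create ./data/output (and ./data if needed); harmless if both already exist.
out_dir <- file.path("data", "output")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
```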

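Inside lapply(), paste0() inserts each station name between the URL fragments created above, which lets us automate the download and all subsequent processing for every input file.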

stationData <- lapply(stationList,function(x){
     theURL <-paste0(urlFragment1,x,urlFragment2)
     download.file(theURL,
                   paste0("./data/",x,".txt"),method="libcurl")
     df <- read.table(paste0("./data/",x,".txt"), skip=11, stringsAsFactors = FALSE)
     colnames(df) <- c("year", "day", "solrad", "maxC", 
                       "minC", "precipmm")
     df$year <- as.factor(df$year)
     df$day <- as.factor(df$day)
     df$maxF <- (df$maxC * (9/5) + 32)
     df$minF <- (df$minC * (9/5) + 32)
     df$GDD <- (((df$maxF + df$minF)/2)-50)
     df$GDD[df$GDD <= 0] <- 0
     df$GDD.cumulative <- cumsum(df$GDD)
     df$precipmm.cumulative <- cumsum(df$precipmm)
     df$station <- x
     write.table(df,file=paste0("./data/output/",x,".txt"), quote=FALSE, 
                 row.names=FALSE, col.names=TRUE)
     df
})
# add names to the data frames returned by lapply()
names(stationData) <- stationList
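One way to make the calculations easier to check is to move them out of the anonymous function into a function that only transforms a data frame, with no downloading or file I/O. This is a sketch: `add_derived` and its `base_f` argument are hypothetical, the column names follow the code above, and pmax() gives the same result as the two-step clamp of negative GDD to 0. The toy data frame is made up, except that its first row matches day 1 of the NE3065 output shown below.

```r
# Sketch: the per-station calculations as a pure function of a data frame.
add_derived <- function(df, base_f = 50) {
  df$maxF <- df$maxC * 9 / 5 + 32                        # Celsius -> Fahrenheit
  df$minF <- df$minC * 9 / 5 + 32
  df$GDD <- pmax((df$maxF + df$minF) / 2 - base_f, 0)   # negative GDD floored at 0
  df$GDD.cumulative <- cumsum(df$GDD)
  df$precipmm.cumulative <- cumsum(df$precipmm)
  df
}

toy <- data.frame(maxC = c(2.2, 30), minC = c(-5, 15), precipmm = c(0, 2.29))
add_derived(toy)
```

Inside the lapply() call, the Fahrenheit, GDD and cumulative-sum assignments could then be replaced with `df <- add_derived(df)`.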

The result is an output directory containing one file for each station listed in the stationList object.

Finally, here is the data that was written to the ./data/output/NE3065.txt file.

year day solrad maxC minC precipmm maxF minF GDD GDD.cumulative precipmm.cumulative station
2020 1 8.992 2.2 -5 0 35.96 23 0 0 0 NE3065
2020 2 9.604 5.6 -3.9 0 42.08 24.98 0 0 0 NE3065
2020 3 4.933 5.6 -3.9 0 42.08 24.98 0 0 0 NE3065
2020 4 8.699 3.9 -7.2 0 39.02 19.04 0 0 0 NE3065
2020 5 9.859 6.1 -7.8 0 42.98 17.96 0 0 0 NE3065
2020 6 10.137 7.2 -5 0 44.96 23 0 0 0 NE3065
2020 7 8.754 6.1 -4.4 0 42.98 24.08 0 0 0 NE3065
2020 8 10.121 7.8 -5 0 46.04 23 0 0 0 NE3065
2020 9 9.953 7.2 -5 0 44.96 23 0 0 0 NE3065
2020 10 8.905 7.2 -5 0 44.96 23 0 0 0 NE3065
2020 11 0.416 -3.9 -15.6 2.29 24.98 3.92 0 0 2.29 NE3065
2020 12 10.694 -4.4 -16.1 0 24.08 3.02 0 0 2.29 NE3065
2020 13 1.896 -4.4 -11.1 0.51 24.08 12.02 0 0 2.8 NE3065
2020 14 0.851 0 -7.8 0 32 17.96 0 0 2.8 NE3065
2020 15 11.043 -1.1 -8.9 0 30.02 15.98 0 0 2.8 NE3065
2020 16 10.144 -2.8 -17.2 0 26.96 1.04 0 0 2.8 NE3065
2020 17 10.75 -5.6 -17.2 3.05 21.92 1.04 0 0 5.85 NE3065

Note that the input files have 11 header lines, so the skip= argument of read.table() must be set to 11 rather than the 10 used in the OP.
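Rather than hard-coding skip=, the header length can be detected from the file itself. A minimal sketch, assuming (as in the output shown above) that every data row begins with a four-digit year followed by whitespace and that no header line does; `count_header_lines` is a hypothetical name.

```r
# Hypothetical helper: number of lines before the first data row,
# where a data row is assumed to start with a 4-digit year, e.g. "2020 1 ...".
count_header_lines <- function(path) {
  lines <- readLines(path)
  first_data <- which(grepl("^[[:space:]]*[0-9]{4}[[:space:]]", lines))[1]
  if (is.na(first_data)) stop("no data rows found in ", path)
  first_data - 1L
}
```

It would be used as `read.table(path, skip = count_header_lines(path), stringsAsFactors = FALSE)`.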

Enhancing the code

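The last line of the anonymous function returns the data frame, so lapply() produces a list of 10 data frames, which is stored in the stationData object. Because each data frame has a column holding its station name, we can combine them into a single data frame for further analysis with do.call() and rbind(), as follows.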

combinedData <- do.call(rbind,stationData)

Because this code was run on January 17, the resulting data frame contains 170 observations: 17 for each of the 10 stations we downloaded data for.
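The row count follows from how do.call(rbind, ...) stacks the list: the combined data frame has as many rows as all the parts together (10 × 17 = 170 here). A toy illustration with two made-up stations, A and B, contributing 2 and 3 rows:

```r
# Toy illustration with made-up data: stations "A" (2 days) and "B" (3 days).
lst <- list(
  A = data.frame(day = 1:2, station = "A"),
  B = data.frame(day = 1:3, station = "B")
)
combined <- do.call(rbind, lst)   # equivalent to rbind(A = lst$A, B = lst$B)
nrow(combined)                    # 5, i.e. 2 + 3
table(combined$station)           # rows contributed by each station
rownames(combined) <- NULL        # rbind() derives row names from the list names; reset them
```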

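At this point the data can be analyzed by station, for example the mean daily precipitation (mm) for the year to date at each station: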

> aggregate(precipmm ~ station,combinedData,mean)
   station   precipmm
1   NE0030 0.01470588
2   NE0050 0.56764706
3   NE0130 0.32882353
4   NE0245 0.25411765
5   NE0320 0.28411765
6   NE0355 1.49411765
7   NE0375 0.55235294
8   NE0420 0.13411765
9   NE3065 0.34411765
10  NE8745 0.47823529
>
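The values above are mean daily precipitation over the days downloaded so far. If the year-to-date total is wanted instead, the same aggregate() call works with sum; a sketch on a made-up two-station data frame (`toyData`, stations A and B):

```r
# Made-up stand-in with the same station and precipmm columns as combinedData.
toyData <- data.frame(
  station  = c("A", "A", "B", "B"),
  precipmm = c(0, 2.29, 0.51, 0)
)
# Year-to-date total precipitation per station (sum of the daily values).
# For each station this equals the last value of its precipmm.cumulative column.
totals <- aggregate(precipmm ~ station, data = toyData, FUN = sum)
totals
```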

Answer 1: score 0, posted 2020-01-17 21:08:36
Answer 2: score 0, accepted, posted 2020-01-18 14:25:33