
R function to generate and save RMarkdown pdf and iterate over multiple CSVs in folder

I have many CSV files with time series data from environmental sensors. All of them have columns with the same names and order, and they look like this:

# create time series columns
datetime <- seq(as.POSIXct("2022-01-14 17:00:00", tz = "UTC"), by = "15 min", length.out = 10)
siteID <- rep("04M09_2", 10)
tempC <- c(6.9783360, 6.5733036, 5.3476500, 4.1025504, 3.2613720, 
           2.4101928, 1.6562436, 1.2212088, 1.0028580, 0.8928492)
SpC <- rep(0, 10)
wetdry <- rep("dry", 10)
lat <- rep(39.07982, 10)
long <- rep(-96.5816, 10)
field_SpC <- c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 20)

# make data frame
sensor_04M09 <- data.frame(datetime, siteID, tempC, SpC, wetdry, lat, long, field_SpC)

I would like to write an R function that I can iterate over an entire folder of CSV data from these sensors (one CSV file per sensor) to produce and save a pdf of this document for each sensor. Here is what I want the Markdown to look like. (Note: the csv read in at the start is like the one I created in the example above.)

---
title: "04M09_2 STIC Summary"
author: "Me"
date: '2022-09-08'
output: pdf_document
---
knitr::opts_chunk$set(echo = TRUE)

Bring in processed STIC data frame

library(tidyverse)
sensor_04M09 <- read_csv("sensor_04M09.csv")

head(sensor_04M09)

Time series of SpC colored by wet/dry designation (red dot represents field SpC measurement)

ggplot(sensor_04M09, aes(x = datetime, y = SpC, color = wetdry, group = 1)) + 
  geom_path(size = 0.7) + 
  geom_point(aes(x = datetime, y = field_SpC), size = 3, color = "red") +
  theme_bw() + 
  theme(panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        panel.background = element_rect(colour = "black", size = 1)) + 
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14))

Time series of temperature (C) recorded by sensor

ggplot(sensor_04M09, aes(x = datetime, y = tempC)) + 
  geom_path() + 
  geom_smooth(color = "steelblue", se = FALSE) +
  theme_bw() + 
  theme(panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        panel.background = element_rect(colour = "black", size = 1)) + 
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14))

Map of sensor location

library(Rcpp)
library(sp)
library(raster)
library(rgdal)
library(rasterVis)
library(sf)

# Bring in stream line shape files
konza_streams <- st_read("GIS210/GIS210.shp")

# Use the same data frame that was read in above
sensor_location <- st_as_sf(sensor_04M09,
                            coords = c("long", "lat"), 
                            crs = 4326)

ggplot() + 
  geom_sf(data = konza_streams) + 
  geom_sf(data = sensor_location, size = 3, color = "red") +
  theme_bw() + 
  theme(panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        panel.background = element_rect(colour = "black", size = 1)) + 
  theme(axis.text = element_text(size = 9),
        axis.title = element_text(size = 12)) +
  xlab("Longitude") + 
  ylab("Latitude") + 
  coord_sf(xlim = c(708000.9  , 710500.3 ), ylim = c(4327200.8  , 4330000.0 ), expand = FALSE)

The purpose of creating and saving these pdf markdown docs for each CSV file in the folder is a visual QAQC check of the data from each sensor.

RMarkdown documents accept parameters, which you can set when rendering the document to change some aspects of how it runs. See the R Markdown documentation on parameterized reports for additional detail on using parameters.

In your RMarkdown document, set up your YAML to accept the parameter:

---
title: "04M09_2 STIC Summary"
author: "Me"
date: '2022-09-08'
output: pdf_document
params:
    datafile: "sensor.csv"
---

Then, in the code, use that parameter to select the data file of interest, and run all your calculations/graphs:

library(tidyverse)
sensor_04M09 <- read_csv(params$datafile)

# And then put all your graphing code, etc., using the sensor_04M09 dataset
# (or name it something more generic)

Now you have an RMarkdown doc that can be given a filename and will produce your set of graphs using the data from that file. Save that as analysis_file.rmd or something similar.
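Because the YAML title is hard-coded, every rendered pdf would carry the same "04M09_2" title. One way to label each report is to derive a name from the file path inside the document. This is only a minimal sketch: `sensor_label` and `plot_title` are hypothetical names, and `params` is simulated with a plain list so the snippet runs on its own (inside the document, rmarkdown supplies it).

```r
# Inside the document, `params` is supplied by rmarkdown at render time.
# It is simulated here with a plain list so the snippet is self-contained.
params <- list(datafile = "directory/sensor_04M09.csv")

# Hypothetical label: the file name without its folder or ".csv" extension.
sensor_label <- tools::file_path_sans_ext(basename(params$datafile))

# Hypothetical title string that could be passed to ggtitle() in each plot.
plot_title <- paste(sensor_label, "STIC Summary")
print(plot_title)
```

The label could also come from the `siteID` column of the data, if that is the name you want on the report.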

Finally, write a script to loop over all the files you want it to run through. In a separate .R script:

library(rmarkdown)
library(stringr)

# 'directory' is the folder with all your data files in it
# (the pattern is a regular expression, so escape the dot and anchor the end)
list_of_files <- list.files('directory', pattern = '\\.csv$')
for (f in list_of_files) {
    # Get the name of the sensor alone (file name without ".csv"), for making filenames
    outputname <- paste0('analysis_of_', str_sub(f, 1, nchar(f) - 4), '.pdf')
    # Get the full path of the data file
    full_file <- file.path('directory', f)
    # Render the document once for this file, passing its path as the parameter
    render("analysis_file.rmd", output_file = outputname, params = list(datafile = full_file))
}

This will loop through all the files and render the document once for each.
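Since the question asks for a function, the loop can also be wrapped in one. The following is a sketch under assumptions, not a definitive implementation: `report_name()` and `render_sensor_reports()` are hypothetical helpers, the `reports` output folder is an assumed default, and the `tryCatch()` is an addition so that one malformed CSV does not stop the whole batch.

```r
# Hypothetical helper: build the pdf file name for one CSV file.
report_name <- function(csv_path) {
  sensor <- tools::file_path_sans_ext(basename(csv_path))
  paste0("analysis_of_", sensor, ".pdf")
}

# Hypothetical wrapper: render one report per CSV found in `data_dir`.
# Returns a logical vector (one entry per file) marking which renders succeeded.
render_sensor_reports <- function(data_dir,
                                  rmd = "analysis_file.rmd",
                                  out_dir = "reports") {
  csv_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

  vapply(csv_files, function(f) {
    tryCatch({
      rmarkdown::render(
        rmd,
        output_file = report_name(f),
        output_dir  = out_dir,
        # Absolute path, so the document finds the file wherever it is knit.
        params      = list(datafile = normalizePath(f)),
        # Fresh environment, so objects from one report do not leak into the next.
        envir       = new.env(),
        quiet       = TRUE
      )
      TRUE
    }, error = function(e) {
      message("Failed on ", f, ": ", conditionMessage(e))
      FALSE
    })
  }, logical(1))
}
```

A call such as `render_sensor_reports("directory")` would then write one pdf per sensor into `reports/`, and any `FALSE` entries in the result point to the files that need a closer look.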
