R purrr::map() & mutate(): Add many new columns based on variables in list

Question

I need to create a dataframe summarising information relating to file checking.

I have a list of 126 unique combinations of climate scenarios and years (e.g. 'ssp126_2030', 'ssp126_2050', 'ssp145_2030', 'ssp245_2050'). Each element is one section of a longer file path that points to a specific file (scenario_list, below). For each element, I need to create several new columns recording whether the file exists, its size, and the date it was created.

I would like to loop through the list of 126 elements and stitch together a table of file checks (file_check_table, below). I start with a table of sub-directories and split those strings into sections, so that I can paste0() together the path to the file I want to check within each sub-directory. I am aiming to use mutate()/transmute() and purrr::map() to loop through each element of the climate scenario list and add the file-checking columns (see the image of the table below).

I am new to functional programming. What I have tried so far is to write a function that adds the new columns and then apply that function to the list of climate scenarios. My end goal is to have one new column for each combination of climate scenario and type of file check:

file_checks <- function(x) {
                       dir_list %>%
                       mutate(file_check_table,!!paste0(new_col_name) := ifelse(file.exists(paste0(file))==TRUE,1,0))}

file_check_table <- map(scenario_list, file_checks(x))

However, this does not work; I don't think I have written the function correctly, or perhaps I have used purrr incorrectly. Any thoughts on how to fix this would be much appreciated, thank you. This is what I would like file_check_table to look like: a single row with one column per scenario and file check.

Answer 1

If I understand your question correctly, you have a scenario_list that describes the paths to the files, and you would like the characteristics of each file. The natural way to do that is a single pipe on a table with one row per scenario; there is no reason to put it in a function.

For example:

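library(tidyverse)

# One scenario name per line, e.g. "ssp126_2030"
scenario_list <- read_lines("scenario_list.txt")
root_dir <- "C:/Users/Documents/my_project/data_subdir"

# file.exists() and file.info() are vectorised, so mutate() handles every scenario at once
file_table <- tibble(scenario = scenario_list) %>%
  mutate(path = file.path(root_dir, paste0(scenario, ".csv")),
         exists = file.exists(path),
         full_info = file.info(path),   # data-frame column, one row per path
         file_size = full_info$size,    # size in bytes, NA if the file is missing
         file_date = full_info$mtime)   # last modified; use $ctime for creation time on Windows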
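
The reason no loop is needed is that file.exists() and file.info() both accept a whole vector of paths and return one result per path. Here is a minimal base R sketch of that behaviour; the demo directory, the file names and the checks data frame are made up for illustration and are not part of your project:

```r
# Hypothetical demo directory, created only for this illustration
demo_dir <- file.path(tempdir(), "file_check_demo")
dir.create(demo_dir, showWarnings = FALSE)

# Create one of the two files; the second is deliberately left missing
writeLines("a,b", file.path(demo_dir, "ssp126_2030.csv"))

scenarios <- c("ssp126_2030", "ssp126_2050")
paths <- file.path(demo_dir, paste0(scenarios, ".csv"))

exists_vec <- file.exists(paths)  # one logical value per path
info <- file.info(paths)          # data frame with one row per path

checks <- data.frame(
  scenario  = scenarios,
  exists    = as.integer(exists_vec),  # 1/0 coding, as in the question
  file_size = info$size,               # NA for the missing file
  file_date = info$mtime               # last modification time, NA if missing
)
print(checks)
```

Missing files do not raise an error: they simply produce FALSE from file.exists() and NA values from file.info(), so the table stays complete for all scenarios.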

And then if you want the output on a single row as in your screenshot:

file_table %>%
  select(-path, -full_info) %>%
  pivot_wider(names_from = scenario,
              names_glue = "{scenario}_{.value}",
              values_from = !scenario) %>%
  write_csv("output.csv")
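
If you would still like the function-plus-map() pattern from your question, here is one possible sketch. Two things went wrong in your attempt: map(scenario_list, file_checks(x)) calls file_checks() immediately with an undefined x instead of passing the function itself, and new_col_name and file are never defined inside the function. The version below assumes each file is named <scenario>.csv directly under root_dir, which is my guess at your layout; root_dir and the two scenario names are placeholders:

```r
library(tidyverse)

# Placeholder inputs for illustration; swap in your own directory and list
root_dir <- tempdir()
scenario_list <- c("ssp126_2030", "ssp126_2050")

# Takes ONE scenario name and returns a one-row tibble of file checks
file_checks <- function(scenario, root_dir) {
  path <- file.path(root_dir, paste0(scenario, ".csv"))
  info <- file.info(path)
  tibble(
    scenario  = scenario,
    exists    = as.integer(file.exists(path)),  # 1 if present, 0 if not
    file_size = info$size,
    file_date = info$mtime
  )
}

# Pass the function itself to map(); extra arguments go after it
file_check_table <- scenario_list %>%
  map(file_checks, root_dir = root_dir) %>%
  bind_rows()

# Reshape to one row with one column per scenario and check
file_check_wide <- file_check_table %>%
  pivot_wider(names_from = scenario,
              names_glue = "{scenario}_{.value}",
              values_from = c(exists, file_size, file_date))
```

This gives the same result as the vectorised pipe above, just with more moving parts, so I would only use it if the per-file check grows into something that is not vectorised.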


solution1
1 2020-11-20 03:50:05
