
Iterate over reading/mutating CSV files in R with purrr

I have a folder of CSV files in R that I need to loop through, clean, and add columns to based on information in each file name. I am trying to use purrr, and this is what I have done so far.

# load packages
library(tidyverse)  # purrr, readr, stringr, dplyr
library(lubridate)  # dmy()

# get file names
files_names <- list.files("data/", recursive = TRUE, full.names = TRUE) 

# inspect
files_names 

[1] "data/BOC_All_ATMImage_(Aug 2020).txt" "data/BOC_All_ATMImage_(Aug 2021).txt" "data/BOC_All_ATMImage_(Feb 2021).txt"
[4] "data/BOC_All_ATMImage_(May 2021).txt" "data/BOC_All_ATMImage_(Nov 2020).txt" "data/BOC_All_ATMImage_(Nov 2021).txt"

# extract the month/year inside the parentheses and convert to snake_case
# these will be used later as column names

names_data <- files_names %>% 
  str_extract(., "(?<=\\().*?(?=\\))") %>% 
  str_to_lower() %>%
  str_replace(., " ", "_")

names_data

[1] "aug_2020" "aug_2021" "feb_2021" "may_2021" "nov_2020" "nov_2021"
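A quick way to check the extraction without touching the disk is to run it on a few of the literal paths listed above. This is a minimal sketch, assuming stringr and purrr are installed. str_replace_all() is swapped in for str_replace() so that every space becomes an underscore, and the set_names() pairing (files_by_name is a hypothetical name) is an illustration, not part of the original code.

```r
library(purrr)    # set_names(), %>%
library(stringr)  # str_extract(), str_to_lower(), str_replace_all()

# A subset of the paths from the listing above (no files are read)
files_names <- c(
  "data/BOC_All_ATMImage_(Aug 2020).txt",
  "data/BOC_All_ATMImage_(Feb 2021).txt",
  "data/BOC_All_ATMImage_(Nov 2021).txt"
)

names_data <- files_names %>%
  str_extract("(?<=\\().*?(?=\\))") %>%  # text inside the parentheses
  str_to_lower() %>%
  str_replace_all(" ", "_")               # replace every space, not just the first

# Keying each path by its label keeps the two vectors paired
files_by_name <- set_names(files_names, names_data)
files_by_name
```

Because names_data is derived element by element from files_names, the two vectors always have the same order, which is what the one-loop approach below relies on.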

Now loop through the CSVs: read each one, do some data cleaning, and create the new columns.


mc_data <-
  map(files_names,
      ~ read_csv(.x, guess_max = 50000) %>%
        janitor::clean_names() %>%
        mutate(month_year = str_extract(.x, "(?<=\\().*?(?=\\))"),
               date_dmy = paste0(day, "-", month_year),
               date = dmy(date_dmy),
               fsa = str_sub(postal_code, start = 1, end = 3),
               ?? = 1) %>%
        select(-date_dmy),
      .id = "group"
  )

I need to add one more column with mutate(), and that column has to be named after the corresponding value in names_data. I currently have this as ?? in the pseudocode above. names_data is in the same order as the file paths, so the idea is to do everything in one loop and keep each data frame after it has been cleaned.
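A side note on the .id = "group" argument in the question's code: .id is an argument of map_dfr() and dplyr::bind_rows(), not of map(). Passed to map(), it is forwarded through ... to the ~ lambda, which ignores it, so the result is still a plain list with no group column. If a single combined data frame is wanted, one option is to name the list and row-bind it. The sketch below uses small hypothetical tibbles (cleaned, combined) in place of the real cleaned files.

```r
library(dplyr)  # tibble(), bind_rows(), %>%
library(purrr)  # set_names()

# Hypothetical per-file results, one tibble per file
cleaned <- list(
  tibble(day = c(1, 15), aug_2020 = 1),
  tibble(day = 3, feb_2021 = 1)
)
names_data <- c("aug_2020", "feb_2021")

# Name the list, then row-bind it: .id turns the list names into a column
combined <- cleaned %>%
  set_names(names_data) %>%
  bind_rows(.id = "group")

combined
```

Since each file gets a differently named indicator column, the combined table is NA wherever a column does not belong to that row's file.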

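We can use map2() to loop over the file paths and names_data in parallel, and glue syntax on the left-hand side of := to name the new column. Perhaps: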

# returns a list with one cleaned tibble per file
mc_data <-
  map2(files_names, names_data,
       ~ read_csv(.x, guess_max = 50000) %>%
         janitor::clean_names() %>%
         mutate(month_year = str_extract(.x, "(?<=\\().*?(?=\\))"),
                date_dmy = paste0(day, "-", month_year),
                date = dmy(date_dmy),
                fsa = str_sub(postal_code, start = 1, end = 3),
                "{.y}" := 1) %>%   # new column named after names_data, e.g. aug_2020
         select(-date_dmy)
  )
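To see the glue-style naming in isolation, here is a minimal sketch on two hypothetical in-memory tibbles standing in for the files (raw_list and its columns are made up, and no CSV is read). It assumes a recent dplyr/rlang, which accept a glue string such as "{.y}" on the left-hand side of :=.

```r
library(dplyr)  # tibble(), mutate(), %>%
library(purrr)  # map2()

# Hypothetical stand-ins for two files after read_csv() + clean_names()
raw_list <- list(
  tibble(day = c(1, 15), postal_code = c("M5V 2T6", "K1A 0B1")),
  tibble(day = 3, postal_code = "H2X 1Y4")
)
names_data <- c("aug_2020", "feb_2021")

# .x is the current tibble, .y the matching element of names_data
mc_data <- map2(raw_list, names_data, ~ .x %>%
  mutate(
    fsa = substr(postal_code, 1, 3),  # first three characters of the postal code
    "{.y}" := 1                       # column is literally named aug_2020, feb_2021, ...
  ))

mc_data
```

Writing !!.y := 1 instead of "{.y}" := 1 is an equivalent spelling that injects the string directly as the column name.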

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM