How to read a CSV file into R using partial file name?

Question

I have a program that outputs data into CSV files stamped with the date and time in the name (e.g. CSVFileName_2021-01-30 12:00:00.csv), and I need to read these CSV files into my next program. How can I read them in using only the fixed part of the file name while ignoring the date/time? Ideally it would always pick the file with the most recent timestamp, since a user could have run the program multiple times in a day and so have multiple files with the same partial name but different times or dates.

Any advice would be very appreciated!

Answer 1

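You can use list.files to get all the files matching that name pattern. It returns the names sorted alphabetically, and because the timestamp runs from year down to second, the alphabetically last name is also the most recent one, so you only need to take the last element. Something like this: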

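# Return the last (alphabetically greatest) file name that list.files() finds
most_recent <- function(...) {
  tail(list.files(...), 1)
}

# Anchor the pattern so only names starting with "CSVFileName_" match
read.csv(most_recent(pattern = "^CSVFileName_"))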
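
Relying on alphabetical order works as long as every matching name carries a timestamp in the same format. Here is a minimal sketch that parses the timestamp out of each name instead, assuming the naming scheme from the question. latest_by_name and its prefix argument are hypothetical names made up for this illustration, not part of base R.

```r
# Hypothetical helper: pick the path whose name carries the latest timestamp.
# Assumes names look like "<prefix>_YYYY-MM-DD HH:MM:SS.csv", as in the question,
# and that the prefix contains no regular-expression metacharacters.
latest_by_name <- function(paths, prefix = "CSVFileName") {
  rx <- paste0("^", prefix,
               "_([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2})\\.csv$")
  # Keep only names that follow the expected scheme
  paths <- paths[grepl(rx, basename(paths))]
  if (length(paths) == 0) stop("no file matches prefix '", prefix, "'")
  # Extract the timestamp part and convert it to a date-time
  stamps <- as.POSIXct(sub(rx, "\\1", basename(paths)),
                       format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  paths[which.max(as.numeric(stamps))]
}

# Made-up file names for illustration; no files are read here
paths <- c("CSVFileName_2021-01-30 09:00:00.csv",
           "CSVFileName_2021-01-30 12:00:00.csv",
           "CSVFileName_2021-01-29 23:59:59.csv",
           "OtherFile_2021-02-01 00:00:00.csv")
latest <- latest_by_name(paths)
print(latest)
```

With real files, the input would come from list.files(pattern = "^CSVFileName_", full.names = TRUE), and the result can be passed straight to read.csv.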

Answer 2

If different users produce files with different name prefixes, you can do something like this to get the most recent file for each prefix:

library(tidyverse)

files <- c("CSVFileName1_2021-01-30 12:00:00.csv", "CSVFileName1_2021-01-30 11:00:00.csv",
           "CSVFileName1_2021-01-30 10:00:00.csv", "CSVFileName2_2021-01-30 12:00:00.csv",
           "CSVFileName2_2021-01-30 10:00:00.csv", "CSVFileName2_2021-01-30 09:00:00.csv",
           "CSVFileName2_2021-01-30 12:00:00.csv", "CSVFileName3_2021-01-30 11:00:00.csv", 
           "CSVFileName3_2021-01-30 12:00:00.csv", "CSVFileName4_2021-01-30 12:00:00.csv")

files %>% 
  enframe(name = NULL) %>% 
  mutate(file = value) %>% 
  separate(value, into = c("name", "time"), sep = c("_")) %>% 
  mutate(time = time %>% str_remove("\\.csv$") %>% lubridate::as_datetime()) %>% 
  group_by(name) %>% 
  arrange(desc(time)) %>% 
  slice(1) %>% 
  pull(file)
#> [1] "CSVFileName1_2021-01-30 12:00:00.csv"
#> [2] "CSVFileName2_2021-01-30 12:00:00.csv"
#> [3] "CSVFileName3_2021-01-30 12:00:00.csv"
#> [4] "CSVFileName4_2021-01-30 12:00:00.csv"

Created on 2021-02-02 by the reprex package (v0.3.0)
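
If you would rather avoid the tidyverse dependency, the same per-prefix selection can be sketched in base R. This is only an illustration under the same naming assumption (prefix, underscore, timestamp, ".csv"); pick_latest is a hypothetical helper name.

```r
# Same example file names as above
files <- c("CSVFileName1_2021-01-30 12:00:00.csv", "CSVFileName1_2021-01-30 11:00:00.csv",
           "CSVFileName1_2021-01-30 10:00:00.csv", "CSVFileName2_2021-01-30 12:00:00.csv",
           "CSVFileName2_2021-01-30 10:00:00.csv", "CSVFileName2_2021-01-30 09:00:00.csv",
           "CSVFileName2_2021-01-30 12:00:00.csv", "CSVFileName3_2021-01-30 11:00:00.csv",
           "CSVFileName3_2021-01-30 12:00:00.csv", "CSVFileName4_2021-01-30 12:00:00.csv")

# Everything before the first underscore is the prefix
prefix <- sub("_.*$", "", files)

# Everything between the first underscore and ".csv" is the timestamp
stamp <- as.POSIXct(sub("^[^_]*_(.*)\\.csv$", "\\1", files),
                    format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Hypothetical helper: among the given positions, return the newest file name
pick_latest <- function(idx) files[idx[which.max(as.numeric(stamp[idx]))]]

# Group the positions by prefix and pick one file per group
latest <- unname(vapply(split(seq_along(files), prefix), pick_latest, character(1)))
print(latest)
```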

Answer 3

M. Yates,

Here is some code that finds all the CSV files in the working directory that match a defined pattern (e.g. "my_file*.csv", converted to a regular expression with the glob2rx function) and then reads in the latest one based on each file's 'last modified' time.

# Load library
library('tidyverse')

# Locate files in working directory
files <- data.frame('files'=dir(pattern = glob2rx("my_file*.csv")), 
                   stringsAsFactors = FALSE)
files$modified_time <- file.mtime(files$files)

# Arrange the 'files' dataframe by the 'modified_time' column in descending order
files <- files %>% arrange(desc(modified_time))

# Read in the latest file which will be in the first row of the 'files' dataframe
my_df <- read_csv(file=files$files[1])
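
For a quick check of the modified-time approach without touching your real data, here is a small base R sketch. It writes two throwaway files to a temporary directory and sets their modification times by hand with Sys.setFileTime so the result is predictable; the file names and contents are made up for the demonstration.

```r
# Create a scratch directory with two small CSV files
demo_dir <- tempfile("csv_demo_")
dir.create(demo_dir)
old_file <- file.path(demo_dir, "my_file_a.csv")
new_file <- file.path(demo_dir, "my_file_b.csv")
write.csv(data.frame(x = 1), old_file, row.names = FALSE)
write.csv(data.frame(x = 2), new_file, row.names = FALSE)

# Force known 'last modified' times so the outcome does not depend on timing
Sys.setFileTime(old_file, as.POSIXct("2021-01-30 10:00:00", tz = "UTC"))
Sys.setFileTime(new_file, as.POSIXct("2021-01-30 12:00:00", tz = "UTC"))

# List matching files with full paths and keep the most recently modified one
paths <- list.files(demo_dir, pattern = glob2rx("my_file*.csv"), full.names = TRUE)
newest <- paths[which.max(as.numeric(file.mtime(paths)))]

my_df <- read.csv(newest)
print(basename(newest))
print(my_df)

# Remove the scratch directory
unlink(demo_dir, recursive = TRUE)
```

Note that the modified time changes whenever a file is rewritten or copied, so this picks the file that was touched last, which is not necessarily the one with the latest timestamp in its name.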

3 answers

solution1
1 2021-02-02 14:42:42

solution2
0 2021-02-02 15:37:22

solution3
0 2021-02-02 17:44:53
