
How to read a CSV file into R using partial file name?

I have a program that outputs data into CSV files stamped with the date and time in the name (e.g. CSVFileName_2021-01-30 12:00:00.csv), and I need to read these CSV files into my next program. How can I read them in using only the fixed part of the file name while ignoring the date/time? Ideally it would always pick the file with the most recent timestamp, since a user could have run the program multiple times in a day and thus have multiple files with the same partial name but different times or dates.

Any advice would be very appreciated!

You can use list.files to get all the files matching that name pattern. They come back already sorted alphabetically, and because the timestamp is written year-first with zero padding, the last one is also the most recent. Something like this:

most_recent = function(...) {
 tail(list.files(...), 1)
}

read.csv(most_recent(pattern = "CSVFileName"))
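
A slightly more defensive variant of the same idea may be useful when the directory is not the working directory or when other files share part of the name. This is only a sketch under the naming scheme from the question: `latest_by_name` is a hypothetical helper, it assumes the prefix contains no regex metacharacters, and the temporary directory and small files exist only for the demonstration.

```r
# Hypothetical helper: newest file for a fixed prefix, judged by the name alone.
latest_by_name <- function(prefix, dir = ".") {
  # Anchor the pattern so "CSVFileName" does not also match "CSVFileName2_..."
  pattern <- paste0("^", prefix, "_.*\\.csv$")
  # full.names = TRUE returns paths that can be read from any working directory
  candidates <- list.files(dir, pattern = pattern, full.names = TRUE)
  if (length(candidates) == 0) stop("no file matches prefix: ", prefix)
  # Year-first, zero-padded timestamps sort chronologically as plain text
  tail(sort(candidates), 1)
}

# Demonstration in a throwaway directory
demo_dir <- file.path(tempdir(), "latest_by_name_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_names <- c("CSVFileName_2021-01-29 23:59:59.csv",
                "CSVFileName_2021-01-30 09:00:00.csv",
                "CSVFileName_2021-01-30 12:00:00.csv",
                "CSVFileName2_2021-02-01 08:00:00.csv")
for (i in seq_along(demo_names)) {
  write.csv(data.frame(run = i), file.path(demo_dir, demo_names[i]),
            row.names = FALSE)
}

newest <- latest_by_name("CSVFileName", demo_dir)
latest_data <- read.csv(newest)
```

Here `newest` points at the 12:00:00 file, and the later-dated `CSVFileName2_...` file is ignored because the pattern requires an underscore directly after the prefix.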

If different users produce files with different base names, you can do something like this to get the most recent file for each name:

library(tidyverse)

files <- c("CSVFileName1_2021-01-30 12:00:00.csv", "CSVFileName1_2021-01-30 11:00:00.csv",
           "CSVFileName1_2021-01-30 10:00:00.csv", "CSVFileName2_2021-01-30 12:00:00.csv",
           "CSVFileName2_2021-01-30 10:00:00.csv", "CSVFileName2_2021-01-30 09:00:00.csv",
           "CSVFileName2_2021-01-30 12:00:00.csv", "CSVFileName3_2021-01-30 11:00:00.csv", 
           "CSVFileName3_2021-01-30 12:00:00.csv", "CSVFileName4_2021-01-30 12:00:00.csv")

files %>% 
  enframe(name = NULL) %>% 
  mutate(file = value) %>% 
  separate(value, into = c("name", "time"), sep = c("_")) %>% 
  mutate(time = time %>% str_remove("\\.csv$") %>% lubridate::as_datetime()) %>% 
  group_by(name) %>% 
  arrange(desc(time)) %>% 
  slice(1) %>% 
  pull(file)
#> [1] "CSVFileName1_2021-01-30 12:00:00.csv"
#> [2] "CSVFileName2_2021-01-30 12:00:00.csv"
#> [3] "CSVFileName3_2021-01-30 12:00:00.csv"
#> [4] "CSVFileName4_2021-01-30 12:00:00.csv"

Created on 2021-02-02 by the reprex package (v0.3.0)
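
For readers who would rather not load the tidyverse, the same grouping can be sketched in base R. This is an illustration rather than a drop-in replacement: it assumes every name has the form prefix, underscore, timestamp, ".csv", and the shortened `files` vector and the variable names `prefix`, `stamp` and `latest` are made up for the example.

```r
files <- c("CSVFileName1_2021-01-30 10:00:00.csv",
           "CSVFileName1_2021-01-30 12:00:00.csv",
           "CSVFileName2_2021-01-30 12:00:00.csv",
           "CSVFileName2_2021-01-29 23:00:00.csv",
           "CSVFileName3_2021-01-30 11:00:00.csv")

# Everything before the first underscore is the group name
prefix <- sub("_.*$", "", files)

# Everything between the first underscore and ".csv" is the timestamp
stamp <- as.POSIXct(sub("^[^_]*_(.*)\\.csv$", "\\1", files),
                    format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# For each group of indices, keep the file with the largest timestamp
latest <- vapply(split(seq_along(files), prefix),
                 function(idx) files[idx[which.max(as.numeric(stamp[idx]))]],
                 character(1))
latest <- unname(latest)
```

Parsing the timestamp explicitly, instead of sorting the names as text, keeps the result correct even if the groups are interleaved in the input.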

M. Yates,

Here is some code that finds all the CSV files in the working directory that match a defined pattern (e.g. "my_file*.csv", converted to a regular expression by the glob2rx function) and then reads in the latest one based on its 'last modified' time.

# Load library
library('tidyverse')

# Locate files in working directory
files <- data.frame('files'=dir(pattern = glob2rx("my_file*.csv")), 
                   stringsAsFactors = FALSE)
files$modified_time <- file.mtime(files$files)

# Arrange the 'files' dataframe by the 'modified_time' column in descending order
files <- files %>% arrange(desc(modified_time))

# Read in the latest file which will be in the first row of the 'files' dataframe
my_df <- read_csv(file = files$files[1])
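
The same 'last modified' approach can also be sketched in base R alone. `latest_by_mtime` is a hypothetical helper name, and the demo files and their modification times are set artificially (with `Sys.setFileTime`) so that the example does not depend on when it is run.

```r
# Hypothetical helper: newest file matching a glob, judged by modification time.
latest_by_mtime <- function(glob, dir = ".") {
  candidates <- list.files(dir, pattern = glob2rx(glob), full.names = TRUE)
  if (length(candidates) == 0) stop("no file matches: ", glob)
  candidates[which.max(as.numeric(file.mtime(candidates)))]
}

# Demonstration in a throwaway directory
demo_dir <- file.path(tempdir(), "latest_by_mtime_demo")
dir.create(demo_dir, showWarnings = FALSE)
file_a <- file.path(demo_dir, "my_file_a.csv")
file_b <- file.path(demo_dir, "my_file_b.csv")
write.csv(data.frame(x = 1), file_a, row.names = FALSE)
write.csv(data.frame(x = 2), file_b, row.names = FALSE)

# Make the alphabetically first file the most recently modified one
Sys.setFileTime(file_a, as.POSIXct("2021-01-30 12:00:00", tz = "UTC"))
Sys.setFileTime(file_b, as.POSIXct("2021-01-30 10:00:00", tz = "UTC"))

picked <- latest_by_mtime("my_file*.csv", demo_dir)
my_df <- read.csv(picked)
```

Note that the modification time reflects the last write to the file, so it can disagree with the timestamp in the name if a file is later edited or rewritten.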


 