
How to read a CSV file into R using a partial file name?

I have a program that outputs data into CSV files stamped with the date and time in the name (e.g. CSVFileName_2021-01-30 12:00:00.csv), and I then need to read these CSV files into my next program. How can I read them in using only the fixed part of the file name while ignoring the date/time? Ideally it would always pick the file with the most recent timestamp, since a user could have run the program multiple times in a day and thus have multiple files with the same partial name but different times or dates.

Any advice would be much appreciated!

You can use list.files to get all the files matching that name pattern, sort them (they already come back sorted alphabetically), and pick the most recent. Because the timestamp is written as year-month-day hour:minute:second, alphabetical order is also chronological order, so the last match is the newest file. Something like this:

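# Return the last match. list.files() sorts names alphabetically, so with
# this timestamp format the last name is the most recent file.
most_recent <- function(...) {
  tail(list.files(...), 1)
}

# The pattern is a regular expression; anchor it so only the CSV outputs match
read.csv(most_recent(pattern = "^CSVFileName_.*\\.csv$"))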
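
The snippet above depends on the names sorting in chronological order. If that cannot be guaranteed, one alternative is to parse the timestamp out of each name and compare the parsed times. The following is only a sketch in base R: latest_by_timestamp is a hypothetical helper name, and it assumes every file of interest is named "<prefix>_YYYY-MM-DD HH:MM:SS.csv" as in the question.

```r
# Hypothetical helper (not part of the answer above): return the file name
# whose embedded timestamp is the latest.
# Assumes names look like "<prefix>_YYYY-MM-DD HH:MM:SS.csv".
latest_by_timestamp <- function(files) {
  # Keep only the timestamp part; names that do not match are left unchanged
  stamp <- sub(
    "^.*_([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2})\\.csv$",
    "\\1",
    files
  )
  # Strings that are not timestamps parse to NA
  times <- as.POSIXct(stamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  # which.max() skips NA; the result is character(0) if nothing parsed
  files[which.max(as.numeric(times))]
}
```

It could then be combined with list.files, for example read.csv(latest_by_timestamp(list.files(pattern = "^CSVFileName_"))). Note that file names containing ":" are not valid on Windows, so the program writing the files may need a different time separator there.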

If you have different users who produce different file names, you can do something like this to get the most recent file for each of them:

library(tidyverse)

files <- c("CSVFileName1_2021-01-30 12:00:00.csv", "CSVFileName1_2021-01-30 11:00:00.csv",
           "CSVFileName1_2021-01-30 10:00:00.csv", "CSVFileName2_2021-01-30 12:00:00.csv",
           "CSVFileName2_2021-01-30 10:00:00.csv", "CSVFileName2_2021-01-30 09:00:00.csv",
           "CSVFileName2_2021-01-30 12:00:00.csv", "CSVFileName3_2021-01-30 11:00:00.csv", 
           "CSVFileName3_2021-01-30 12:00:00.csv", "CSVFileName4_2021-01-30 12:00:00.csv")

files %>% 
  enframe(name = NULL) %>% 
  mutate(file = value) %>% 
  separate(value, into = c("name", "time"), sep = c("_")) %>% 
  mutate(time = time %>% str_remove("\\.csv$") %>% lubridate::as_datetime()) %>% 
  group_by(name) %>% 
  arrange(desc(time)) %>% 
  slice(1) %>% 
  pull(file)
#> [1] "CSVFileName1_2021-01-30 12:00:00.csv"
#> [2] "CSVFileName2_2021-01-30 12:00:00.csv"
#> [3] "CSVFileName3_2021-01-30 12:00:00.csv"
#> [4] "CSVFileName4_2021-01-30 12:00:00.csv"

Created on 2021-02-02 by the reprex package (v0.3.0)

M. Yates,

Here is some code that finds all the CSV files in the working directory that match a defined pattern (e.g. "my_file*.csv") using the glob2rx function, and then reads in the latest CSV file based on each file's 'last modified' time.

# Load library
library('tidyverse')

# Locate files in working directory
files <- data.frame('files'=dir(pattern = glob2rx("my_file*.csv")), 
                   stringsAsFactors = FALSE)
files$modified_time <- file.mtime(files$files)

# Arrange the 'files' dataframe by the 'modified_time' column in descending order
files <- files %>% arrange(desc(modified_time))

# Read in the latest file, which is in the first row of the 'files' dataframe
my_df <- read_csv(file = files$files[1])
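
The code above looks in the working directory. As a rough sketch of the same idea for an arbitrary folder, using only base R, something like the following might work. latest_modified is a hypothetical helper name and its dir and glob arguments are assumptions made for illustration. Keep in mind that the 'last modified' time can differ from the timestamp in the file name, for instance after a file has been copied or edited.

```r
# Hypothetical helper: return the full path of the most recently modified
# file in 'dir' whose name matches the glob pattern, or NA if none match.
latest_modified <- function(dir = ".", glob = "my_file*.csv") {
  # glob2rx() turns a wildcard pattern into a regular expression;
  # full.names = TRUE keeps the directory so the result can be read directly
  paths <- list.files(dir, pattern = glob2rx(glob), full.names = TRUE)
  if (length(paths) == 0) {
    return(NA_character_)
  }
  # file.mtime() gives each file's modification time; keep the newest
  paths[which.max(as.numeric(file.mtime(paths)))]
}
```

The returned path can be passed straight to read.csv or read_csv, after checking that it is not NA.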
