How to match file names in an R directory with names in a CSV column

Question

I am trying to write an R script that will match the file names inside a directory against the file names listed in a CSV file. This is so I can tell which files have already been downloaded and which data I still need to download. I have written code that reads the files from the directory and lists them as a data frame, and that also reads in the CSV file. However, I am having trouble changing each file name to pull out the string I want, as well as matching that string with the name column in the CSV file. Ideally, I would also like to create a new spreadsheet that tells me which files match, so I know what has been downloaded. This is what I have so far.

library(dplyr)   # provides the %>% pipe
library(readxl)  # provides read_excel()

# read files from directory and list as df
file_names <- list.files(path = "KOMP/",
                         pattern = "nrrd",
                         all.files = TRUE,
                         full.names = TRUE,
                         recursive = TRUE) %>%
  # turn into df (the piped vector becomes the first argument)
  as.data.frame()

# read in Excel file
name_data <- read_excel("KOMP/all_data.xlsx")

# extract the ID from each path, e.g. KOMP//icbm/agtc1/12dsfs.nrrd.txt -> 12dsfs
# match the file name with the name column in name_data
# create a new spreadsheet that pulls the id and row if it has been downloaded

Answer 1

Example files/directory

Let's create an example directory with some example files. This lets us demonstrate that the solution works, and having example data like this is key to a reproducible answer.

library(dplyr)
library(writexl)
library(readxl)

# Example directory with example files
dir.create(path = "KOMP")
write.csv(data.frame(x = 5), file = "KOMP/foo.csv")
write.csv(data.frame(x = 20), file = "KOMP/foo.nrrd.csv")
write.csv(data.frame(x = 1), file = "KOMP/foo2.nrrd.csv")
write.csv(data.frame(z = 2), file = "KOMP/bar.csv")
write.csv(data.frame(z = 5), file = "KOMP/bar.rrdr.csv")

# Example Excel file
write_xlsx(data.frame(name = c("foo", "hotdog")),
           path = "KOMP/all_data.xlsx")
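The example files above all sit directly in `KOMP/`, but the path in the question (`KOMP//icbm/agtc1/12dsfs.nrrd.txt`) is nested. As a minimal base-R sketch, the code below builds a throwaway nested layout under `tempdir()` to check that `recursive = TRUE` finds files in subfolders and that `basename()` plus `sub()` reduces each path to its ID. The folder names `icbm/agtc1` and the file `12dsfs.nrrd.txt` come from the question; `KOMP_nested`, `99abcd.nrrd.txt` and `notes.txt` are hypothetical.

```r
# Build a throwaway nested directory that mimics the question's layout
# (icbm/agtc1 and 12dsfs come from the question; the other names are hypothetical)
root <- file.path(tempdir(), "KOMP_nested")
dir.create(file.path(root, "icbm", "agtc1"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(root, "icbm", "agtc1",
                                c("12dsfs.nrrd.txt", "99abcd.nrrd.txt"))))
invisible(file.create(file.path(root, "icbm", "notes.txt")))  # no ".nrrd", so ignored

# recursive = TRUE descends into subfolders; the regex keeps only .nrrd files
paths <- list.files(root, pattern = "\\.nrrd", full.names = TRUE, recursive = TRUE)

# basename() drops the folders; sub() drops ".nrrd" and everything after it
ids <- sub("\\.nrrd.*", "", basename(paths))
ids
#> [1] "12dsfs" "99abcd"
```

Using `tempdir()` keeps this check separate from the real `KOMP/` folder, so it can be rerun without touching downloaded data.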

Solution

We can now use our example files and directory to show a solution to the problem.

# Get file paths in a data.frame for those that contain ".nrrd"
# Use data.frame() rather than as.data.frame() so the column gets a clear name (paths)
# Need to use \\ to escape the period in the regular expression
file_names <- list.files(
  path = "KOMP/",
  pattern = "\\.nrrd",
  all.files = TRUE,
  full.names = TRUE,
  recursive = TRUE
) %>%
  data.frame(paths = .)

# Extract part of file name (i.e. removing directory substrings) that
# comes before .nrrd and add a column. Can get file name with basename()
# and use regular expressions for the other part.
file_names$match_string <- file_names %>%
  pull(paths) %>%
  basename() %>%
  gsub(pattern = "\\.nrrd.*", replacement = "")

file_names$match_string
#> [1] "foo"  "foo2"

# Read in excel file with file names to match (if possible)
name_data <- read_excel("KOMP/all_data.xlsx")

name_data$name
#> [1] "foo"    "hotdog"

# Create match indicator and row number
name_data <- name_data %>%
  mutate(
    matched = case_when(name %in% file_names$match_string ~ 1,
                        TRUE ~ 0),
    rowID = row_number()
  )

# Create excel spreadsheet of files already downloaded
name_data %>%
  filter(matched == 1) %>%
  write_xlsx(path = "KOMP/already_downloaded.xlsx")
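The question also asks what data still needs to be downloaded. A minimal base-R sketch of that complement, assuming the same `name` column and `%in%` test used above: rows with no matching file are the ones still to fetch, and `setdiff()` gives the reverse case (files on disk that the spreadsheet does not list). To stay self-contained it rebuilds small stand-ins for `name_data` and `file_names$match_string` from the example values; with real data you would use those objects directly and could pass `still_needed` to `write_xlsx()`.

```r
# Stand-ins for the objects built above, using the example values
name_data <- data.frame(name = c("foo", "hotdog"), stringsAsFactors = FALSE)
match_string <- c("foo", "foo2")

# Same 1/0 indicator and row number as the dplyr version, in base R
name_data$matched <- as.integer(name_data$name %in% match_string)
name_data$rowID <- seq_len(nrow(name_data))

# Listed in the spreadsheet but not found on disk: still to download
still_needed <- name_data[name_data$matched == 0, ]
still_needed$name
#> [1] "hotdog"

# Found on disk but not listed in the spreadsheet
setdiff(match_string, name_data$name)
#> [1] "foo2"
```

Keeping `matched` as a column, instead of only writing the matched rows, lets one table answer both "what is downloaded" and "what is missing".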

Answer 1 was accepted on 2022-07-31 02:57:27.
