How to match file names in a directory in R with names in a CSV column
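I am trying to write an R script that matches the file names in a directory against the file names listed in a csv file, so that I can tell which files have already been downloaded and which data still needs to be downloaded. The code I have written reads the files from the directory and lists them as a df, and also reads in the csv file. However, I cannot work out how to change the file names to extract the string I want, or how to match the file names with the name column in the csv file. Ideally, I would also like to create a new spreadsheet that tells me which files match, so I know what has been downloaded. This is what I have so far.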
library(dplyr)   # provides %>%
library(readxl)  # provides read_excel()
# read files from directory and list as df
file_names <- list.files(path = "KOMP/",
                         pattern = "nrrd",
                         all.files = TRUE,
                         full.names = TRUE,
                         recursive = TRUE) %>%
  # turn into df
  as.data.frame()
# read in the Excel file
name_data <- read_excel("KOMP/all_data.xlsx")
# change the file_name from the string KOMP//icbm/agtc1/12dsfs.nrrd.txt to -> 12dsfs
# match the file name with the name column in name_data
# create a new spreadsheet that pulls the id and row if it has been downloaded
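Let's create an example directory containing a few example files. This lets us demonstrate that the solution works, and it is the key to making the solution reproducible.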
library(dplyr)
library(writexl)
library(readxl)
# Example directory with example files
dir.create(path = "KOMP")
write.csv(data.frame(x = 5), file = "KOMP/foo.csv")
write.csv(data.frame(x = 20), file = "KOMP/foo.nrrd.csv")
write.csv(data.frame(x = 1), file = "KOMP/foo2.nrrd.csv")
write.csv(data.frame(z = 2), file = "KOMP/bar.csv")
write.csv(data.frame(z = 5), file = "KOMP/bar.rrdr.csv")
# Example Excel file
write_xlsx(data.frame(name = c("foo", "hotdog")),
           path = "KOMP/all_data.xlsx")
We can now use the example files and directory to demonstrate a solution to the problem.
# Get file paths in a data.frame for those that contain ".nrrd"
# Use data.frame() to avoid row names instead of as.data.frame()
# Need to use \\ to escape the period in the regular expression
file_names <- list.files(
  path = "KOMP/",
  pattern = "\\.nrrd",
  all.files = TRUE,
  full.names = TRUE,
  recursive = TRUE
) %>%
  data.frame(paths = .)
# Extract part of file name (i.e. removing directory substrings) that
# comes before .nrrd and add a column. Can get file name with basename()
# and use regular expressions for the other part.
file_names$match_string <- file_names %>%
  pull(paths) %>%
  basename() %>%
  gsub(pattern = "\\.nrrd.*", replacement = "")
file_names$match_string
#> [1] "foo" "foo2"
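The same extraction can be checked in isolation against the path quoted in the question. The following is a minimal base R sketch; `extract_id` is a hypothetical helper name introduced here for illustration, not part of the original answer.

```r
# Hypothetical helper (not from the original answer): drop the directory
# part with basename(), then remove ".nrrd" and everything after it.
extract_id <- function(paths) {
  sub("\\.nrrd.*$", "", basename(paths))
}

# Path taken from the comment in the question
extract_id("KOMP//icbm/agtc1/12dsfs.nrrd.txt")
#> [1] "12dsfs"

# Paths of the kind produced by list.files() on the example directory
extract_id(c("KOMP//foo.nrrd.csv", "KOMP//foo2.nrrd.csv"))
#> [1] "foo"  "foo2"
```

Because `basename()` discards the directory components first, the nesting depth of the file below `KOMP/` should not affect the result.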
# Read in excel file with file names to match (if possible)
name_data <- read_excel("KOMP/all_data.xlsx")
name_data$name
#> [1] "foo" "hotdog"
# Create match indicator and row number
name_data <- name_data %>%
  mutate(
    matched = case_when(name %in% file_names$match_string ~ 1,
                        TRUE ~ 0),
    rowID = row_number()
  )
# Create excel spreadsheet of files already downloaded
name_data %>%
  filter(matched == 1) %>%
  write_xlsx(path = "KOMP/already_downloaded.xlsx")
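The question also asks which data still needs to be downloaded. One possible way to get that list is sketched below in base R. The vectors `downloaded` and `wanted` are stand-ins that mirror `file_names$match_string` and `name_data` from the example; they are assumptions for illustration, not objects from the original code.

```r
# Stand-in data mirroring the example above (assumed, for illustration only)
downloaded <- c("foo", "foo2")                    # like file_names$match_string
wanted <- data.frame(name = c("foo", "hotdog"))   # like name_data

# Rows of the spreadsheet that have no matching file on disk yet
still_needed <- wanted[!wanted$name %in% downloaded, , drop = FALSE]
still_needed$name
#> [1] "hotdog"

# Files on disk that the spreadsheet does not list
setdiff(downloaded, wanted$name)
#> [1] "foo2"
```

If a spreadsheet of the outstanding files is wanted, `still_needed` could presumably be passed to `write_xlsx()` in the same way as the already-downloaded rows.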