I need a list of the unique subject IDs (the part after the / and before the _) from the contents of the folder below.
[1] "." "./4101_0" "./4101_0/4101 Baseline"
[4] "./4101_1" "./4101_2" "./4101_2_2"
[7] "./4101_3" "./4101_4" "./4101_5"
[10] "./4101_6"
Right now I'm doing this (using the packages stringr and foreach).
library(stringr)
library(foreach)
# Create list of contents
Folder.list <- list.dirs()
# Split entries by the "/"
SubIDs <- str_split(Folder.list, "/")
# For each entry in the list, retrieve the second element
SubIDs <- unlist(foreach(i=1:length(SubIDs)) %do% SubIDs[[i]][2])
# Split entries by the "_"
SubIDs <- str_split(SubIDs, "_")
# Take the first element after splitting, unlist it, find the unique entries, remove the NA and coerce to numeric
SubIDs <- as.numeric(na.omit(unique(unlist(foreach(i=1:length(SubIDs)) %do% SubIDs[[i]][1]))))
This does the job but seems unnecessarily horrible. What's a cleaner way of getting from point A to point B?
Use a regular expression.
x <- c(".", "./4101_0", "./4101_0/4101 Baseline", "./4101_1", "./4101_2", "./4101_2_2", "./4101_3", "./4101_4", "./4101_5", "./4101_6")
One way of using a regular expression is to use gsub() to extract the subject code:
gsub(".*/(\\d+)_.*", "\\1", x)
[1] "." "4101" "4101" "4101" "4101" "4101" "4101" "4101" "4101" "4101"
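Note that gsub() returns non-matching entries such as "." unchanged, so the result still needs filtering before it becomes a list of unique IDs. A minimal base R sketch of that last step, assuming the same x as above (the names pattern, matched and sub_ids are only illustrative):

```r
# Same example vector as above
x <- c(".", "./4101_0", "./4101_0/4101 Baseline", "./4101_1", "./4101_2",
       "./4101_2_2", "./4101_3", "./4101_4", "./4101_5", "./4101_6")

pattern <- ".*/(\\d+)_.*"

# gsub() leaves non-matching strings untouched, so keep only entries that match
matched <- grepl(pattern, x)

# Extract the captured digits, then reduce to unique numeric IDs
sub_ids <- unique(as.numeric(gsub(pattern, "\\1", x[matched])))
sub_ids
```

Filtering with grepl() first avoids the warning that as.numeric() would raise when coercing "." to a number.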
stringr also has the str_extract function, which can be used to extract substrings that match a regex pattern. With a positive lookbehind for / and a positive lookahead for _, you can achieve your aim.
Beginning with @Andrie's x:
str_extract(x, perl('(?<=/)\\d+(?=_)'))
# [1] NA "4101" "4101" "4101" "4101" "4101" "4101" "4101" "4101" "4101"
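The pattern above matches one or more numerals (i.e. \\d+) that are preceded by a forward slash and followed by an underscore. Wrapping the pattern in perl() is required for the lookarounds in older versions of stringr; since stringr 1.0.0 the default ICU regex engine supports lookarounds and perl() is deprecated, so the bare pattern string can be passed instead.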
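The same lookaround pattern can also be used without stringr. The following is a sketch in base R rather than part of the answer above: regexpr() with perl = TRUE supports lookarounds, and regmatches() drops non-matching entries, so no NA handling is needed. The variable names are illustrative.

```r
# Same example vector as above
x <- c(".", "./4101_0", "./4101_0/4101 Baseline", "./4101_1", "./4101_2",
       "./4101_2_2", "./4101_3", "./4101_4", "./4101_5", "./4101_6")

# Digits preceded by "/" and followed by "_"; perl = TRUE enables lookarounds
hits <- regexpr("(?<=/)\\d+(?=_)", x, perl = TRUE)

# regmatches() returns only the matched substrings (the "." entry is dropped)
ids_chr <- regmatches(x, hits)

# Unique IDs as numbers
sub_ids <- unique(as.numeric(ids_chr))
sub_ids
```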