
Filter vector elements containing and not containing multiple strings

Based on the code from this link, we can find filenames that contain multiple strings:

allpatterns <- function(fnames, patterns) {
  i <- sapply(fnames, function(fn) all(sapply(patterns, grepl, fn)) )
  fnames[i]
}

filenames <- c("foo.txt", "bar.R", "foo_quux.py", "quux.c", "quux.foo",
               "foo_bar", "bar.foo.cpp", "foo_bar_quux", "quux_foo.bar", "nothing")

allpatterns(filenames, c("foo", "bar"))
# [1] "foo_bar"      "bar.foo.cpp"  "foo_bar_quux" "quux_foo.bar"

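Now I want to go one step further by adding a condition that excludes certain strings. For example, I want to keep filenames that contain "foo" and "bar" but do not contain "cpp" or "quux", which would give the following result: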

 # [1] "foo_bar"

How can I achieve this by modifying the code above?

Edit: the code below is inspired by an answer from an R expert. It does not give me exactly the expected result, but it is encouraging:

filenames <- c("foo.txt", "bar.R", "foo_quux.py", "quux.c", "quux.foo",
               "foo_bar", "bar.foo.cpp", "foo_bar_quux", "quux_foo.bar",
               "nothing")
keep <- c("foo", "bar")
drop <- c("cpp", "quux")

paste0('', paste0(keep, collapse = ''))
keep_regex <- paste0("\\b(?:", paste(keep, collapse="|"), ")\\b")
drop_regex <- paste0("\\b(?:", paste(drop, collapse="|"), ")\\b")

result <- filenames[grepl(keep_regex, filenames) &
                      !grepl(drop_regex, filenames)]
result
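
A possible explanation for why this attempt misses the target, as a small sketch rather than a definitive diagnosis. Note that `perl = TRUE` is an assumption added here so that `\\b` and `(?:...)` follow PCRE rules; the call above uses the default engine. Two things seem to work against the expected result: the alternation `foo|bar` accepts names that contain either string rather than both, and `\\b` needs a transition between a word and a non-word character, while the underscore counts as a word character.

```r
# Illustration only: perl = TRUE is an assumption added for this sketch.
keep_regex <- "\\b(?:foo|bar)\\b"

# "foo.txt": "foo" sits between the string start and ".", so it is bounded.
# "foo_bar": "_" is a word character, so neither "foo" nor "bar" is bounded.
bounded <- grepl(keep_regex, c("foo.txt", "foo_bar"), perl = TRUE)
print(bounded)

# Alternation means "any of": a name with only "foo" already matches.
only_foo <- grepl("foo|bar", "foo.txt")
print(only_foo)
```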

"foo" or "bar", without "cpp" and "quux":

filenames[grepl("foo|bar", filenames) & !grepl("cpp|quux", filenames)]
# [1] "foo.txt" "bar.R"   "foo_bar"

"foo" and "bar", without "cpp" and "quux":

filenames[grepl("(?=.*foo)(?=.*bar)", filenames, perl = TRUE) & !grepl("cpp|quux", filenames)]
# [1] "foo_bar"
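
If the strings live in vectors, as with `keep` and `drop` in the question's edit, the same lookahead pattern can be assembled with `paste0` instead of being typed by hand. This is a minimal sketch and it assumes the strings contain no regex metacharacters; anything like "." or "+" would need escaping first.

```r
filenames <- c("foo.txt", "bar.R", "foo_quux.py", "quux.c", "quux.foo",
               "foo_bar", "bar.foo.cpp", "foo_bar_quux", "quux_foo.bar",
               "nothing")
keep <- c("foo", "bar")
drop <- c("cpp", "quux")

# One lookahead per required string, concatenated: "(?=.*foo)(?=.*bar)"
keep_regex <- paste0("(?=.*", keep, ")", collapse = "")
# Plain alternation for the excluded strings: "cpp|quux"
drop_regex <- paste(drop, collapse = "|")

# Lookaheads need the PCRE engine, hence perl = TRUE on the first call
result <- filenames[grepl(keep_regex, filenames, perl = TRUE) &
                      !grepl(drop_regex, filenames)]
print(result)
```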

Maybe this function will help:

allpatterns <- function(fnames, keep, remove) {
  # Include if it contains all the `keep` variables
  i <- Reduce(`&`, lapply(keep, function(x) grepl(x, fnames)))
  # Drop if any of `remove` variable is present. 
  j <- !Reduce(`|`, lapply(remove, function(x) grepl(x, fnames)))
  fnames[i & j]
}

allpatterns(filenames, c("foo", "bar"), c("cpp", "quux"))
#[1] "foo_bar"
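
A possible variation on the function above, offered as a sketch rather than a replacement. The name `filter_names` and the default arguments are made up for illustration. It passes `fixed = TRUE` so that the strings are matched literally (a "." stays a dot instead of meaning "any character"), and it gives `Reduce` a starting value so that an empty `keep` or `remove` vector is well defined.

```r
filenames <- c("foo.txt", "bar.R", "foo_quux.py", "quux.c", "quux.foo",
               "foo_bar", "bar.foo.cpp", "foo_bar_quux", "quux_foo.bar",
               "nothing")

# Hypothetical helper: keep names containing every string in `keep`
# and none of the strings in `remove`, using literal matching.
filter_names <- function(fnames, keep = character(), remove = character()) {
  # Start from all TRUE so an empty `keep` keeps everything
  has_all <- Reduce(`&`,
                    lapply(keep, grepl, x = fnames, fixed = TRUE),
                    rep(TRUE, length(fnames)))
  # Start from all FALSE so an empty `remove` drops nothing
  has_any <- Reduce(`|`,
                    lapply(remove, grepl, x = fnames, fixed = TRUE),
                    rep(FALSE, length(fnames)))
  fnames[has_all & !has_any]
}

both <- filter_names(filenames, c("foo", "bar"), c("cpp", "quux"))
keep_only <- filter_names(filenames, c("foo", "bar"))
literal_dot <- filter_names(filenames, ".c")
print(both)
```

With literal matching, `filter_names(filenames, ".c")` only returns names that contain an actual ".c", which may or may not be what you want if your patterns are meant to be regular expressions.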
