
Using a variable to create a regular expression pattern in R

I have a function:

ncount <- function(num = NULL) {

  toRead <- readLines("abc.txt")
  n <- as.character(num)
  # Pseudo-code (not valid R): n should be inserted into the pattern here
  x <- grep("{"n"} number",toRead,value=TRUE)

}

While grep-ing, I want the num passed into the function to dynamically create the pattern to be searched. How can this be done in R? The text file has a number and some text on every line.

You can use paste to concatenate the strings:

grep(paste("\\{", n, "} number", sep = ""), toRead, value=TRUE)
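For reference, paste(..., sep = "") and paste0(...) produce the same string, so the two answers build identical patterns. The sketch below checks that, and also shows sprintf() as a third way to fill in the number; sprintf() is not used in the answers and is shown only as an alternative. The value 5 is a hypothetical stand-in for num.

```r
# Hypothetical value standing in for the function argument `num`
n <- as.character(5)
# paste() with sep = "" and paste0() build exactly the same string
p1 <- paste("\\{", n, "} number", sep = "")
p2 <- paste0("\\{", n, "} number")
# sprintf() alternative: the number is substituted into a readable template
p3 <- sprintf("\\{%s} number", n)
stopifnot(identical(p1, p2), identical(p2, p3))
cat(p1, "\n")
## => \{5} number
```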

To build a regular expression from variables in R, in the current scenario you may simply concatenate string literals with your variable using paste0:

grep(paste0('\\{', n, '} number'), toRead, value=TRUE)

Note that { is a special character outside a [...] bracket expression (also called a character class), and it should be escaped if you need to find a literal { char.
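Putting this back into the question's function gives roughly the following. This is a sketch under assumptions: the lines are assumed to look like "{5} number ...", and the extra file parameter (defaulting to the question's "abc.txt") is added here only so the function can be tried on a temporary file. It passes perl = TRUE, in line with the PCRE recommendation further down.

```r
ncount <- function(num = NULL, file = "abc.txt") {
  toRead <- readLines(file)
  n <- as.character(num)
  # "\\{" in R source is the regex \{, i.e. a literal opening brace
  pattern <- paste0("\\{", n, "} number")
  # perl = TRUE selects PCRE, where a lone "}" is an ordinary character
  grep(pattern, toRead, value = TRUE, perl = TRUE)
}

# Usage with a hypothetical temporary file
tmp <- tempfile(fileext = ".txt")
writeLines(c("{5} number of cats", "{15} number of cows", "{7} number of dogs"), tmp)
ncount(5, file = tmp)
## => [1] "{5} number of cats"
```

Because the { must immediately precede the number, "{15} number" is not returned for num = 5. If no regex features are needed at all, grep(paste0("{", n, "} number"), toRead, value = TRUE, fixed = TRUE) matches the text literally and avoids escaping altogether.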

If you need to match any one of several items (an alternation built from a list), you may use a combination of paste / paste0:

words <- c('bananas', 'mangoes', 'plums')
regex <- paste0('Ben likes (', paste(words, collapse='|'), ')\\.')

The resulting regex, written in R as 'Ben likes (bananas|mangoes|plums)\\.', will match Ben likes bananas., Ben likes mangoes. or Ben likes plums.
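A quick way to check the alternation is to print the pattern with cat() (which shows the regex itself, with single backslashes) and run grepl() on a few sentences. The sample sentences are hypothetical.

```r
words <- c('bananas', 'mangoes', 'plums')
regex <- paste0('Ben likes (', paste(words, collapse='|'), ')\\.')
cat(regex, "\n")
## => Ben likes (bananas|mangoes|plums)\.
# Hypothetical sentences: the last two fail (no listed fruit / no final dot)
sentences <- c('Ben likes bananas.', 'Ben likes plums.', 'Ben likes apples.', 'Ben likes plums!')
grepl(regex, sentences)
## => [1]  TRUE  TRUE FALSE FALSE
```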

NOTE: PCRE (used when you pass perl=TRUE to base R regex functions) and ICU (used by the stringr / stringi regex functions) have proved to handle these scenarios better, so it is recommended to use those engines rather than the default TRE regex library used by base R regex functions.
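One concrete difference, sketched below, is lookbehind support, which the boundary patterns later in this answer rely on: the same pattern works with perl = TRUE but does not compile under the default TRE engine. The tryCatch() wrapper is only there to turn TRE's compilation warning/error into a value for the demo.

```r
# A lookbehind: "km" not preceded by a word character
pattern <- "(?<!\\w)km"
# PCRE (perl = TRUE) supports lookbehind
pcre_result <- grepl(pattern, c("5 km", "5km"), perl = TRUE)
pcre_result
## => [1]  TRUE FALSE
# The default TRE engine rejects the same pattern
tre_result <- tryCatch(
  grepl(pattern, "5 km"),
  warning = function(w) "rejected by TRE",
  error = function(e) "rejected by TRE"
)
tre_result
## => [1] "rejected by TRE"
```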

Oftentimes, you will want to build a pattern from a list of words that should be matched exactly, as whole words. Here, a lot depends on the type of boundaries needed, and on whether the words can contain special regex metacharacters or whitespace.

In the most general case, word boundaries (\\b) work well.

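words <- c('bananas', 'mangoes', 'plums')
# Hypothetical sample text; the original example input was not shown
examples <- 'Ben likes bananas, mangoes and plums, but not banana.'
regex <- paste0('\\b(', paste(words, collapse='|'), ')\\b')
unlist(regmatches(examples, gregexpr(regex, examples, perl=TRUE)))
## => [1] "bananas" "mangoes" "plums"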

The \\b(bananas|mangoes|plums)\\b pattern will match bananas, but won't match banana.
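The boundary also rejects a listed word embedded in a longer word, since \b requires a switch between word and non-word characters. A small check with hypothetical inputs:

```r
words <- c('bananas', 'mangoes', 'plums')
regex <- paste0('\\b(', paste(words, collapse='|'), ')\\b')
# "banana" is too short; in "bananasplit" a word character follows "bananas";
# in "plums!" the "!" is a non-word character, so \b holds
hits <- grepl(regex, c('bananas', 'banana', 'bananasplit', 'plums!'), perl = TRUE)
hits
## => [1]  TRUE FALSE FALSE  TRUE
```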

If your list is like

words <- c('cm+km', 'uname\\vname')

you will have to escape the words first, i.e. insert \\ before each metacharacter:

regex.escape <- function(string) {
  gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", string)
}
examples <- c('Text: cm+km, and some uname\\vname?')
words <- c('cm+km', 'uname\\vname')
regex <- paste0('\\b(', paste(regex.escape(words), collapse='|'), ')\\b')
cat( unlist(regmatches(examples, gregexpr(regex, examples, perl=TRUE))) )
## => cm+km uname\vname 
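To see why the escaping matters: unescaped, the + in cm+km is a quantifier, so the pattern means "c, one or more m, k, m" and fails on the literal text. The sketch below compares both versions and prints the escaped strings with cat() so the backslashes appear as the regex engine sees them.

```r
regex.escape <- function(string) {
  gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", string)
}
# Unescaped: "+" repeats the preceding "m"
grepl('cm+km', c('cm+km', 'cmmkm'))
## => [1] FALSE  TRUE
# Escaped: "+" is matched literally
grepl(regex.escape('cm+km'), c('cm+km', 'cmmkm'))
## => [1]  TRUE FALSE
cat(regex.escape(c('cm+km', 'uname\\vname')), sep = "\n")
## => cm\+km
##    uname\\vname
```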

If your words can start or end with a special regex metacharacter, \\b word boundaries won't work. Use
  • Unambiguous word boundaries, (?<!\\w) / (?!\\w), when the match is expected between non-word chars or at the start/end of the string
  • Whitespace boundaries, (?<!\\S) / (?!\\S), when the match is expected to be enclosed with whitespace chars or the start/end of the string
  • Your own boundaries, built from a lookbehind/lookahead combination with a custom character class / bracket expression, or even more sophisticated patterns.

Example of the first two approaches in R (each match is replaced with itself enclosed in << and >>):

regex.escape <- function(string) {
  gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", string)
}
examples <- 'Text: cm+km, +km and C++,Delphi,C++CLI and C++/CLI.'
words <- c('+km', 'C++')
# Unambiguous word boundaries
regex <- paste0('(?<!\\w)(', paste(regex.escape(words), collapse='|'), ')(?!\\w)')
gsub(regex, "<<\\1>>", examples, perl=TRUE)
# => [1] "Text: cm+km, <<+km>> and <<C++>>,Delphi,C++CLI and <<C++>>/CLI."
# Whitespace boundaries
regex <- paste0('(?<!\\S)(', paste(regex.escape(words), collapse='|'), ')(?!\\S)')
gsub(regex, "<<\\1>>", examples, perl=TRUE)
# => [1] "Text: cm+km, <<+km>> and C++,Delphi,C++CLI and C++/CLI."
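The third option can be sketched the same way. The boundary set used here (whitespace, a comma, or the start/end of the string) is an illustrative assumption, not part of the original answer: (?<![^\\s,]) reads as "not preceded by a character other than whitespace or a comma", and (?![^\\s,]) is the matching lookahead.

```r
regex.escape <- function(string) {
  gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", string)
}
examples <- 'Text: cm+km, +km and C++,Delphi,C++CLI and C++/CLI.'
words <- c('+km', 'C++')
# Custom boundaries: whitespace, comma, or start/end of string on both sides
regex <- paste0('(?<![^\\s,])(', paste(regex.escape(words), collapse='|'), ')(?![^\\s,])')
result <- gsub(regex, "<<\\1>>", examples, perl=TRUE)
result
## => [1] "Text: cm+km, <<+km>> and <<C++>>,Delphi,C++CLI and C++/CLI."
```

Compared with the whitespace boundaries, C++ directly before the comma is now matched, while C++CLI and C++/CLI are still skipped because / and letters are outside the chosen boundary set.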
