
R matching multi-word strings using grepl

I'm sure the answer to this probably exists out there somewhere, but I'm struggling to find it.

I have a set of strings that are retail outlet names, and I am trying to identify any that match certain patterns.

With the code below I can fairly easily subset the records where the single-word name of outlet 1 ("example") appears, but I am unable to match outlet 2, whose name consists of two words separated by a space ("example two").

# Keep rows whose lower-cased NAME contains "example" or "example two"
des <- subset(poiSep06,
              grepl("example|example two", tolower(NAME)))

The code above returns all records whose name includes "example", but nothing for "example two".
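In a regular expression, a|b matches wherever either alternative matches, and any name that contains "example two" also contains "example". At least for names spelled exactly like the hypothetical ones below (the real poiSep06 data is not shown), the second alternative therefore adds nothing: the "example two" rows are already among the results, just not separated from the others. A small sketch to check this:

```r
# Hypothetical outlet names standing in for tolower(poiSep06$NAME)
outlets <- c("example", "example two", "another example", "other shop")

# "example two" contains "example", so the second alternative never
# changes the result: both patterns select exactly the same elements
both  <- grepl("example|example two", outlets)
short <- grepl("example", outlets)
identical(both, short)
#> [1] TRUE

# To single out outlet 2, test for the longer phrase on its own
# (fixed = TRUE matches the text literally, with no regex parsing)
outlets[grepl("example two", outlets, fixed = TRUE)]
#> [1] "example two"
```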

Can anyone advise me, or point me toward something that shows what I'm doing wrong? I'm sure it's pretty simple. I'm reluctant to manipulate the original NAME values too much, as that might cause the wrong outlets to be included.
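If the goal is to pick out outlet 2 without editing NAME, any normalisation can happen inside the pattern instead. The sketch below assumes (this is not stated in the question) that some names may differ in case or contain doubled spaces or tabs; the data frame poi is a hypothetical stand-in for poiSep06. Here \\s+ accepts any run of whitespace, \\b requires word boundaries so a name such as "example twofold" is not picked up, and ignore.case = TRUE takes the place of the tolower() call.

```r
# Hypothetical stand-in for poiSep06; the real data is not shown
poi <- data.frame(
  NAME = c("Example", "Example Two", "EXAMPLE  TWO", "Example\tTwo", "Other Shop"),
  stringsAsFactors = FALSE
)

# "example", then one or more whitespace characters, then "two",
# as whole words and ignoring case; poi$NAME itself is never modified
is_outlet2 <- grepl("\\bexample\\s+two\\b", poi$NAME,
                    ignore.case = TRUE, perl = TRUE)

des2 <- subset(poi, is_outlet2)
des2$NAME
```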

Suppose you have a character vector such as string below. Here is how you would get different results depending on what you are looking for:

# An example character vector
string <- c("example","example two", "foo 2", "foo 6", "cat dog")

# Any element with "example" in it
string[grepl("example", string)]
#> [1] "example"     "example two"

# Any element with "example" OR "foo" in it
# two equivalent approaches
string[grepl("example", string) | grepl("foo", string)]
#> [1] "example"     "example two" "foo 2"       "foo 6"
string[grepl("example|foo", string)]
#> [1] "example"     "example two" "foo 2"       "foo 6"

# Any element containing any of the substrings in tomatch
tomatch <- c("example","cat")
string[grepl(paste(tomatch, collapse = "|"), string)]
#> [1] "example"     "example two" "cat dog"

# Any element that is EXACTLY "example" or "example two"
# two equivalent approaches
string[string %in% c("example","example two")]
#> [1] "example"     "example two"
string[(string == "example") | (string == "example two")]
#> [1] "example"     "example two"
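The paste(tomatch, collapse = "|") approach treats every entry as a regular expression, so an entry containing characters such as ( or . is not matched literally. A minimal sketch, using a hypothetical helper build_pattern() and hypothetical data, wraps each entry in PCRE's \Q...\E quoting (hence perl = TRUE) and can optionally anchor the alternation with ^ and $ to reproduce the exact-match behaviour of %in%. It assumes no entry itself contains \E.

```r
# Hypothetical helper: build one regex from a vector of literal strings.
# Each entry is wrapped in \Q...\E so regex metacharacters are literal.
build_pattern <- function(literals, exact = FALSE) {
  alternation <- paste0("\\Q", literals, "\\E", collapse = "|")
  if (exact) {
    # Anchor the whole alternation so only complete elements match
    paste0("^(?:", alternation, ")$")
  } else {
    alternation
  }
}

outlets <- c("example", "example two", "foo 2", "foo 6", "cat dog", "foo (2)")

# Substring match: "foo (2)" is matched literally, not as a regex group
outlets[grepl(build_pattern(c("foo (2)", "cat")), outlets, perl = TRUE)]
#> [1] "cat dog" "foo (2)"

# Exact match: same result as outlets %in% c("example", "example two")
outlets[grepl(build_pattern(c("example", "example two"), exact = TRUE),
              outlets, perl = TRUE)]
#> [1] "example"     "example two"
```

With plain paste(c("foo (2)", "cat"), collapse = "|") the parentheses become a regex group, so the first call would match "foo 2" instead of "foo (2)".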

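Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original source address. For any questions, contact yoyou2525@163.com.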

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM