What regex can capture 2 exact 'words' in a phrase?

I'm trying to capture a constant pair of words in a string. That constant is:

  1. a word
  2. followed by exactly one separator (whitespace, dot, dash or underscore)
  3. another word
  4. and then a separator (see #2) or the end of the line or string.

For the sake of example, let's say I'm looking for "Bob 1" in the following strings:

Hello, I'm Bob 1 --> Should capture Bob 1
Hello, I'm Bob 11 --> Should capture nothing (Bob 1 is not at the end or followed by a separator)
Hey, it's Bob-1 over there --> Should capture Bob-1
Hey, it's Bob - 1 over there --> Should capture nothing (Bob should be followed only by one separator not 3 like here)
Bob.1 --> Should capture Bob.1
Bob_1 rules! --> Should capture Bob_1

I have a regex that mostly works:

/Bob[\s._-]1[\s._-]/ig

I don't know how to add the end of the string to the possible characters in the second character class. As a result, the last line in the live demo below, which should be a match, is the only one that isn't captured.

See the live demo.

I use PCRE (in PHP).

I'm not using PHP, but the following matches for me:

\bBob[\s.\-_]1\b

It makes use of \b, which matches at a word boundary. I found that I had to escape the dash inside the square brackets, which isn't something you are doing, but that may be a difference between the regex engines we are using. (It may also just be placement: a dash at the very end of a class, as in your [\s._-], is literal, whereas an unescaped dash between two characters defines a range.)
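To check the \b approach against the sample strings from the question, here is a minimal sketch. It assumes Python's `re` module rather than PHP's PCRE (the syntax used here is the same in both), and the names `SAMPLES` and `find_match` are made up for the illustration.

```python
import re

# Pattern from the answer above. The dash is escaped because it sits in the
# middle of the character class. IGNORECASE mirrors the /i flag in the question.
PATTERN = re.compile(r"\bBob[\s.\-_]1\b", re.IGNORECASE)

# Sample strings copied from the question.
SAMPLES = [
    "Hello, I'm Bob 1",
    "Hello, I'm Bob 11",
    "Hey, it's Bob-1 over there",
    "Hey, it's Bob - 1 over there",
    "Bob.1",
    "Bob_1 rules!",
]


def find_match(text):
    """Return the matched substring, or None when nothing matches."""
    m = PATTERN.search(text)
    return m.group(0) if m else None


results = {s: find_match(s) for s in SAMPLES}
for sample, hit in results.items():
    print(f"{sample!r} -> {hit!r}")
```

All six samples behave as the question asks. One thing to keep in mind: \b treats any non-word character as a boundary, so "Bob 1!" would also match, while "Bob 1_x" would not, because the underscore counts as a word character.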

This works:

https://regex101.com/r/ezikuP/2

(?<!\S)Bob[\s._-]1(?![^\s._-])

Formatted:

 (?<! \S )               # Whitespace boundary
 Bob                     # Word 1
 [\s._-]                 # Special separator
 1                       # Word 2
 (?! [^\s._-] )          # Special separator boundary
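The formatted version maps directly onto a verbose-mode pattern. A rough sketch, assuming Python's `re` with `re.VERBOSE` in place of PCRE's x flag (the helper name `find_match` is hypothetical, and the spaces inside the lookarounds are dropped to stay on the safe side):

```python
import re

PATTERN = re.compile(
    r"""
    (?<!\S)          # not preceded by a non-whitespace character
    Bob              # word 1
    [\s._-]          # exactly one separator
    1                # word 2
    (?![^\s._-])     # not followed by a non-separator character
    """,
    re.VERBOSE | re.IGNORECASE,
)


def find_match(text):
    """Return the matched substring, or None when nothing matches."""
    m = PATTERN.search(text)
    return m.group(0) if m else None


print(find_match("Hello, I'm Bob 1"))    # Bob 1 (end of string is accepted)
print(find_match("Hello, I'm Bob 11"))   # None
print(find_match("Bob_1 rules!"))        # Bob_1
print(find_match("Bob 1!"))              # None ("!" is not a separator)
```

Because both lookarounds are negative, the start and end of the string pass automatically: there is simply no offending character there to reject.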

"Which ends in only the last line in the live demo below that should be a match and that isn't captured."

For that you need a positive lookahead.

Regex: Bob[\s._-]1(?=[\s._-])

  • (?=[\s._-]) only checks that a character from the given class comes next; it does not consume or capture it.

Regex101 Demo
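To see what "won't consume" means in practice, here is a small sketch comparing the question's regex with the lookahead version. It assumes Python's `re` instead of PCRE, and the variable names are invented for the example.

```python
import re

# The question's regex: the trailing separator is part of the match.
CONSUMING = re.compile(r"Bob[\s._-]1[\s._-]", re.IGNORECASE)
# The lookahead version: the trailing separator is only checked.
LOOKAHEAD = re.compile(r"Bob[\s._-]1(?=[\s._-])", re.IGNORECASE)

text = "Hey, it's Bob-1 over there"
consumed = CONSUMING.search(text).group(0)
asserted = LOOKAHEAD.search(text).group(0)
print(repr(consumed))  # 'Bob-1 ' (note the trailing space)
print(repr(asserted))  # 'Bob-1'

# A positive lookahead still needs a character to look at, so nothing
# matches when "Bob 1" is the very end of the string.
at_end = LOOKAHEAD.search("Hello, I'm Bob 1")
print(at_end)  # None
```

So, at least in this sketch, the lookahead alone keeps the separator out of the match but does not cover the end-of-string case; in a multi-line demo every line except the last is followed by a newline, which \s accepts.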

"In the second list I don't know how to add the end of the string in the possible characters."

You can use this regex with the anchor $ to assert the end of the string:

/\bBob[\s._-]1(?:[\s._-]|$)/m

Or, if you don't want the match to include the character after the second word, use a lookahead:

/\bBob[\s._-]1(?=[\s._-]|$)/m

([\s._-]|$) requires either one of the given characters (whitespace, dot, underscore, hyphen) or the end of the line ($, which matches at each line end under the m flag).

It is safer to add \b before Bob to match the exact word Bob and avoid matching HelloBob.

RegEx Demo
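Here is a quick sketch of both variants run over the question's samples as one multi-line text. It assumes Python's `re` (with `re.MULTILINE` standing in for the /m flag) rather than PHP's PCRE, and the names `TEXT`, `consuming_hits` and `lookahead_hits` are only for illustration.

```python
import re

FLAGS = re.IGNORECASE | re.MULTILINE
CONSUMING = re.compile(r"\bBob[\s._-]1(?:[\s._-]|$)", FLAGS)
LOOKAHEAD = re.compile(r"\bBob[\s._-]1(?=[\s._-]|$)", FLAGS)

# Sample strings from the question, one per line.
TEXT = "\n".join([
    "Hello, I'm Bob 1",
    "Hello, I'm Bob 11",
    "Hey, it's Bob-1 over there",
    "Hey, it's Bob - 1 over there",
    "Bob.1",
    "Bob_1 rules!",
])

consuming_hits = CONSUMING.findall(TEXT)
lookahead_hits = LOOKAHEAD.findall(TEXT)
print(consuming_hits)  # ['Bob 1\n', 'Bob-1 ', 'Bob.1\n', 'Bob_1 ']
print(lookahead_hits)  # ['Bob 1', 'Bob-1', 'Bob.1', 'Bob_1']

# "Bob 1" at the very end of a string is accepted through the $ branch.
end_hit = LOOKAHEAD.search("Hello, I'm Bob 1")
print(end_hit.group(0))  # Bob 1

# \b keeps "Bob" from matching inside a longer word.
glued = LOOKAHEAD.search("HelloBob 1")
print(glued)  # None
```

The only difference between the two is whether the trailing separator ends up inside the match; which one you want depends on what you do with the captured text afterwards.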
