Regex pattern to match substring

I would like to find the following pattern in a string:

word-word-word++ or -word-word-word++

That is, the -word or word- pattern should repeat until the end of the substring.

The string is quite large and contains many words with those patterns. The following has been tried:

p = re.compile('(?:\w+\-)*\w+\s+=', re.IGNORECASE)
result = p.match(data)

but it returns None. Does anyone know the answer?

Your regex will only match the first pattern (the one without a leading dash). match() will find at most one occurrence, and only at the very start of the string. On top of that, the pattern matches only if it is immediately followed by some whitespace and an equals sign.
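To illustrate the point, here is a minimal sketch. The sample string is made up for the demonstration (the question does not show the real data), so treat it as an assumption:

```python
import re

# Hypothetical sample input; the real data from the question is not shown.
data = "intro foo-bar-baz = 1 and -qux-quux-corge"

# The pattern from the question, unchanged apart from the raw-string prefix.
p = re.compile(r'(?:\w+\-)*\w+\s+=', re.IGNORECASE)

# match() only tries at position 0, so it fails here because the string
# starts with "intro foo...", which is not followed by whitespace and "=".
at_start = p.match(data)

# search() scans forward and returns the first place the pattern fits.
anywhere = p.search(data)

print(at_start)          # None
print(anywhere.group())  # foo-bar-baz =
```

Note that even search() only returns the first hit, and the second word (-qux-quux-corge) is never found because nothing in the pattern allows a leading dash or a missing equals sign.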

Also, your example implies that you want three or more words, so here's a version that was changed in the following ways:

  1. It matches both patterns (note the leading -? ).
  2. It matches only if there are at least three words in the pattern ( {2,} instead of + ).
  3. It matches even if there's nothing after the pattern (the \b matches a word boundary; it is not really necessary here, since the preceding greedy \w+ guarantees we are at a word boundary anyway).
  4. It returns all matches instead of only the first one.

Here's the code:

#!/usr/bin/python

import re

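data = r"foo-bar-baz not-this -this-neither nope double-dash--so-nope -yeah-this-even-at-end-of-string"
# Optional leading dash, then at least two "word-" groups, then a final word.
p = re.compile(r'-?(?:\w+-){2,}\w+\b', re.IGNORECASE)
# findall() returns every non-overlapping match, not just the first one.
print(p.findall(data))
# prints ['foo-bar-baz', '-yeah-this-even-at-end-of-string']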
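As a rough sketch of what points 2 and 3 mean in practice, the snippet below runs two variants of the pattern on the same test string. The names three_plus and two_plus are just illustrative, not part of the answer above:

```python
import re

data = ("foo-bar-baz not-this -this-neither nope "
        "double-dash--so-nope -yeah-this-even-at-end-of-string")

# Same pattern as above but without the trailing \b: the greedy \w+ already
# consumes the whole last word, so the results are identical.
three_plus = re.compile(r'-?(?:\w+-){2,}\w+')

# Using + instead of {2,} lowers the minimum to two words.
two_plus = re.compile(r'-?(?:\w+-)+\w+')

print(three_plus.findall(data))
# ['foo-bar-baz', '-yeah-this-even-at-end-of-string']

print(two_plus.findall(data))
# ['foo-bar-baz', 'not-this', '-this-neither', 'double-dash', '-so-nope',
#  '-yeah-this-even-at-end-of-string']
```

With + the double-dash word is split into two separate matches, which is probably not what you want, so {2,} is the safer choice if three words are the minimum.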
