
Regex pattern to match substring

I would like to find the following pattern in a string:

word-word-word++ or -word-word-word++

So that it iterates the -word or word- pattern until the end of the substring.

The string is quite large and contains many words with those patterns. I have tried the following:

p = re.compile('(?:\w+\-)*\w+\s+=', re.IGNORECASE)
result = p.match(data)

but it returns None. Does anyone know the answer?

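Your regex only matches the first pattern (the one without a leading dash). In addition, match() only tries to match at the very start of the string and returns at most one occurrence, and your pattern only matches if the words are immediately followed by some whitespace and an equals sign.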
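To illustrate the points above, here is a minimal sketch that runs the pattern from the question through match(), search() and findall(). The sample string is made up for this example and is not the asker's data.

```python
import re

# Hypothetical sample data, not taken from the question.
data = "intro foo-bar-baz = 1 and -one-two-three = 2"

# The pattern from the question, written as a raw string.
p = re.compile(r'(?:\w+\-)*\w+\s+=', re.IGNORECASE)

# match() only tries position 0; "intro" is not followed by whitespace and "=".
at_start = p.match(data)

# search() scans forward, but still returns only the first occurrence.
first = p.search(data)

# findall() returns every non-overlapping occurrence.
everything = p.findall(data)

print(at_start)        # None
print(first.group(0))  # foo-bar-baz =
print(everything)      # ['foo-bar-baz =', 'one-two-three =']
```

Note that the second hit loses its leading dash, because the pattern has nothing that can match a dash in front of the first word.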

Also, in your example you implied you wanted three or more words, so here's a version that was changed in the following ways:

  1. match both patterns (note the leading -? )
  2. match only if there are at least three words in the pattern ( {2,} instead of * )
  3. match even if there's nothing after the pattern (the \b matches a word boundary; it is not really necessary here, since the preceding greedy \w+ guarantees we are at a word boundary anyway)
  4. return all matches instead of only the first one ( findall() instead of match() )

Here's the code:

#!/usr/bin/env python3

import re

data = r"foo-bar-baz not-this -this-neither nope double-dash--so-nope -yeah-this-even-at-end-of-string"
# -? optional leading dash, (?:\w+-){2,} at least two "word-" groups, \w+ the final word
p = re.compile(r'-?(?:\w+-){2,}\w+\b', re.IGNORECASE)
print(p.findall(data))
# prints ['foo-bar-baz', '-yeah-this-even-at-end-of-string']
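Since the question mentions that the string is quite large, one possible variation is to iterate lazily with finditer() instead of building the full list at once. This is only a sketch: the names PATTERN, iter_dashed and sample are hypothetical, and the \b is dropped because it is redundant after a greedy \w+.

```python
import re

# Same pattern as in the answer, minus the redundant \b.
PATTERN = re.compile(r'-?(?:\w+-){2,}\w+')

def iter_dashed(text):
    """Yield (start_offset, matched_text) for each dashed run of 3+ words."""
    for m in PATTERN.finditer(text):
        yield m.start(), m.group(0)

# Hypothetical sample data for illustration.
sample = "foo-bar-baz not-this -yeah-this-too"
hits = list(iter_dashed(sample))
print(hits)  # [(0, 'foo-bar-baz'), (21, '-yeah-this-too')]
```

The offsets can be handy if you need to locate or replace the matches in the original string later.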
