
Regex, how to match everything up to nth occurrence

I'm trying to get everything from a webpage up until the second occurrence of the word matchdate.

(.*?matchdate){2} is what I'm trying, but that's not doing the trick. The page has 14+ occurrences of "matchdate" and I only want everything up to the second one, and nothing after it.

https://regex101.com/r/Cjyo0f/1 <--- my saved regex.

What am I missing here?

Thanks.

There are a few ways you can do this:

If you can, remove the g flag

Without the global flag, the regex engine returns only the first match it finds and stops there, instead of going on to collect every later pair of occurrences.

https://regex101.com/r/Cjyo0f/2
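In Python there is no g flag as such; the closest analogue is probably the difference between re.search (first match only) and re.finditer (every non-overlapping match). A minimal sketch of that difference, using a made-up sample string and a non-capturing group in place of the original parentheses:

```python
import re

# Hypothetical sample text with four occurrences of the word.
text = "a matchdate b matchdate c matchdate d matchdate e"
pattern = re.compile(r"(?:.*?matchdate){2}", re.DOTALL)

# Comparable to running without the g flag: only the first match.
first = pattern.search(text).group()

# Comparable to running with the g flag: every non-overlapping match.
all_matches = [m.group() for m in pattern.finditer(text)]

print(first)
print(all_matches)
```

The first result is the part the question is after; the second list shows why the global flag appears to "keep going" in pairs.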

Add a ^ to the front of the regex

A caret forces the regex to match from the beginning of the string (as long as the multiline flag is off), which rules out any match that starts later on.

https://regex101.com/r/Cjyo0f/3
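The effect of the caret can be checked in Python as well. This is only a sketch with an invented sample string; re.findall stands in for the global flag here:

```python
import re

# Hypothetical sample text with four occurrences of the word.
text = "x matchdate y matchdate z matchdate w matchdate"

# With a leading ^ and no re.MULTILINE, a match can only start at
# index 0, so even a "find all" search yields a single result.
anchored = re.findall(r"^(?:.*?matchdate){2}", text, re.DOTALL)

# Without the caret, the second pair of occurrences matches too.
unanchored = re.findall(r"(?:.*?matchdate){2}", text, re.DOTALL)

print(anchored)
print(unanchored)
```

If re.MULTILINE were added, ^ would also match after each newline, and the anchored version could then return more than one result on multi-line input.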

If Python is available, use .split() and .join()

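If plain Python is available, I would skip the regex and recommend:

text = "I like to matchdate, I want to eat matchdate for breakfast"
# Split on the word, keep the first two pieces, and rejoin them.
# The result stops just before the second "matchdate".
print("matchdate".join(text.split("matchdate")[:2]))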
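Note that the join above ends just before the second matchdate rather than including it. If the second occurrence should be kept, or a different n is needed, a small helper along these lines might do. The name up_to_nth and its parameters are made up for illustration:

```python
def up_to_nth(text, word, n, include_last=True):
    """Return text up to the nth occurrence of word, or None if there
    are fewer than n occurrences. (Hypothetical helper.)"""
    # Split at most n times: n occurrences give exactly n + 1 pieces.
    parts = text.split(word, n)
    if len(parts) <= n:
        return None
    head = word.join(parts[:n])
    return head + word if include_last else head


sample = "I like to matchdate, I want to eat matchdate for breakfast"
print(up_to_nth(sample, "matchdate", 2))
print(up_to_nth(sample, "matchdate", 2, include_last=False))
```

Passing n as the maxsplit argument keeps the split from scanning the rest of a long page once the nth occurrence has been found.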

You almost had it! (.*?matchdate){2} was actually correct. It just needs a re.DOTALL flag so that the dot matches newlines as well as other characters.

Here is a working test:

>>> import re

>>> s = '''First line
Second line
Third with matchdate and more
Fourth line
Fifth with matchdate and other
stuff you're
not interested in
like another matchdate
or a matchdate redux.
'''

>>> print(re.search('(.*?matchdate){2}', s, re.DOTALL).group())
First line
Second line
Third with matchdate and more
Fourth line
Fifth with matchdate
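
The same idea can be generalised to the nth occurrence of an arbitrary word. The following is only a sketch: regex_up_to_nth is a made-up name, re.escape is used so that a word containing regex metacharacters is matched literally, and re.match anchors the search at the start of the string.

```python
import re


def regex_up_to_nth(text, word, n):
    """Return everything up to and including the nth occurrence of
    word, or None if it occurs fewer than n times. (Hypothetical helper.)"""
    # Build (?:.*?word){n}; the lazy .*? stops at the nearest occurrence.
    pattern = "(?:.*?" + re.escape(word) + "){" + str(n) + "}"
    # re.DOTALL lets the dot cross line breaks, as in the answer above.
    m = re.match(pattern, text, re.DOTALL)
    return m.group() if m else None


page = "a matchdate\nb matchdate\nc matchdate"
print(regex_up_to_nth(page, "matchdate", 2))
print(regex_up_to_nth(page, "matchdate", 5))
```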
