Regex, how to match everything up to nth occurrence

Question

I'm trying to get everything from a webpage up until the second occurrence of the word `matchdate`.

`(.*?matchdate){2}` is what I'm trying, but it's not doing the trick. The page has 14+ occurrences of "matchdate", and I only want everything up to the second one and nothing after it.

My saved regex: https://regex101.com/r/Cjyo0f/1

What am I missing here?

Thanks.

Answer 1

There are a couple of ways you can do this:

If you can, remove the `g` flag

Without the global flag, the regex engine returns only the first match instead of every match in the text.

https://regex101.com/r/Cjyo0f/2
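Python has no `g` flag, but a rough analogue is the difference between `re.search`, which returns only the first match (like a non-global regex), and `re.finditer`, which walks every non-overlapping match (like `g`). A minimal sketch on made-up sample text:

```python
import re

# Made-up sample text with four occurrences of "matchdate"
text = "a matchdate\nb matchdate\nc matchdate\nd matchdate\n"
# DOTALL lets "." cross newlines; the group must repeat twice
pattern = re.compile(r"(.*?matchdate){2}", re.DOTALL)

# Like the "g" flag: every non-overlapping match in the text
all_matches = [m.group(0) for m in pattern.finditer(text)]

# Like dropping "g": only the first match is returned
first = pattern.search(text).group(0)

print(len(all_matches))  # 2
print(repr(first))       # 'a matchdate\nb matchdate'
```

With `g`, the second match (`'\nc matchdate\nd matchdate'`) is what pulls in text after the second occurrence.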

Add a `^` to the front of the regex

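A caret anchors the match to the beginning of the string, so only a match starting there is possible. This holds as long as multiline mode (`m`) is off; with `m` on, `^` also matches at the start of every line.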

https://regex101.com/r/Cjyo0f/3
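The same idea in Python, again on made-up text: with `^` and without `re.MULTILINE`, even `finditer` can only find the match that starts at position 0. The second pattern shows the caveat: with `re.MULTILINE`, `^` also matches right after each newline, so a later match becomes possible again.

```python
import re

text = "a matchdate\nb matchdate\nc matchdate\nd matchdate\n"

# "^" without MULTILINE only matches at the very start of the string
anchored = re.compile(r"^(.*?matchdate){2}", re.DOTALL)
anchored_matches = [m.group(0) for m in anchored.finditer(text)]
print(anchored_matches)  # ['a matchdate\nb matchdate']

# With MULTILINE, "^" also matches immediately after every newline
per_line = re.compile(r"^(.*?matchdate){2}", re.DOTALL | re.MULTILINE)
per_line_matches = [m.group(0) for m in per_line.finditer(text)]
print(per_line_matches)  # ['a matchdate\nb matchdate', 'c matchdate\nd matchdate']
```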

If Python is available, use `.split()` and `.join()`

If plain Python is available, I would recommend:

```python
text = "I like to matchdate, I want to eat matchdate for breakfast"
# Everything before the second "matchdate" (that occurrence itself is not included)
print("matchdate".join(text.split("matchdate")[:2]))
```
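The split/join recipe above generalises to any n. In the sketch below, `text_before_nth` is a hypothetical helper name, not part of any library. Passing `maxsplit` to `str.split` stops splitting after the n-th occurrence, and `include_word` decides whether the n-th `matchdate` itself is kept: the recipe above leaves it out, while the regex `(.*?matchdate){2}` keeps it.

```python
def text_before_nth(text: str, word: str, n: int, include_word: bool = False) -> str:
    """Hypothetical helper: return the text before the n-th occurrence of `word`.

    If `word` occurs fewer than n times, the whole text is returned unchanged.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # At most n splits, so at most n + 1 parts; later occurrences stay in the tail
    parts = text.split(word, n)
    if len(parts) <= n:
        return text  # fewer than n occurrences
    head = word.join(parts[:n])
    return head + word if include_word else head


sample = "I like to matchdate, I want to eat matchdate for breakfast"
print(repr(text_before_nth(sample, "matchdate", 2)))
print(repr(text_before_nth(sample, "matchdate", 2, include_word=True)))
```

Using `maxsplit` also avoids splitting the rest of a long page (14+ occurrences) when only the first two matter.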

Answer 2

You almost had it! `(.*?matchdate){2}` was actually correct. It just needs the `re.DOTALL` flag so that the dot matches newlines as well as other characters.
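To confirm that the flag is what makes the difference, here is a sketch on a shortened, made-up version of the sample text. Without `re.DOTALL` the pattern finds nothing, because `.` cannot cross a newline and no single line contains two `matchdate`s. The inline `(?s)` flag at the start of the pattern behaves like `re.DOTALL`, which helps where only the pattern itself can be edited.

```python
import re

s = "First line\nThird with matchdate and more\nFifth with matchdate and other\n"
pattern = r"(.*?matchdate){2}"

# Without DOTALL, "." stops at "\n", so both occurrences would have to share a line
without_flag = re.search(pattern, s)
# With DOTALL, the lazy ".*?" may span lines
with_flag = re.search(pattern, s, re.DOTALL)
# "(?s)" at the start of the pattern is the inline form of re.DOTALL
inline_flag = re.search("(?s)" + pattern, s)

print(without_flag)  # None
print(repr(with_flag.group()))
print(inline_flag.group() == with_flag.group())  # True
```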

Here is a working test:

```python
>>> import re

>>> s = '''First line
Second line
Third with matchdate and more
Fourth line
Fifth with matchdate and other
stuff you're
not interested in
like another matchdate
or a matchdate redux.
'''

>>> print(re.search('(.*?matchdate){2}', s, re.DOTALL).group())
First line
Second line
Third with matchdate and more
Fourth line
Fifth with matchdate
```
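To make the count a parameter, one option is to build the pattern from `n` and the word. In this sketch `upto_nth_regex` is a hypothetical name. `re.escape` keeps words containing regex metacharacters literal, and the non-capturing `(?:...)` group sidesteps a pitfall of repeated capturing groups: `group(1)` holds only the last repetition. The test string is a shortened version of the interactive-session text above.

```python
import re
from typing import Optional


def upto_nth_regex(text: str, word: str, n: int) -> Optional[str]:
    """Hypothetical helper: everything up to and including the n-th `word`.

    Returns None if `word` occurs fewer than n times.
    """
    # Builds e.g. (?:.*?matchdate){2}; re.escape makes "." or "+" in `word` literal
    pattern = "(?:.*?" + re.escape(word) + "){" + str(n) + "}"
    match = re.search(pattern, text, re.DOTALL)
    return match.group() if match else None


s = "First line\nThird with matchdate and more\nFifth with matchdate and other\n"
print(repr(upto_nth_regex(s, "matchdate", 2)))
print(upto_nth_regex(s, "matchdate", 3))  # None: only two occurrences

# With a capturing group, group(1) is only the last repetition
last_rep = re.search(r"(.*?matchdate){2}", s, re.DOTALL).group(1)
print(repr(last_rep))  # ' and more\nFifth with matchdate'
```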

Answer 1 (solution1): score 2, accepted, posted 2017-03-17 21:50:12

Answer 2 (solution2): score 1, posted 2017-03-19 05:57:33