RegEx match string between known strings and after a known text with line breaks

So, I have this text:

<a href="/find/1">testing</a>
<strong>a known text</strong>
<p>testing2</p>
<p>this paragraphs are dynamically</p>
...
<a href="/find/2/">testing again</a>
<a href="/find/3/">testing and again</a>

I want to get all the hrefs that come after the text "a known text".

I use this regex to get all the matches: (?<=<a\ href=")/find/.*?(?=") But I also get /find/1, which is a result that I don't want.

I've tried this: a known tex[\w\W](?<=<a\ href=")/find/*?(?=") but it's not working. I have no idea how to get this done correctly. Basically, I want to get only /find/2/ and /find/3/.

PS: I am not really using C# itself, but software that is written in C# and uses the C# regex engine.

I have this regex, which is a bit different from Marcin's because I'm not used to having variable-length patterns in lookbehinds:

var regex = new Regex(@"(?:a known text|(?<!^)\G)[\w\W]+?((?<=<a\ href="")/find/.*?(?=""))");

ideone demo

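Which I believe should make the regex a little bit more efficient.

\G is an anchor which matches where the previous match ended, so that after finding the first /find/ link, the regex tries matching again from that point. On the first attempt there is no previous match, so \G matches at the very start of the input; I had to put the negative lookbehind (?<!^) in front of it to prevent that, otherwise /find/1 would be picked up as well.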
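
To see what the (?<!^) guard in front of \G does, here is a small sketch that runs the regex above next to a copy without the guard. The sample input is the text from the question with the "..." line left out, and the names GAnchorDemo, Extract, Guarded and Unguarded are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GAnchorDemo
{
    // Sample input from the question (the "..." line is left out).
    public const string Sample =
        "<a href=\"/find/1\">testing</a>\n" +
        "<strong>a known text</strong>\n" +
        "<p>testing2</p>\n" +
        "<p>this paragraphs are dynamically</p>\n" +
        "<a href=\"/find/2/\">testing again</a>\n" +
        "<a href=\"/find/3/\">testing and again</a>";

    // Pattern from the answer: \G is guarded by (?<!^).
    public const string Guarded =
        @"(?:a known text|(?<!^)\G)[\w\W]+?((?<=<a\ href="")/find/.*?(?=""))";

    // Same pattern without the guard, for comparison only.
    public const string Unguarded =
        @"(?:a known text|\G)[\w\W]+?((?<=<a\ href="")/find/.*?(?=""))";

    // Collects capture group 1 of every match of the pattern in the input.
    public static List<string> Extract(string pattern, string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            results.Add(m.Groups[1].Value);
        }
        return results;
    }

    public static void Main()
    {
        // Guarded: the first match has to start at "a known text".
        Console.WriteLine(string.Join(", ", Extract(Guarded, Sample)));
        // prints: /find/2/, /find/3/

        // Unguarded: \G also matches at position 0, so /find/1 slips in.
        Console.WriteLine(string.Join(", ", Extract(Unguarded, Sample)));
        // prints: /find/1, /find/2/, /find/3/
    }
}
```

Without the guard, the very first attempt succeeds through the \G branch at position 0, which is why the unwanted /find/1 comes back.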

a known tex[\w\W](?<=<a\ href=")/find/*?(?=")

Concerning your regex, the small mistakes you made were forgetting the quantifier for [\w\W] and the dot before *? after /find/. Using a known tex[\w\W]+?(?<=<a\ href=")(/find/.*?)(?=") would have got you only /find/2/, which is already better than nothing!

EDIT: As AlanMoore rightly pointed out, you can simplify the regex:

var regex = new Regex(@"(?:a known text|(?<!^)\G)[\w\W]+?<a href=""(/find/.*?)""");

And to make the . match newlines, we can use (?s) and remove the [\w\W] part:

var regex = new Regex(@"(?s)(?:a known text|(?<!^)\G).*?<a href=""(/find/.*?)""");
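
As a rough usage sketch for this last version: since the lookarounds are gone, the link is read from capture group 1 of each match rather than from the match itself. The class name LinkExtractor, the method ExtractLinks and the sample string are assumptions made for this example, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Final pattern from above; the inline (?s) has the same effect as
    // passing RegexOptions.Singleline to the constructor.
    private static readonly Regex LinkAfterMarker = new Regex(
        @"(?s)(?:a known text|(?<!^)\G).*?<a href=""(/find/.*?)""");

    // Returns every /find/... href that appears after "a known text".
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in LinkAfterMarker.Matches(html))
        {
            // Group 0 is the whole match; group 1 holds only the href value.
            links.Add(m.Groups[1].Value);
        }
        return links;
    }

    public static void Main()
    {
        // Hypothetical input modelled on the text in the question.
        string html = string.Join("\n",
            "<a href=\"/find/1\">testing</a>",
            "<strong>a known text</strong>",
            "<p>testing2</p>",
            "<a href=\"/find/2/\">testing again</a>",
            "<a href=\"/find/3/\">testing and again</a>");

        foreach (string link in ExtractLinks(html))
        {
            Console.WriteLine(link);
        }
    }
}
```

If the host software only exposes the whole match and not the groups, the earlier lookaround version is probably the better fit, because there the match itself is just the link.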
