Regular Expression Lookbehind doesn't work as expected

Question

I have a string in .NET:

<p class='p1'>Para 1</p><p>Para 2</p><p class="p2">Para 3</p><p>Para 4</p>

Now I want to get only the text inside the p tags (Para 1, Para 2, Para 3, Para 4).

I used the following regular expression, but it doesn't give me the expected result.

(?<=<p.*>).*?(?=</p>)

If I use (?<=<p>).*?(?=</p>), it gives only Para 2 and Para 4, the two p tags that don't have a class attribute.

I'd like to know what's wrong with (?<=<p.*>).*?(?=</p>).

Answer 1

Let's illustrate this using RegexBuddy:

(RegexBuddy screenshot)

Your regex matches more than you think - the dot matches any character, so it doesn't care about tag boundaries.

What it is actually doing:

(?<=<p.*>) : Assert that <p occurs somewhere before the current position, followed by any number of characters and then a > directly before the current position.
.*? : Match as few characters as possible...
(?=</p>) : ...until the next occurrence of </p>.
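
To see this concretely, here is a minimal C# sketch that runs the pattern from the question against the sample string and prints each match. The variable names (html, original, values) are only illustrative, and the program assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input from the question.
string html = "<p class='p1'>Para 1</p><p>Para 2</p><p class=\"p2\">Para 3</p><p>Para 4</p>";

// Pattern from the question. The lookbehind succeeds at any position that is
// directly preceded by ">" as long as "<p" appears somewhere before it.
var original = new Regex(@"(?<=<p.*>).*?(?=</p>)");

// Collect the matched text so it can be inspected.
var values = original.Matches(html).Cast<Match>().Select(m => m.Value).ToList();

foreach (string value in values)
{
    Console.WriteLine(value);
}
// Prints:
// Para 1
// <p>Para 2
// <p class="p2">Para 3
// <p>Para 4
```

Every match after the first one starts right behind the previous </p>, because the > of a closing tag satisfies the lookbehind just as well as the > of an opening tag does.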

Your question is a bit unclear, but if your plan is to find the text within <p> tags regardless of whether they have any attributes, you shouldn't be using regular expressions anyway but a DOM parser, for example the HTML Agility Pack.
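
A rough sketch of the parser route, assuming the HtmlAgilityPack NuGet package is referenced (it is a third-party library, not part of .NET). The variable names are illustrative, and the program assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package (assumption: installed in the project)

// Sample input from the question.
string html = "<p class='p1'>Para 1</p><p>Para 2</p><p class=\"p2\">Para 3</p><p>Para 4</p>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// SelectNodes takes an XPath expression and returns null when nothing matches.
var paragraphs = doc.DocumentNode.SelectNodes("//p");

// InnerText is the text content of each <p> element, attributes or not.
List<string> texts = paragraphs == null
    ? new List<string>()
    : paragraphs.Select(p => p.InnerText).ToList();

Console.WriteLine(string.Join(", ", texts));
```

The parser does not care whether a tag carries attributes, which is exactly the part the regex has trouble with.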

That said, if you insist on a regex, try

(?<=<p[^<>]*>)(?:(?!</p>).)*

(Another screenshot)

Explanation:

(?<=<p[^<>]*>)  # Assert position right after a p tag
(?:(?!</p>).)*  # Match any number of characters until the next </p>
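
As a quick check, this C# sketch applies the corrected pattern to the sample string from the question. The variable names are illustrative, and the program assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input from the question.
string html = "<p class='p1'>Para 1</p><p>Para 2</p><p class=\"p2\">Para 3</p><p>Para 4</p>";

// [^<>]* keeps the lookbehind inside a single opening tag, and the tempered
// dot (?:(?!</p>).)* stops right before the next closing </p>.
var inner = new Regex(@"(?<=<p[^<>]*>)(?:(?!</p>).)*");

var texts = inner.Matches(html).Cast<Match>().Select(m => m.Value).ToList();

Console.WriteLine(string.Join(", ", texts));
// Prints: Para 1, Para 2, Para 3, Para 4
```

Two caveats that go beyond the original answer: <p[^<>]*> also accepts other tags that start with p, such as <pre> (writing <p\b instead would exclude them), and the dot does not match newlines unless RegexOptions.Singleline is set, so paragraphs that span several lines would be cut short.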

Answer 2

Have you tried using the following expression?

<p[\s\S]*?>(?<text_inside_p>[\s\S]*?)</p>

The group named text_inside_p will contain the desired text.
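
Reading the named group could look like the following C# sketch. The variable names are illustrative, and the program assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input from the question.
string html = "<p class='p1'>Para 1</p><p>Para 2</p><p class=\"p2\">Para 3</p><p>Para 4</p>";

// The whole element is matched; only the named group holds the inner text.
var tagged = new Regex(@"<p[\s\S]*?>(?<text_inside_p>[\s\S]*?)</p>");

var texts = tagged.Matches(html)
    .Cast<Match>()
    .Select(m => m.Groups["text_inside_p"].Value)
    .ToList();

Console.WriteLine(string.Join(", ", texts));
// Prints: Para 1, Para 2, Para 3, Para 4
```

Because this pattern consumes the tags instead of looking around them, it needs no lookbehind at all and also works in regex engines that only support fixed-width lookbehind.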

2 answers

solution1
5 ACCEPTED 2011-11-01 09:56:38

solution2
1 2011-11-01 09:57:59