In PHP, how can I use a regular expression to capture everything between two patterns (and the shortest instance of each pattern)?

Question

I must be overcomplicating this, but I can't figure it out for the life of me.

I have a standard html document stored as a string, and I need to get the contents of the paragraph. I'll make an example case.

$stringHTML=
"<html>

<head>
<title>Title</title>
</head>

<body>

<p>This is the first paragraph</p>
<p>This is the second</p>
<p>This is the third</p>
<p>And fourth</p>

</body>
</html>";

If I use

$regex='~(<p>)(.*)(</p>)~i';
preg_match_all($regex, $stringHTML, $newVariable);

I won't get 4 results. Rather, I'll get 10. I get 10 because the regex matches the first <p> and first </p> as well as the first <p> and fourth </p>.

How can I search between two words and return only what's between each pair of paragraph tags?

Answer 1

Use an HTML parser such as DOM or XPath to parse HTML. Don't use regex to parse HTML. Here is how it can be parsed with DOMDocument.

$doc = new \DOMDocument;
$doc->loadHTML($stringHTML);
$ps = $doc->getElementsByTagName("p");
for($i=0;$i<$ps->length; $i++){
    echo $ps->item($i)->textContent. "\n";
}

Code in action
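Since the answer also mentions XPath, here is a minimal sketch of the same extraction with DOMXPath. The shortened input string and the $paragraphs variable are assumptions for illustration, not part of the original answer.

```php
<?php
// Shortened, hypothetical version of the question's document.
$stringHTML = "<html><head><title>Title</title></head><body>\n"
    . "<p>This is the first paragraph</p>\n"
    . "<p>This is the second</p>\n"
    . "</body></html>";

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($stringHTML);
libxml_clear_errors();

// Select every <p> element anywhere in the document.
$xpath = new DOMXPath($doc);
$paragraphs = [];
foreach ($xpath->query('//p') as $p) {
    $paragraphs[] = $p->textContent;
}

print_r($paragraphs);
```

The XPath query is the part you would change to target something narrower, for example '//body/p' for paragraphs that are direct children of the body.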

Using this regex (as you said, this is regex practice), you'll get 4 results.

preg_match_all("#<p>(.*)</p>#", $stringHTML, $matches);
print_r($matches[1]);

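Here a capturing group, not lookaround, picks out the text between the tags. It returns 4 results because . does not match newlines by default and each paragraph is on its own line.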

Answer 2

Use .*? to get the shortest match instead of the longest match.
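A small sketch of the difference, assuming a hypothetical input where all paragraphs sit on one line (that is where greedy and lazy matching visibly diverge). The variable names are made up for the example.

```php
<?php
// Hypothetical single-line input so that .* can run across several paragraphs.
$html = '<p>first</p><p>second</p><p>third</p>';

// Greedy: .* extends to the LAST </p> it can reach.
preg_match_all('~<p>(.*)</p>~i', $html, $greedy);

// Lazy: .*? stops at the NEAREST </p>.
preg_match_all('~<p>(.*?)</p>~i', $html, $lazy);

print_r($greedy[1]); // one element: first</p><p>second</p><p>third
print_r($lazy[1]);   // three elements: first, second, third
```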

Answer 3

Your regex should be /<p>(.*?)<\/p>/i . It will match only the strings between <p> and </p> and put them in an array.

You don't need to put the tags themselves in groups: ( )
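As a rough check, here is that idea applied to the question's paragraphs, assuming the intended pattern has an opening <p> before the lazy group. The input is a condensed copy of the question's string.

```php
<?php
// Condensed copy of the document from the question.
$stringHTML = "<html>\n<body>\n"
    . "<p>This is the first paragraph</p>\n"
    . "<p>This is the second</p>\n"
    . "<p>This is the third</p>\n"
    . "<p>And fourth</p>\n"
    . "</body>\n</html>";

// Only the text between the tags is grouped; the tags are matched but not captured.
preg_match_all('/<p>(.*?)<\/p>/i', $stringHTML, $matches);

// $matches[0] holds the full matches (tags included),
// $matches[1] holds the contents of group 1 (text only).
print_r($matches[1]);
```

With a single group, the result array has just two entries, which is easier to work with than the four produced when both tags are grouped as well.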

3 answers

solution1
1 2013-01-01 06:13:34

solution2
0 ACCEPTED 2013-01-01 04:51:09

solution3
0 2013-01-01 05:59:57
