
How to use PHP to get each match of a regex pattern

I am trying to use preg_match_all to extract a repeated pattern from an HTML string.

The problem seems to be that my pattern has a defined beginning and end but a wildcard portion in between, so preg_match_all returns only the single longest match instead of each individual match.

My ultimate goal is to isolate each <a ...>some text</a> in an HTML string and wrap it like this: <font ...><a ...>some text</a></font>.

But first, I simply want to isolate each one successfully:

$lvs_regex = "/<a.+<\/a>/" ;
$lvs_test  = "click <a href='...'>AAA</a> now, <a href='...'>BBB</a> later, <a href='...'>CCC</a> tomorrow" ;

preg_match_all( $lvs_regex , $lvs_test , $matches ) ;
for($i = 0 ; $i < count( $matches ) ; $i++ )
{
    print $matches[ $i ][0] . "<br/>" ;
}

The result I want:

[0] => <a href='...'>AAA</a>

[1] => <a href='...'>BBB</a>

[2] => <a href='...'>CCC</a>

But I only get one match:

[0] => <a href='...'>AAA</a> now, <a href='...'>BBB</a> later, <a href='...'>CCC</a>

Maybe something like this:

$lvs_regex = "/<a.*?<\/a>/" ;
$lvs_test  = "click <a href='...'>AAA</a> now, <a href='...'>BBB</a> later, <a href='...'>CCC</a> tomorrow" ;

preg_match_all( $lvs_regex , $lvs_test , $matches);

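Basically, the pattern you need is /<a.*?<\/a>/. The ? after .* makes the quantifier lazy (non-greedy), so each match stops at the first </a> after its <a instead of running on to the last one. That way preg_match_all matches every occurrence in your string.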
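To see what the ? changes, here is a minimal sketch that runs the greedy and the lazy pattern over the same test string and compares the match counts (preg_match_all returns the number of full-pattern matches). The variable names $greedy, $lazy, $greedyCount and $lazyCount are only for illustration.

```php
<?php
$lvs_test = "click <a href='...'>AAA</a> now, <a href='...'>BBB</a> later, <a href='...'>CCC</a> tomorrow";

// Greedy: .* consumes as much as it can, then backtracks only as far as
// the LAST </a>, so the whole run of links becomes one match.
$greedyCount = preg_match_all('/<a.*<\/a>/', $lvs_test, $greedy);

// Lazy: .*? consumes as little as it can, so each match ends at the NEXT </a>.
$lazyCount = preg_match_all('/<a.*?<\/a>/', $lvs_test, $lazy);

echo "greedy: $greedyCount match(es)\n"; // greedy: 1 match(es)
echo "lazy:   $lazyCount match(es)\n";   // lazy:   3 match(es)
```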

Now, var_dump($matches[0]) gives

array (size=3)
    0 => string '<a href='...'>AAA</a>' (length=21)
    1 => string '<a href='...'>BBB</a>' (length=21)
    2 => string '<a href='...'>CCC</a>' (length=21)

which is the result you want.

Then loop over $matches[0] (not $matches):

for($i = 0 ; $i < count( $matches[0] ) ; $i++ )
{ 
    var_dump($matches[0][ $i ] . "<br/>");
} 

and you can see that every occurrence is now matched:

string '<a href='...'>AAA</a><br/>' (length=26)
string '<a href='...'>BBB</a><br/>' (length=26)
string '<a href='...'>CCC</a><br/>' (length=26)
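This is also why the loop in the question prints only one line even with a correct pattern: with the default PREG_PATTERN_ORDER, $matches is indexed by group, so count($matches) is the number of groups (1 here, because the pattern has no capture groups), while the individual matches are stored in $matches[0]. A small sketch of the difference:

```php
<?php
$lvs_test = "click <a href='...'>AAA</a> now, <a href='...'>BBB</a> later, <a href='...'>CCC</a> tomorrow";
preg_match_all('/<a.*?<\/a>/', $lvs_test, $matches); // PREG_PATTERN_ORDER is the default

// $matches[0] holds every full-pattern match; $matches[1], [2], ...
// would hold capture groups if the pattern had any (...) groups.
$groups = count($matches);    // number of groups: only group 0 exists here
$found  = count($matches[0]); // number of matches: one per <a>...</a>

// Iterate over $matches[0], not $matches, to visit every match.
foreach ($matches[0] as $i => $match) {
    echo "[$i] => $match\n";
}
```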

-------- NEW EDIT ---------

Now you can modify your loop to wrap every matched <a> tag:

$result='';

for($i = 0 ; $i < count( $matches[0] ) ; $i++ )
{ 
    $result .= "<font ...>".$matches[0][ $i ] . "</font><br/>";
} 

var_dump($result);

And you get

<font ...><a href='...'>AAA</a></font><br/><font ...><a href='...'>BBB</a></font><br/><font ...><a href='...'>CCC</a></font><br/>
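Since the stated goal is to wrap each link inside the original HTML string, an alternative to building a separate $result is preg_replace, where $0 in the replacement stands for the whole match. This is a different technique from the loop above, shown as a minimal sketch; <font ...> is the same placeholder used in the question, and the \b after <a keeps tags such as <abbr> from matching.

```php
<?php
$lvs_test = "click <a href='...'>AAA</a> now, <a href='...'>BBB</a> later, <a href='...'>CCC</a> tomorrow";

// $0 is replaced by the entire matched <a>...</a>, so each link is wrapped
// in place and the surrounding text ("click", "now,", ...) is preserved.
$wrapped = preg_replace('/<a\b.*?<\/a>/', '<font ...>$0</font>', $lvs_test);

echo $wrapped, "\n";
```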

---------- NEW EDIT ----------

As suggested by @Casimir et Hippolyte, you can avoid matching "wrong or unwanted" tags such as abbr by adding a word boundary (\b) to the pattern:

$lvs_regex = "/<a\b.*?<\/a>/" ; 

You can also obtain the same result with a foreach loop instead of a for loop, for example:

$result = '';
foreach($matches[0] as $match) // use a new name so $matches is not overwritten
{ 
    $result .= "<font ...>".$match . "</font><br/>";
} 

And here is a link about how foreach works internally, in case you want a deeper look at the construct.
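To see what the word boundary in /<a\b.*?<\/a>/ actually changes, here is a minimal sketch on a hypothetical input that contains an <abbr> element before a real link:

```php
<?php
// Hypothetical input: an <abbr> element followed by a real link.
$html = "see <abbr title='x'>ABBR</abbr> and <a href='...'>AAA</a>";

// Without \b, "<a" also matches the start of "<abbr", and the lazy .*?
// then runs on to the first "</a>", swallowing the whole abbr element.
preg_match_all('/<a.*?<\/a>/', $html, $without);

// With \b, the "a" must be followed by a non-word character (space, ">", ...),
// so "<abbr" no longer qualifies as a starting point.
preg_match_all('/<a\b.*?<\/a>/', $html, $with);

print_r($without[0]);
print_r($with[0]);
```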

$lvs_regex = "/<a.+<\/a>/U" ;

$lvs_test  = "click <a href='...'>AAA</a> now, <a href='...'>BBB</a> later, <a href='...'>CCC</a> tomorrow" ;

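// preg_match_all returns the number of matches (0 if none, false on error),
// so test its return value rather than $matches, which is always an array.
if (preg_match_all( $lvs_regex , $lvs_test , $matches )) {
    foreach ($matches[0] as $match) {
        print $match."\n";
    }
}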

Result is:

<a href='...'>AAA</a>
<a href='...'>BBB</a>
<a href='...'>CCC</a>

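Use the 'ungreedy' modifier /U. It inverts the greediness of the quantifiers, so .+ matches as little as possible, just like .+? does without the modifier: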

http://www.php.net/manual/fa/reference.pcre.pattern.modifiers.php
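A quick check of that behaviour, based on the manual's description that /U inverts greediness: under /U, .+ acts like .+?, and appending ? makes the quantifier greedy again. A minimal sketch comparing the three variants:

```php
<?php
$lvs_test = "click <a href='...'>AAA</a> now, <a href='...'>BBB</a> later, <a href='...'>CCC</a> tomorrow";

// With /U, quantifiers are lazy by default: .+ here behaves like .+?.
$ungreedy = preg_match_all('/<a.+<\/a>/U', $lvs_test, $m1);

// Same matches without /U, using an explicit lazy quantifier.
$lazy = preg_match_all('/<a.+?<\/a>/', $lvs_test, $m2);

// Under /U, a trailing ? flips the quantifier back to greedy: one long match.
$greedyAgain = preg_match_all('/<a.+?<\/a>/U', $lvs_test, $m3);

printf("%d %d %d\n", $ungreedy, $lazy, $greedyAgain);
```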

The technical posts on this site are published under the CC BY-SA 4.0 license. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM