Regular expression

We want to do a find and replace using preg_replace, but we are unable to get the desired result.

Here is my string:

    $x = '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2005/11/i_leave_shreds_.html#comment-11657412">FALLACI</a>';
    $x .= '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2005/10/i_leave_shreds_.html#comment-11657412">FALLACI</a>';
    $x .= '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2005/1/i_leave_shreds_.html#comment-11657412">FALLACI</a>';
    $x .= '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2005/9/i_leave_shreds_.html#comment-11657412">FALLACI</a>';
    $x .= '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2006/11/i_leave_shreds_.html#comment-11657412">FALLACI</a>';
    $x .= '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/i_leave_shreds_.html#comment-11657412">FALLACI</a>';

    echo preg_replace('/<a(.*?)href="http:\/\/atlasshrugs2000.typepad.com\/atlas_shrugs\/([0-9\/]{0,7}?)(.*?)_.html#(.*?)"(.*?)>/','<a$1href="http://localhost/test/$3#$4"$5>',$x);

It gives the following result:

<a href="http://localhost/test/2005/11/i_leave_shreds#comment-11657412">FALLACI</a>
<a href="http://localhost/test/2005/10/i_leave_shreds#comment-11657412">FALLACI</a>
<a href="http://localhost/test/2005/1/i_leave_shreds#comment-11657412">FALLACI</a>
<a href="http://localhost/test/2005/9/i_leave_shreds#comment-11657412">FALLACI</a>
<a href="http://localhost/test/2006/11/i_leave_shreds#comment-11657412">FALLACI</a>
<a href="http://localhost/test/i_leave_shreds#comment-11657412">FALLACI</a>

But we want a result like this:

<a href="http://localhost/test/i_leave_shreds#comment-11657412">FALLACI</a>
<a href="http://localhost/test/i_leave_shreds#comment-11657412">FALLACI</a>
<a href="http://localhost/test/i_leave_shreds#comment-11657412">FALLACI</a>
<a href="http://localhost/test/i_leave_shreds#comment-11657412">FALLACI</a>
<a href="http://localhost/test/i_leave_shreds#comment-11657412">FALLACI</a>
<a href="http://localhost/test/i_leave_shreds#comment-11657412">FALLACI</a>

Please help me. Thanks in advance :)

Solution

If we start by sidelining your current regex pattern...

This:

$x = '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2005/11/i_leave_shreds_.html#comment-11657410">FALLACI</a>';
$x .= '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2005/10/i_leave_shreds_.html#comment-11657411">FALLACI</a>';
$x .= '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2005/1/i_leave_shreds_.html#comment-11657412">FALLACI</a>';
$x .= '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2005/9/i_leave_shreds_.html#comment-11657413">FALLACI</a>';
$x .= '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2006/11/i_leave_shreds_.html#comment-11657414">FALLACI</a>';
$x .= '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/i_leave_shreds_.html#comment-11657415">FALLACI</a>';

echo preg_replace('~<a.*?href=["\'].*?/([^/]*?)_\.html#(.*?)["\'].*?>(.*?)</a>~', "<a href='http://localhost/test/$1#$2'>$3</a><br>\n", $x);

Outputs:

<a href='http://localhost/test/i_leave_shreds#comment-11657410'>FALLACI</a><br>
<a href='http://localhost/test/i_leave_shreds#comment-11657411'>FALLACI</a><br>
<a href='http://localhost/test/i_leave_shreds#comment-11657412'>FALLACI</a><br>
<a href='http://localhost/test/i_leave_shreds#comment-11657413'>FALLACI</a><br>
<a href='http://localhost/test/i_leave_shreds#comment-11657414'>FALLACI</a><br>
<a href='http://localhost/test/i_leave_shreds#comment-11657415'>FALLACI</a><br>

Regex Explanation

~<a.*?href=["'].*?/([^/]*?)_\.html#(.*?)["'].*?>(.*?)</a>~
  • ~ = Starting delimiter
  • <a.*? = matches the opening a tag followed by any character 0 or more times until it reaches...
  • href=["'] = matches href= followed by either " or '
  • .*?/ = matches all characters until the final slash before...
  • ([^/]*?) = capture group and catches everything between the final slash and...
  • _\.html# = matches the underscore and the .html file extension of the URL, followed by a #
  • (.*?) = capture group matches all characters (the comment/number) before...
  • ["'].*?> = matches either " or ' followed by any character 0 or more times until it reaches the end of the opening a tag: >
  • (.*?) = matches the text between the opening and closing a tags: FALLACI
  • </a> = matches the closing a tag
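
The breakdown above can also be kept next to the pattern itself. This is a minimal sketch, assuming PCRE's extended mode (the x modifier), which ignores whitespace in the pattern and treats # as the start of a comment, so the literal # of the URL has to be written as \#. The variable names are illustrative and not part of the original answer:

```php
<?php
// Same pattern as in the answer, spread over several lines in extended (x) mode.
// The comments must not contain the ~ delimiter or an unescaped single quote.
$pattern = '~
    <a.*?href=["\']   # opening tag up to the href attribute and its quote
    .*?/              # everything up to the last slash before the file name
    ([^/]*?)          # $1: file name without the trailing underscore
    _\.html\#         # literal "_.html" followed by a hash sign
    (.*?)             # $2: fragment, e.g. comment-11657410
    ["\'].*?>         # closing quote and the rest of the opening tag
    (.*?)             # $3: link text
    </a>
~x';

$link = '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2005/11/i_leave_shreds_.html#comment-11657410">FALLACI</a>';

$result = preg_replace($pattern, '<a href="http://localhost/test/$1#$2">$3</a>', $link);
echo $result, "\n";
```

The single-line form in the answer behaves the same way; the extended form only trades compactness for readability.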

Update

To limit the replacements to only those links containing atlasshrugs2000.typepad.com, you can update the regex to:

~<a.*?href=["\'].*?atlasshrugs2000.typepad.com.*?/([^/]*?)_\.html#(.*?)["\'].*?>(.*?)</a>~

The difference between this regex and the original is (line 4 of the bullet-point list above):

.*?/                                <-- Original
.*?atlasshrugs2000.typepad.com.*?/  <-- Updated

Simply put, the updated version matches any characters (e.g. http://) before the specific host atlasshrugs2000.typepad.com, followed by any characters after it.

Examples of matches (http/https/BLANK):

<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2005/11/i_leave_shreds_.html#comment-11657410">FALLACI</a>
<a href="atlasshrugs2000.typepad.com/atlas_shrugs/2005/11/i_leave_shreds_.html#comment-11657410">FALLACI</a>
<a href="https://atlasshrugs2000.typepad.com/atlas_shrugs/2005/11/i_leave_shreds_.html#comment-11657410">FALLACI</a>
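
One detail worth noting: the dots in atlasshrugs2000.typepad.com are unescaped in the pattern above, so each one matches any character. A minimal sketch of one way to tighten that, assuming the host is held in a variable and escaped with preg_quote(). The variable names and the look-alike host are hypothetical, chosen only for illustration:

```php
<?php
// Illustrative names; only the host and the target URL come from the thread.
$host = 'atlasshrugs2000.typepad.com';

// preg_quote() escapes regex metacharacters (here the dots); the second
// argument also escapes the ~ delimiter if it ever appears in the host.
$pattern = '~<a.*?href=["\'].*?' . preg_quote($host, '~')
         . '.*?/([^/]*?)_\.html#(.*?)["\'].*?>(.*?)</a>~';
$replacement = '<a href="http://localhost/test/$1#$2">$3</a>';

$links = [
    '<a href="https://atlasshrugs2000.typepad.com/atlas_shrugs/2005/11/i_leave_shreds_.html#comment-11657410">FALLACI</a>',
    // Hypothetical look-alike host: an "x" where the first dot should be.
    '<a href="http://atlasshrugs2000xtypepad.com/atlas_shrugs/i_leave_shreds_.html#comment-1">OTHER</a>',
];

// Each link is processed on its own so that .*? cannot run from one tag into the next.
$results = [];
foreach ($links as $link) {
    $results[] = preg_replace($pattern, $replacement, $link);
}
print_r($results);
```

With the escaped host, the first link is rewritten and the look-alike is left untouched. With unescaped dots the look-alike would be rewritten as well.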

The problem is here: ([0-9\/]{0,7}?). The {0,7} allows 0 to 7 characters, and the trailing ? makes the quantifier lazy, so it matches as few of them as possible (here, none) and the date ends up in the following (.*?) group. Remove the ? at the end so that it reads ([0-9\/]{0,7}), and add a \/ after the group.

Try:
/<a(.*?)href="http:\/\/atlasshrugs2000.typepad.com\/atlas_shrugs\/([0-9\/]{0,7})\/(.*?)_.html#(.*?)"(.*?)>/

Change {0,7}?)( to {0,7})\/(
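
To see the effect of that change, here is a minimal sketch that runs the question's lazy pattern and the pattern suggested above against a link with a date and a link without one. The $optional variant is an assumption of this sketch, not part of either answer: it makes the whole date part optional so that the undated link is also rewritten. All variable names are illustrative.

```php
<?php
$withDate = '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2005/11/i_leave_shreds_.html#comment-11657412">FALLACI</a>';
$noDate   = '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/i_leave_shreds_.html#comment-11657412">FALLACI</a>';

// Shared head and tail of the pattern, copied from the thread.
$prefix = '/<a(.*?)href="http:\/\/atlasshrugs2000.typepad.com\/atlas_shrugs\/';
$suffix = '(.*?)_.html#(.*?)"(.*?)>/';

$lazy     = $prefix . '([0-9\/]{0,7}?)'   . $suffix; // question: lazy, captures nothing
$fixed    = $prefix . '([0-9\/]{0,7})\/'  . $suffix; // suggested: greedy plus a required slash
$optional = $prefix . '((?:[0-9]+\/)*)'   . $suffix; // assumed variant: zero or more "digits/" parts

$replacement = '<a$1href="http://localhost/test/$3#$4"$5>';

$lazyDated    = preg_replace($lazy, $replacement, $withDate);     // date stays in the URL
$fixedDated   = preg_replace($fixed, $replacement, $withDate);    // date removed
$fixedUndated = preg_replace($fixed, $replacement, $noDate);      // no match, string unchanged
$optDated     = preg_replace($optional, $replacement, $withDate); // date removed
$optUndated   = preg_replace($optional, $replacement, $noDate);   // rewritten as well

foreach ([$lazyDated, $fixedDated, $fixedUndated, $optDated, $optUndated] as $line) {
    echo $line, "\n";
}
```

The required slash after the date group means the suggested pattern leaves a link without a date untouched. Whether that matters depends on whether such links occur in the real data.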
