Regular expression

Question

We want to do a find and replace using preg_replace, but we are unable to get the desired result.

Here is my string:

    $x = '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2005/11/i_leave_shreds_.html#comment-11657412">FALLACI</a>';
    $x .= '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2005/10/i_leave_shreds_.html#comment-11657412">FALLACI</a>';
    $x .= '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2005/1/i_leave_shreds_.html#comment-11657412">FALLACI</a>';
    $x .= '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2005/9/i_leave_shreds_.html#comment-11657412">FALLACI</a>';
    $x .= '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2006/11/i_leave_shreds_.html#comment-11657412">FALLACI</a>';
    $x .= '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/i_leave_shreds_.html#comment-11657412">FALLACI</a>';

    echo preg_replace('/<a(.*?)href="http:\/\/atlasshrugs2000.typepad.com\/atlas_shrugs\/([0-9\/]{0,7}?)(.*?)_.html#(.*?)"(.*?)>/','<a$1href="http://localhost/test/$3#$4"$5>',$x);

It gives the following result:

<a href="http://localhost/test/2005/11/i_leave_shreds#comment-11657412">FALLACI</a>
<a href="http://localhost/test/2005/10/i_leave_shreds#comment-11657412">FALLACI</a>
<a href="http://localhost/test/2005/1/i_leave_shreds#comment-11657412">FALLACI</a>
<a href="http://localhost/test/2005/9/i_leave_shreds#comment-11657412">FALLACI</a>
<a href="http://localhost/test/2006/11/i_leave_shreds#comment-11657412">FALLACI</a>
<a href="http://localhost/test/i_leave_shreds#comment-11657412">FALLACI</a>

But we want a result like:

<a href="http://localhost/test/i_leave_shreds#comment-11657412">FALLACI</a>
<a href="http://localhost/test/i_leave_shreds#comment-11657412">FALLACI</a>
<a href="http://localhost/test/i_leave_shreds#comment-11657412">FALLACI</a>
<a href="http://localhost/test/i_leave_shreds#comment-11657412">FALLACI</a>
<a href="http://localhost/test/i_leave_shreds#comment-11657412">FALLACI</a>
<a href="http://localhost/test/i_leave_shreds#comment-11657412">FALLACI</a>

Please help me. Thanks in advance :)

Answer 1

Solution

If we start by sidelining your current regex pattern...

This:

$x = '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2005/11/i_leave_shreds_.html#comment-11657410">FALLACI</a>';
$x .= '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2005/10/i_leave_shreds_.html#comment-11657411">FALLACI</a>';
$x .= '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2005/1/i_leave_shreds_.html#comment-11657412">FALLACI</a>';
$x .= '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2005/9/i_leave_shreds_.html#comment-11657413">FALLACI</a>';
$x .= '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2006/11/i_leave_shreds_.html#comment-11657414">FALLACI</a>';
$x .= '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/i_leave_shreds_.html#comment-11657415">FALLACI</a>';

echo preg_replace('~<a.*?href=["\'].*?/([^/]*?)_\.html#(.*?)["\'].*?>(.*?)</a>~', "<a href='http://localhost/test/$1#$2'>$3</a><br>\n", $x);

Outputs:

<a href='http://localhost/test/i_leave_shreds#comment-11657410'>FALLACI</a><br>
<a href='http://localhost/test/i_leave_shreds#comment-11657411'>FALLACI</a><br>
<a href='http://localhost/test/i_leave_shreds#comment-11657412'>FALLACI</a><br>
<a href='http://localhost/test/i_leave_shreds#comment-11657413'>FALLACI</a><br>
<a href='http://localhost/test/i_leave_shreds#comment-11657414'>FALLACI</a><br>
<a href='http://localhost/test/i_leave_shreds#comment-11657415'>FALLACI</a><br>

Regex Explanation

~<a.*?href=["'].*?/([^/]*?)_\.html#(.*?)["'].*?>(.*?)</a>~

~ = starting delimiter (the same character also closes the pattern)
<a.*? = matches the opening a tag followed by any character 0 or more times until it reaches...
href=["'] = matches href= followed by either " or '
.*?/ = matches all characters until the final slash before...
([^/]*?) = capture group that catches everything between the final slash and...
_\.html# = matches the underscore and html file extension of the url followed by a #
(.*?) = capture group that matches all characters (the comment/number) before...
["'].*?> = matches either " or ' followed by any character 0 or more times until it reaches the end of the opening a tag: >
(.*?) = capture group that matches the text between the opening and closing a tags: FALLACI
</a> = matches the closing a tag
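
To see what each capture group actually holds, here is a minimal sketch that runs the pattern through preg_match on one link from the question. The variable names ($pattern, $link, $m) are just for illustration.

```php
<?php
// Pattern from the answer, written as a single-quoted PHP string.
$pattern = '~<a.*?href=["\'].*?/([^/]*?)_\.html#(.*?)["\'].*?>(.*?)</a>~';

// One sample link taken from the question.
$link = '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2005/11/i_leave_shreds_.html#comment-11657410">FALLACI</a>';

// preg_match puts the whole match in $m[0] and the capture groups after it.
if (preg_match($pattern, $link, $m)) {
    echo "page:   {$m[1]}\n"; // i_leave_shreds
    echo "anchor: {$m[2]}\n"; // comment-11657410
    echo "text:   {$m[3]}\n"; // FALLACI
}
```

Those three groups are what $1, $2 and $3 refer to in the replacement string.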

Update

To limit the replacements to only those links containing atlasshrugs2000.typepad.com, you can update the regex to:

~<a.*?href=["\'].*?atlasshrugs2000\.typepad\.com.*?/([^/]*?)_\.html#(.*?)["\'].*?>(.*?)</a>~

The difference between this regex and the original is (line 4 of the bullet-point list above):

.*?/                                  <-- Original
.*?atlasshrugs2000\.typepad\.com.*?/  <-- Updated

Put simply, the updated version matches any characters (e.g. http://) before the specific domain atlasshrugs2000.typepad.com, then any characters after it up to the final slash. The dots in the domain are escaped so that they match only a literal dot.

Examples of matches (http/https/BLANK):

<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2005/11/i_leave_shreds_.html#comment-11657410">FALLACI</a>
<a href="atlasshrugs2000.typepad.com/atlas_shrugs/2005/11/i_leave_shreds_.html#comment-11657410">FALLACI</a>
<a href="https://atlasshrugs2000.typepad.com/atlas_shrugs/2005/11/i_leave_shreds_.html#comment-11657410">FALLACI</a>
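
As a rough check of the domain-restricted pattern, the sketch below runs it over the three example links plus one link on another host. The example.com link is made up for the test, and the dots in the host name are escaped here, which is a small tightening rather than something the examples need.

```php
<?php
// Domain-restricted pattern; escaping the dots is an assumption of this sketch.
$pattern = '~<a.*?href=["\'].*?atlasshrugs2000\.typepad\.com.*?/([^/]*?)_\.html#(.*?)["\'].*?>(.*?)</a>~';

$links = [
    '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2005/11/i_leave_shreds_.html#comment-11657410">FALLACI</a>',
    '<a href="atlasshrugs2000.typepad.com/atlas_shrugs/2005/11/i_leave_shreds_.html#comment-11657410">FALLACI</a>',
    '<a href="https://atlasshrugs2000.typepad.com/atlas_shrugs/2005/11/i_leave_shreds_.html#comment-11657410">FALLACI</a>',
    // Hypothetical link on another host; it should come back unchanged.
    '<a href="http://example.com/blog/2005/11/i_leave_shreds_.html#comment-1">OTHER</a>',
];

// Each link is processed on its own: in one long string, .*? could run past
// a non-matching link and into a later one that does contain the domain.
$results = [];
foreach ($links as $link) {
    $results[] = preg_replace($pattern, '<a href="http://localhost/test/$1#$2">$3</a>', $link);
}

echo implode("\n", $results), "\n";
```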

Answer 2

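The problem is here: ([0-9\/]{0,7}?). You allow 0 to 7 characters, and the trailing ? then makes the quantifier lazy, so it takes as few as possible (here, none) and the date ends up in the next group. Remove the ? at the end (so it looks like ([0-9\/]{0,7})) and the date is consumed by this group instead. Note that a date with a two-digit month plus its trailing slash, such as 2005/11/, is 8 characters long, so the upper limit needs to be 8 rather than 7 to avoid a leftover slash.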
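
A quick way to compare the quantifier variants is to build the question's pattern around each one and look at the result. This is only a sketch: rewriteLink() is a hypothetical helper, and the dots in the pattern are escaped here, which the original pattern does not do.

```php
<?php
// Hypothetical helper: rebuilds the question's pattern around the given
// quantifier for the date group and applies the question's replacement.
function rewriteLink(string $quantifier, string $html): string
{
    $pattern = '/<a(.*?)href="http:\/\/atlasshrugs2000\.typepad\.com\/atlas_shrugs\/'
             . '([0-9\/]' . $quantifier . ')(.*?)_\.html#(.*?)"(.*?)>/';
    return preg_replace($pattern, '<a$1href="http://localhost/test/$3#$4"$5>', $html);
}

$link = '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2005/11/i_leave_shreds_.html#comment-11657412">FALLACI</a>';

$lazy    = rewriteLink('{0,7}?', $link); // lazy: the date group matches nothing, the date stays in group 3
$greedy7 = rewriteLink('{0,7}', $link);  // greedy, max 7: takes "2005/11" and leaves "/" for group 3
$greedy8 = rewriteLink('{0,8}', $link);  // greedy, max 8: takes "2005/11/"

echo $lazy, "\n", $greedy7, "\n", $greedy8, "\n";
```

For this link the three variants give test/2005/11/i_leave_shreds, test//i_leave_shreds and test/i_leave_shreds respectively.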

Answer 3

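Try:
/<a(.*?)href="http:\/\/atlasshrugs2000.typepad.com\/atlas_shrugs\/([0-9\/]{0,7})\/(.*?)_.html#(.*?)"(.*?)>/

That is, change {0,7}?)( to {0,7})\/(

Note that this requires a slash after the date group, so a URL with no date segment (the last link in the question) is left unchanged.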
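
If the link without a date also has to be rewritten, one possible variation (not taken from any of the answers) is to make the date part a repeated, non-capturing "digits plus slash" segment. Because that group no longer captures, the replacement references shift down by one. A sketch under those assumptions:

```php
<?php
// Variation, not from the answers: (?:[0-9]+\/)* skips zero or more
// "digits/" path segments, so dated and undated URLs both match.
$pattern = '/<a(.*?)href="http:\/\/atlasshrugs2000\.typepad\.com\/atlas_shrugs\/'
         . '(?:[0-9]+\/)*(.*?)_\.html#(.*?)"(.*?)>/';
$replacement = '<a$1href="http://localhost/test/$2#$3"$4>';

$x  = '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2005/11/i_leave_shreds_.html#comment-11657412">FALLACI</a>';
$x .= '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/2005/1/i_leave_shreds_.html#comment-11657412">FALLACI</a>';
$x .= '<a href="http://atlasshrugs2000.typepad.com/atlas_shrugs/i_leave_shreds_.html#comment-11657412">FALLACI</a>';

$out = preg_replace($pattern, $replacement, $x);
echo $out, "\n";
```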

3 answers

solution1 2 2013-10-11 13:10:41
solution2 0 2013-10-11 13:00:03
solution3 0 2013-10-11 13:00:14
