PHP regex to clean a specific string from URLs only

Question

Can any regex ninjas come up with a PHP solution that strips the <cite> tag from any http URL but leaves the tag in the rest of the text?

e.g.:

the word <cite>printing</cite> is in http://www.thisis<cite>printing</cite>.com

should become:

the word <cite>printing</cite> is in http://www.thisisprinting.com

Answer 1

This is what I would do:

<?php
// Callback wrapper around strip_tags(): removes every HTML tag from the matched text
function strip($matches){
    return strip_tags($matches[0]);
}

// The sample string
$str = "the word <cite>printing</cite> is in http://www.thisis<cite>printing</cite>.com";
// Match everything from "://" up to the next whitespace and run the callback on it
$str = preg_replace_callback("/:\/\/[^\s]*/", 'strip', $str);

// Show the result; htmlentities() keeps the remaining tags visible in a browser
var_dump(htmlentities($str));

http://codepad.viper-7.com/XiPcs9
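
The callback above removes every tag inside the matched URL. If only the cite tag should go, the same callback idea can be narrowed. This is a minimal sketch, not the answerer's code: it assumes a URL starts with http:// or https:// and ends at the first whitespace character, and the function name stripCiteFromUrls is made up for illustration.

```php
<?php
// Hypothetical helper: strip <cite> tags inside http(s) URLs only.
// Assumption: a URL runs from its scheme to the next whitespace character.
function stripCiteFromUrls(string $text): string
{
    return preg_replace_callback(
        '~https?://\S+~i',
        function (array $m): string {
            // Remove opening and closing cite tags within the matched URL.
            return preg_replace('~</?cite>~i', '', $m[0]);
        },
        $text
    );
}

$str = 'the word <cite>printing</cite> is in http://www.thisis<cite>printing</cite>.com';
echo stripCiteFromUrls($str), "\n";
// prints: the word <cite>printing</cite> is in http://www.thisisprinting.com
```

Other markup inside the URL, if any, is left alone, which may or may not be what you want.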

Answer 2

A regex suited to this substitution could be:

#(https?://)(.*?)<cite>(.*?)</cite>([^\s]*)#s

The s flag lets . match newlines as well.
The lazy quantifiers (.*?) stop at the nearest <cite> and </cite>, so the match does not run on into later, similar tags.

Snippet:

<?php
$str = "the word <cite>printing</cite> is in http://www.thisis<cite>printing</cite>.com";
// Single quotes keep $1..$4 as literal backreferences for preg_replace
$replaced = preg_replace('#(https?://)(.*?)<cite>(.*?)</cite>([^\s]*)#s', '$1$2$3$4', $str);
echo $replaced;

// Output: the word <cite>printing</cite> is in http://www.thisisprinting.com
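
One thing worth checking with this pattern: because (.*?) can match whitespace, a URL with no tag in it can pull in a cite tag that appears later in the text. The sketch below shows that case and one possible tightening that uses \S so the match stays inside a single URL. The input strings and the function name stripCiteInUrls are made up for illustration, and the variant assumes a URL ends at the first whitespace character.

```php
<?php
// The pattern from the answer: with the s flag, (.*?) may cross whitespace.
$loose = '#(https?://)(.*?)<cite>(.*?)</cite>([^\s]*)#s';
$text  = 'see http://example.com and the word <cite>printing</cite>';
echo preg_replace($loose, '$1$2$3$4', $text), "\n";
// prints: see http://example.com and the word printing

// Hypothetical tighter variant: \S keeps the whole match inside one URL.
function stripCiteInUrls(string $text): string
{
    $pattern = '#(https?://\S*?)<cite>(\S*?)</cite>#i';
    do {
        // A pass can leave later <cite> pairs in the same URL untouched,
        // so repeat until a pass makes no replacement.
        $text = preg_replace($pattern, '$1$2', $text, -1, $count);
    } while ($count > 0);
    return $text;
}

echo stripCiteInUrls($text), "\n";
// prints: see http://example.com and the word <cite>printing</cite>
echo stripCiteInUrls('http://a<cite>b</cite>c<cite>d</cite>.com'), "\n";
// prints: http://abcd.com
```

The loop is there because a single preg_replace pass resumes scanning after the first replaced pair, past the point where the URL's scheme could be matched again.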

Answer 3

Assuming you can already isolate the URLs in your text, you can strip the tags from each one:

$str = 'http://www.thisis<cite>printing</cite>.com';
$str = preg_replace('~</?cite>~i', "", $str);
echo $str;

OUTPUT:

http://www.thisisprinting.com
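
The pattern above already handles both the opening and closing tag (via /?) and any letter case (via the i flag). If the tags might also carry attributes, a slightly wider pattern could be used. This is only a sketch under that assumption; the function name stripCite and the sample input are hypothetical.

```php
<?php
// Hypothetical helper: remove <cite ...> and </cite> from an already-isolated URL.
// \b stops the match from touching other tag names that merely start with "cite",
// and [^>]* allows for optional attributes before the closing ">".
function stripCite(string $url): string
{
    return preg_replace('~</?cite\b[^>]*>~i', '', $url);
}

echo stripCite('http://www.thisis<cite class="hl">printing</CITE>.com'), "\n";
// prints: http://www.thisisprinting.com
```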

3 answers

solution1
1 2013-10-24 21:56:18

solution2
1 2013-10-24 22:02:35

solution3
0 2013-10-24 21:48:46
