Any regex ninjas out there who can come up with a PHP solution for stripping the <cite> tag from any http URL, while leaving the tag intact in the rest of the text?
For example:
the word <cite>printing</cite> is in http://www.thisis<cite>printing</cite>.com
should become:
the word <cite>printing</cite> is in http://www.thisisprinting.com
This is what I would do:
<?php
// Callback wrapper around strip_tags(): removes every tag from the matched URL
function strip($matches) {
    return strip_tags($matches[0]);
}
// The input string
$str = "the word <cite>printing</cite> is in http://www.thisis<cite>printing</cite>.com";
// Match everything from "://" up to the next whitespace and run the callback on it
$str = preg_replace_callback('/:\/\/[^\s]*/', 'strip', $str);
// Prove that it works (htmlentities() keeps the remaining tags visible in a browser)
var_dump(htmlentities($str));
A suitable regex for this substitution could be:
#(https?://)(.*?)<cite>(.*?)</cite>([^\s]*)#s
The s flag lets . match newlines as well.
The lazy quantifiers (.*?) keep each match as short as possible, so the pattern stops at the nearest <cite> and </cite> instead of swallowing other, similar tags.
Snippet:
<?php
$str = "the word <cite>printing</cite> is in http://www.thisis<cite>printing</cite>.com";
// Keep the four captured groups and drop the literal <cite> and </cite> between them
$replaced = preg_replace('#(https?://)(.*?)<cite>(.*?)</cite>([^\s]*)#s', '$1$2$3$4', $str);
echo $replaced;
// Output: the word <cite>printing</cite> is in http://www.thisisprinting.com
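Two caveats with the pattern above: (.*?) combined with the s flag can run past whitespace, so a <cite> that appears later in the text after a URL may be stripped too, and only one <cite>…</cite> pair is removed per URL. A minimal sketch of a stricter variant follows. It assumes a URL ends at the first whitespace character, and the function name strip_cite_from_urls_iteratively is made up for illustration.

```php
<?php
// Hypothetical helper (not from the thread): removes <cite> and </cite> tags
// that sit inside an http(s) URL, one tag per pass, until none are left.
// Assumption: a URL runs from "http://" or "https://" to the next whitespace.
function strip_cite_from_urls_iteratively(string $text): string
{
    do {
        // \S*? cannot cross whitespace, so tags outside the URL are never reached.
        $text = preg_replace('#(https?://\S*?)</?cite>#i', '$1', $text, -1, $count);
    } while ($count > 0);

    return $text;
}

echo strip_cite_from_urls_iteratively(
    'see http://a<cite>b</cite>c<cite>d</cite>.com and <cite>e</cite>'
), "\n";
// see http://abcd.com and <cite>e</cite>
```

The loop is needed because each pass consumes the URL prefix and can therefore remove only one tag per URL. It stops as soon as a pass reports zero replacements.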
Assuming you can already isolate the URLs in your text, you can strip the tag from each URL directly:
$str = 'http://www.thisis<cite>printing</cite>.com';
$str = preg_replace('~</?cite>~i', "", $str);
echo $str;
OUTPUT:
http://www.thisisprinting.com
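If the URLs still have to be located first, the tag-stripping replacement can be applied from inside a preg_replace_callback() call. The sketch below is only one way to do that. It assumes URLs start with http:// or https:// and end at the next whitespace, and the function name strip_cite_in_urls is hypothetical.

```php
<?php
// Hypothetical helper (not from the thread): finds each http(s) URL and
// removes only <cite> / </cite> tags inside it. Other tags are left alone.
// Assumption: a URL ends at the first whitespace character.
function strip_cite_in_urls(string $text): string
{
    $result = preg_replace_callback(
        '~https?://\S+~i',
        function (array $m): string {
            // $m[0] is the whole matched URL, including any embedded tags.
            return preg_replace('~</?cite>~i', '', $m[0]);
        },
        $text
    );

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $text;
}

$input = 'the word <cite>printing</cite> is in http://www.thisis<cite>printing</cite>.com';
echo strip_cite_in_urls($input), "\n";
// the word <cite>printing</cite> is in http://www.thisisprinting.com
```

Unlike strip_tags(), this leaves any other markup inside the matched URL untouched, which may or may not be what you want.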