PHP regex to clean a specific string from URLs only
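Are there any regex ninjas out there who can come up with a PHP solution that strips the tags from any http URL but leaves the tags in the rest of the text?
For example: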
the word <cite>printing</cite> is in http://www.thisis<cite>printing</cite>.com
should become:
the word <cite>printing</cite> is in http://www.thisisprinting.com
Here is what I would do:
<?php
//a callback function wrapper for strip_tags
function strip($matches){
return strip_tags($matches[0]);
}
//the string
$str = "the word <cite>printing</cite> is in http://www.thisis<cite>printing</cite>.com";
//match a url and call the strip callback on it
$str = preg_replace_callback("/:\/\/[^\s]*/", 'strip', $str);
//prove that it works
var_dump(htmlentities($str));
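A regex suited to this replacement might be:
#(https?://)(.*?)<cite>(.*?)</cite>([^\s]*)#s
The s flag makes . match newlines as well.
The quantifiers between the tags are lazy, so the match stops at the nearest tag instead of swallowing further similar tags.
Snippet: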
<?php
$str = "the word <cite>printing</cite> is in http://www.thisis<cite>printing</cite>.com";
// single quotes keep $1..$4 as backreferences rather than PHP variables
$replaced = preg_replace('#(https?://)(.*?)<cite>(.*?)</cite>([^\s]*)#s', '$1$2$3$4', $str);
echo $replaced;
// Output: the word <cite>printing</cite> is in http://www.thisisprinting.com
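Two things to watch with the pattern above: a single pass removes only one <cite> pair per URL, and because . also matches spaces, (.*?) can run past the end of the URL and reach a <cite> pair later in the text. A minimal sketch of a tighter variant follows, assuming URLs start with http:// or https:// and contain no whitespace. The function name strip_cite_pairs_in_urls is made up for illustration.

```php
<?php
// Hypothetical helper (the name is illustrative, not from the answers above):
// removes <cite>...</cite> pairs that sit inside an http(s) URL.
function strip_cite_pairs_in_urls(string $text): string
{
    // \S*? cannot cross whitespace, so a match never leaves the URL it started in.
    $pattern = '#(https?://\S*?)<cite>(\S*?)</cite>#i';
    do {
        // Each pass removes at most one pair per URL; $count reports how many
        // replacements were made, so we repeat until a pass changes nothing.
        $text = preg_replace($pattern, '$1$2', $text, -1, $count);
    } while ($count > 0);
    return $text;
}

echo strip_cite_pairs_in_urls(
    'the word <cite>printing</cite> is in http://www.this<cite>is</cite><cite>printing</cite>.com'
), "\n";
// Output: the word <cite>printing</cite> is in http://www.thisisprinting.com
```

The loop costs one extra pass per remaining pair, which should be fine for ordinary text.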
Assuming you can identify the URLs in the text, you can do:
$str = 'http://www.thisis<cite>printing</cite>.com';
$str = preg_replace('~</?cite>~i', "", $str);
echo $str;
OUTPUT:
http://www.thisisprinting.com
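If the URLs still have to be found inside a longer text first, the same replacement can be applied per URL with preg_replace_callback. This is only a sketch under the assumption that a URL is "http:// or https:// followed by non-whitespace characters". The function name remove_cite_from_urls is invented for the example.

```php
<?php
// Hypothetical helper (illustrative name): finds each http(s) URL and removes
// opening and closing <cite> tags inside it, leaving the rest of the text alone.
function remove_cite_from_urls(string $text): string
{
    return preg_replace_callback(
        '~https?://\S+~i',                 // assumed URL shape: scheme plus non-whitespace
        function (array $m): string {
            // $m[0] is the whole matched URL; strip <cite> and </cite> from it.
            return preg_replace('~</?cite>~i', '', $m[0]);
        },
        $text
    );
}

echo remove_cite_from_urls(
    'the word <cite>printing</cite> is in http://www.thisis<cite>printing</cite>.com'
), "\n";
// Output: the word <cite>printing</cite> is in http://www.thisisprinting.com
```

Because every tag inside the matched URL is removed in one go, this also copes with several tags in one URL and with unbalanced tags.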