PHP regex to clean a specific string from URLs only

Are there any regex ninjas out there who can come up with a PHP solution for removing the <cite> tag from any http URL while leaving the tag in the rest of the text?

For example:

the word <cite>printing</cite> is in http://www.thisis<cite>printing</cite>.com

should become:

the word <cite>printing</cite> is in http://www.thisisprinting.com

This is what I would do:

<?php
// Callback for preg_replace_callback(): strips every HTML tag from the matched URL
function strip($matches){
    return strip_tags($matches[0]);
}

// The input string
$str = "the word <cite>printing</cite> is in http://www.thisis<cite>printing</cite>.com";
// Match everything from "://" up to the next whitespace and run the callback on it
$str = preg_replace_callback("/:\/\/[^\s]*/", 'strip', $str);

// Show the result (htmlentities() keeps the remaining tags visible in a browser)
var_dump(htmlentities($str));

Demo: http://codepad.viper-7.com/XiPcs9

A regex suited to this substitution could be:

#(https?://)(.*?)<cite>(.*?)</cite>([^\s]*)#s
  1. The s flag lets . match newline characters as well.

  2. The lazy quantifiers (.*?) stop at the nearest <cite> and </cite>, so the match does not run on into later, similar tags.

Snippet:

<?php
$str = "the word <cite>printing</cite> is in http://www.thisis<cite>printing</cite>.com";
// Single quotes keep $1..$4 as literal backreferences for preg_replace
$replaced = preg_replace('#(https?://)(.*?)<cite>(.*?)</cite>([^\s]*)#s', '$1$2$3$4', $str);
echo $replaced;

// Output: the word <cite>printing</cite> is in http://www.thisisprinting.com
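
One thing to watch with this pattern: because .*? may cross whitespace, a URL that contains no tag can still match up to a <cite> that appears later in ordinary text, and only one pair is removed per URL. A possible stricter variant is sketched below. It assumes a URL never contains whitespace, and stripCiteInUrls is a hypothetical helper name, not part of the answer above.

```php
<?php
// Hypothetical helper (illustration only): removes <cite>...</cite> pairs
// that sit inside a whitespace-free http(s) URL.
function stripCiteInUrls(string $text): string
{
    // \S*? cannot cross whitespace, so the match stays inside one URL.
    $pattern = '#(https?://\S*?)<cite>(\S*?)</cite>#';

    // Each pass removes at most one pair per URL, so repeat until stable.
    do {
        $before = $text;
        $text = preg_replace($pattern, '$1$2', $text) ?? $before;
    } while ($text !== $before);

    return $text;
}

echo stripCiteInUrls('see http://a<cite>b</cite>c<cite>d</cite>.com for <cite>printing</cite>'), "\n";
// Prints: see http://abcd.com for <cite>printing</cite>
```

The loop is a design choice for simplicity; a callback that cleans each matched URL in a single pass would work as well.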


Assuming you can already identify the URLs in your text, you can do this:

$str = 'http://www.thisis<cite>printing</cite>.com';
$str = preg_replace('~</?cite>~i', "", $str);
echo $str;

Output:

http://www.thisisprinting.com
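
If the URLs are not already isolated, the two steps could be combined with preg_replace_callback. This is a minimal sketch, assuming a URL starts with http:// or https:// and runs to the next whitespace; the URL pattern is an illustrative assumption, not a complete URL parser.

```php
<?php
$str = 'the word <cite>printing</cite> is in http://www.thisis<cite>printing</cite>.com';

// Step 1: find each URL (assumed here to be "http(s)://" followed by non-whitespace).
// Step 2: remove only <cite> and </cite> inside the matched URL.
$clean = preg_replace_callback(
    '~https?://\S+~i',
    function (array $m): string {
        return preg_replace('~</?cite>~i', '', $m[0]);
    },
    $str
);

echo $clean, "\n";
// Prints: the word <cite>printing</cite> is in http://www.thisisprinting.com
```

Unlike strip_tags, the inner replacement touches only the cite tag, so any other markup inside the URL is left alone.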
