Replace all URLs in a string not matching a URL pattern in PHP

I'm using the following code to filter out URLs from a block of HTML text in PHP.

$text = preg_replace('#<a(?![^>]+?href="?http://keepthisdomain.com/foo/bar"?).*?>(.*?)</a>#i', '\1', $text);

It is intended to strip every link whose URL does not match the specified pattern, keeping only the link text. However, I also want to keep all tags that have the attribute rel="shadowbox[a]" set.
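Here is a minimal sketch of the problem, running the question's pattern on a single link. The rejectthis.com URL and the link text are made up for illustration. The lookahead checks only the href, so the rel="shadowbox[a]" link is unwrapped like any other:

```php
<?php
// The question's pattern: unwrap every <a> whose href is not the kept URL,
// leaving only the link text (group 1).
$pattern = '#<a(?![^>]+?href="?http://keepthisdomain.com/foo/bar"?).*?>(.*?)</a>#i';

// Hypothetical input: a shadowbox link pointing somewhere else.
$text = 'See <a href="http://rejectthis.com/pic.jpg" rel="shadowbox[a]">photo</a>.';

// The shadowbox link is stripped too, because rel is never checked.
$result = preg_replace($pattern, '\1', $text);
echo $result, "\n"; // See photo.
```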

How can I modify this preg_replace call to do that?

You are better off not using a regex at all and using an HTML parser instead, for the reasons set forth in this answer.
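Here is a minimal sketch of the parser-based approach, using PHP's built-in DOM extension (DOMDocument) instead of a regex. The helper name unwrapLinks() is hypothetical. The sketch assumes the input is an HTML fragment that is plain ASCII or otherwise safe for DOMDocument::loadHTML, which treats input as ISO-8859-1 unless told otherwise. Unlike the regex, it compares href and rel by exact string equality:

```php
<?php
// Sketch: unwrap <a> tags with PHP's DOM extension instead of a regex.
// unwrapLinks() is a hypothetical helper; the caller supplies the href and
// rel values whose links should be kept.
function unwrapLinks(string $html, string $keepHref, string $keepRel): string
{
    $doc = new DOMDocument();

    // Wrap the fragment in a <div> so it has a single root element, and ask
    // libxml not to add <html>/<body> or a doctype around it.
    $prev = libxml_use_internal_errors(true); // silence warnings on sloppy HTML
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    // Copy the live node list to an array, since the tree changes in the loop.
    foreach (iterator_to_array($doc->getElementsByTagName('a')) as $link) {
        // Leave the kept URL and shadowbox links untouched.
        if ($link->getAttribute('href') === $keepHref
            || $link->getAttribute('rel') === $keepRel) {
            continue;
        }
        // Move the link's children in front of it, then remove the empty <a>.
        $parent = $link->parentNode;
        while ($link->firstChild !== null) {
            $parent->insertBefore($link->firstChild, $link);
        }
        $parent->removeChild($link);
    }

    // Serialize the children of the wrapper <div>, i.e. the original fragment.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo unwrapLinks(
    '<a href="http://keepthisdomain.com/foo/bar">keep</a> '
    . '<a href="http://rejectthis.com/foo/bar">drop</a> '
    . '<a href="http://rejectthis.com/pic.jpg" rel="shadowbox[a]">box</a>',
    'http://keepthisdomain.com/foo/bar',
    'shadowbox[a]'
), "\n";
```

The parser reads attribute values regardless of quoting. This version therefore also copes with single-quoted values such as rel='shadowbox[a]', which the regex does not accept.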

That said, you can do it with a regex, but it's tricky:

$text = preg_replace('#<a(?![^>]+?\bhref="?http://keepthisdomain\.com/foo/bar"?|[^>]+\brel="shadowbox\[a\]").*?>(.*?)</a>#i', '\1', $text);

Details on the regex:

<a(?![^>]+?\bhref="?http://keepthisdomain\.com/foo/bar"?|[^>]+\brel="shadowbox\[a\]").*?>(.*?)</a>

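The pattern above can also be written out piece by piece with the x (extended) modifier, which ignores whitespace and allows # comments. This is a sketch meant to behave like the one-line version. The delimiter changes from # to ~ because # starts a comment in extended mode:

```php
<?php
// The answer's pattern, spread out with the x (extended) modifier.
// The delimiter is ~ instead of # because # starts a comment in x mode.
$pattern = '~
    <a                          # start of an opening anchor tag
    (?!                         # negative lookahead: skip this tag if...
        [^>]+? \b href="? http://keepthisdomain\.com/foo/bar "?   # ...its href is the kept URL
      | [^>]+  \b rel="shadowbox\[a\]"                            # ...or its rel is shadowbox[a]
    )                           # [^>] keeps both checks inside this opening tag
    .*?>                        # the rest of the opening tag
    (.*?)                       # group 1: the link text
    </a>                        # the closing tag
~ix';

// The whole match is replaced by group 1, so only the link text remains.
echo preg_replace($pattern, '\1', '<a href="http://rejectthis.com/foo/bar">foo</a>'), "\n"; // foo
```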

Of the following four tags, only the third would be replaced:

<a href="http://keepthisdomain.com/foo/bar">foo</a> // left alone
<a href="http://keepthisdomain.com/foo/bar" rel="shadowbox[a]">foo</a> // left alone
<a href="http://rejectthis.com/foo/bar">foo</a> // REPLACED
<a href="http://rejectthis.com/foo/bar" rel="shadowbox[a]">foo</a> // left alone
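The claim can be checked with a small script that applies the answer's pattern to each of the four tags. This is a sketch that processes each tag on its own, so the results are easy to compare:

```php
<?php
// Apply the answer's pattern to each of the four example tags separately.
$pattern = '#<a(?![^>]+?\bhref="?http://keepthisdomain\.com/foo/bar"?|[^>]+\brel="shadowbox\[a\]").*?>(.*?)</a>#i';

$tags = [
    '<a href="http://keepthisdomain.com/foo/bar">foo</a>',
    '<a href="http://keepthisdomain.com/foo/bar" rel="shadowbox[a]">foo</a>',
    '<a href="http://rejectthis.com/foo/bar">foo</a>',
    '<a href="http://rejectthis.com/foo/bar" rel="shadowbox[a]">foo</a>',
];

$results = [];
foreach ($tags as $i => $tag) {
    $results[$i] = preg_replace($pattern, '\1', $tag);
    // A tag counts as replaced if the output differs from the input.
    echo $i + 1, ': ', ($results[$i] === $tag ? 'left alone' : 'REPLACED -> ' . $results[$i]), "\n";
}
```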

Edited with a minor tweak to make it match a literal . in .com, using \.
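To see why the escaped dot matters, compare the two versions on a look-alike host. The host keepthisdomainXcom below is a made-up example. An unescaped . matches any character, so the unescaped pattern wrongly treats that host as the kept domain. As an alternative to adding backslashes by hand, the sketch escapes the URL with PHP's preg_quote():

```php
<?php
// Hypothetical look-alike link: "X" sits where the dot of ".com" should be.
$tag  = '<a href="http://keepthisdomainXcom/foo/bar">foo</a>';
$kept = 'http://keepthisdomain.com/foo/bar';

// Same pattern as the answer, once with the raw URL (unescaped dot) and once
// with the URL escaped by preg_quote(), which also escapes the # delimiter.
$unescaped = '#<a(?![^>]+?\bhref="?' . $kept . '"?|[^>]+\brel="shadowbox\[a\]").*?>(.*?)</a>#i';
$escaped   = '#<a(?![^>]+?\bhref="?' . preg_quote($kept, '#') . '"?|[^>]+\brel="shadowbox\[a\]").*?>(.*?)</a>#i';

echo preg_replace($unescaped, '\1', $tag), "\n"; // tag left alone: "." matched "X"
echo preg_replace($escaped, '\1', $tag), "\n";   // foo
```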

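Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.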

Guangdong ICP filing No. 18138465 © 2020-2024 STACKOOM.COM