
Regex matching links not wrapped in <a> tags

(http([s]?):\/\/?)(([a-zA-Z0-9]+(\.?))+)([a-zA-Z0-9]+((\.[a-zA-Z]{2,5}){1,2})((\/[a-zA-Z0-9\?&=_\-\~:/?#[\]@!\$&'()\*\+,;]*)*)((\.[a-zA-Z]{2,5}){0,2}))

This is my regex, which works well for matching the links in a string. But I don't want it to select every link: if a link has "> before it, or </a> after it, that link shouldn't be matched. How can this be done?

These should be matched:

adasdas http://www.stackoverflow.com asdasas
adasdasahttp://www.stackoverflow.com/something asdas

These should NOT be matched:

adasdas<a href="somelink">           http://www.stackoverflow.com     </a>asdasas
adasdasa<a href="somelink">http://www.stackoverflow.com/something</a> asdas

Why do I need this? I want every link to be clickable, even if it isn't between anchor tags.

With all the disclaimers about using regex to parse HTML, if you want to use regex for this task, this will work:

$regex="~<a.*?</a>(*SKIP)(*F)|http://\S+~";

See the demo.

This problem is a classic case of the technique explained in this question: "regex-match a pattern, excluding..."

The left side of the alternation | matches a complete <a ...> ... </a> element and then deliberately fails; (*SKIP) tells the engine not to retry from any position inside that element, so the search resumes right after the closing </a>. The right side matches the URLs, and we know they are the right ones because they were not consumed by the expression on the left.

The URL regex I put on the right can be refined; use whatever suits your needs.
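A minimal PHP sketch of the pattern above, run with preg_match_all over the question's four sample lines. The `$samples` array, the `$found` collector and the `print_r` call are illustrative additions, not part of the answer.

```php
<?php
// The (*SKIP)(*F) pattern from the answer, unchanged.
$regex = '~<a.*?</a>(*SKIP)(*F)|http://\S+~';

// The four sample lines from the question.
$samples = [
    'adasdas http://www.stackoverflow.com asdasas',
    'adasdasahttp://www.stackoverflow.com/something asdas',
    'adasdas<a href="somelink">           http://www.stackoverflow.com     </a>asdasas',
    'adasdasa<a href="somelink">http://www.stackoverflow.com/something</a> asdas',
];

$found = [];
foreach ($samples as $line) {
    // preg_match_all fills $m[0] with every full match on this line;
    // anchors matched by the left branch are discarded by (*SKIP)(*F).
    preg_match_all($regex, $line, $m);
    $found[] = $m[0];
}

print_r($found);
```

Only the first two lines should yield a URL; in the last two, the left branch consumes the whole <a ...> ... </a> element and (*SKIP) moves the search past it. Because .*? does not cross newlines without the s modifier, an anchor whose text spans several lines would not be skipped unless you add that modifier.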

Reference

You need to add lookarounds to your regex, cf.:
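The lookaround example this answer pointed to is not preserved here. As an illustration only, here is a lookahead-only sketch in PHP; it is a swapped-in variant, not the answerer's code. A lookbehind such as `(?<!">)` cannot handle the question's third example because of the spaces between `">` and the URL, and PCRE lookbehinds cannot contain an unbounded quantifier such as `\s*`, so this sketch only checks what follows the URL. It assumes the link text contains no other tags: a URL written as `<a href="..."><b>http://...</b></a>` would still be matched.

```php
<?php
// Lookahead-only sketch (assumption: anchor text holds no nested tags).
// [^\s<]+ stops the URL at whitespace or "<", and (?![^<]*</a>) rejects the
// URL when the next "<" after it begins a closing </a>.
$regex = '~https?://[^\s<]+(?![^<]*</a>)~i';

// The four sample lines from the question.
$samples = [
    'adasdas http://www.stackoverflow.com asdasas',
    'adasdasahttp://www.stackoverflow.com/something asdas',
    'adasdas<a href="somelink">           http://www.stackoverflow.com     </a>asdasas',
    'adasdasa<a href="somelink">http://www.stackoverflow.com/something</a> asdas',
];

$found = [];
foreach ($samples as $line) {
    // Collect every URL on the line that passes the negative lookahead.
    preg_match_all($regex, $line, $m);
    $found[] = $m[0];
}

print_r($found);
```

On the question's sample lines this should return only the two URLs that are not inside an anchor. Stopping the URL at whitespace or < matters: with a plain \S+ the match could swallow a following </a> and slip past the lookahead.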

Here's some PHP code I combined (from answers here) into a function that does this for emails and URLs:

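// Wrap bare URLs and email addresses in <a> tags, leaving existing anchors alone.
function replace_links( $content ){
    // Left branch: a whole <a ...>...</a> element, matched and then failed via (*SKIP)(*FAIL).
    // Right branch: a bare http(s) URL. The "s" modifier lets .+? cross newlines
    // inside an anchor; "i" also skips upper-case <A HREF=...> tags.
    $content = preg_replace('"<a[^>]+>.+?</a>(*SKIP)(*FAIL)|\bhttps?://\S+"si', '<a href="$0">$0</a>', $content);
    // Same trick for email addresses; the anchors created above are skipped too.
    $content = preg_replace('"<a[^>]+>.+?</a>(*SKIP)(*FAIL)|\b(\S+@\S+\.\S+)\S+"si', '<a href="mailto:$0">$0</a>', $content);
    return $content;
}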

Demo: https://glot.io/snippets/g6nwd6amyo

Most up-to-date version: https://gist.github.com/tripflex/0cc930c2afe5f4c73f2aed61cedf95d0

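Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.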

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM