
Match link patterns in HTML code with a RegEx

I'm using a linkify function that detects link-like patterns with a regex and wraps them in a-tags to produce clickable links.

The regex looks like this:

    // http://, https://, ftp:// 
    var urlPattern = /\b(?![^<]*>|[^<>]*<\/)(?:https?|ftp):\/\/[a-z0-9-+&@#\/%?=~_|!:,.;]*[a-z0-9-+&@#\/%=~_|]/gim;
    /* Some explanations:
    (?!     # Negative lookahead start (will cause match to fail if contents match)
    [^<]*   # Any number of non-'<' characters
    >       # A > character
    |       # Or
    [^<>]*  # Any number of non-'<' and non-'>' characters
    </      # The characters < and /
     )      # End negative lookahead.
    */
    

and the link is replaced like this:

 return textInput.replace(urlPattern, '<a target="_blank" rel="noopener" href="$&">$&</a>')

The regex works perfectly for in-text links. However, I am also using it on HTML code, such as

<ul><li>Link: https://www.link.com</li></ul> //linkify not working
<ul><li>Link: https://www.link.com <br/></li></ul> //linkify working

where only the second example works. I don't know why the behavior differs and would be very glad to get some help. What should my regex look like so that it linkifies list elements without the break?

Ciao,

if I understood your issue correctly, I think this regex should detect the links in both scenarios:

\b(?![^<]*>)(?:https?|ftp):\/\/([a-z0-9-+&@#\/%?=~_|!:,.;]*)

Essentially, the first part segments the input in this way:

(image: regex_segmentation)

Then we grab the parts of interest: the first part is a non-capturing group, as in your original expression, so that the protocol can be stripped later if it is not needed. The last part captures the rest of the URL.
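To see why the original pattern skips the first list item, it may help to test the two alternatives of its negative lookahead separately. This is only an illustrative sketch: the helper name blockedBy and the two anchored test regexes are assumptions made here, not part of the original code.

```javascript
// The two alternatives of the original negative lookahead, tested on their own.
// Each is anchored with ^ so it applies exactly where the URL starts.
const insideTag = /^[^<]*>/;            // a '>' is reached before any '<' (we are inside a tag)
const beforeClosingTag = /^[^<>]*<\//;  // the text runs straight into a closing tag

// Hypothetical helper: reports which alternative would veto a match at the URL.
function blockedBy(html) {
  const rest = html.slice(html.indexOf('https://'));
  return {
    insideTag: insideTag.test(rest),
    beforeClosingTag: beforeClosingTag.test(rest),
  };
}

const first = blockedBy('<ul><li>Link: https://www.link.com</li></ul>');
const second = blockedBy('<ul><li>Link: https://www.link.com <br/></li></ul>');
console.log(first);
console.log(second);
```

In the first string the URL is followed directly by </li>, so the second alternative vetoes the match. In the second string <br/> comes first and neither alternative applies, which is why dropping [^<>]*<\/ from the lookahead lets both cases match.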

Because of the way the regex is built, we can now decide whether to take the entire URL or just the second part. This is evident in the bottom-right of this screenshot:

(image: regex processing)

Now, to log the two parts, we can use this snippet:

const str = '<ul><li>Link: https://www.link.com</li></ul>';
const myRegexp = /\b(?![^<]*>)(?:https?|ftp):\/\/([a-z0-9-+&@#\/%?=~_|!:,.;]*)/gim;
const match = myRegexp.exec(str);
console.log(match[0]); // full URL: https://www.link.com
console.log(match[1]); // part after the protocol: www.link.com
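As a rough sketch of how the revised pattern could be plugged back into the linkify function from the question, assuming the function takes the HTML string as textInput and the capture group is not needed:

```javascript
// Revised pattern from this answer: only the "inside a tag" check is kept.
const urlPattern = /\b(?![^<]*>)(?:https?|ftp):\/\/[a-z0-9-+&@#\/%?=~_|!:,.;]*/gim;

// Same replacement string as in the question; $& stands for the whole match.
function linkify(textInput) {
  return textInput.replace(urlPattern, '<a target="_blank" rel="noopener" href="$&">$&</a>');
}

const withoutBreak = linkify('<ul><li>Link: https://www.link.com</li></ul>');
const withBreak = linkify('<ul><li>Link: https://www.link.com <br/></li></ul>');
console.log(withoutBreak);
console.log(withBreak);
```

One caveat: without the second lookahead alternative, URL text that already sits between <a> and </a> would be matched again, so this sketch is meant for input that has not been linkified yet. The revised pattern also drops the final character class of the original, so punctuation directly after a URL would become part of the link.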

Possible variations:

  • in a situation like the one presented above, you can simplify your regex further to:

    (?:https?|ftp):\/\/([a-z0-9-+&@#\/%?=~_|!:,.;]*)

getting the same output

  • if the full URL is enough, you can remove the parentheses of the second group:

    (?:https?|ftp):\/\/[a-z0-9-+&@#\/%?=~_|!:,.;]*
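The simplified variations give the same result on the list example, but they no longer skip URLs that sit inside a tag. The comparison below is a small sketch of that trade-off; the <img> string and its logo.png URL are made up for illustration.

```javascript
// Simplified variation (no lookahead) versus the version that keeps the tag check.
const simplified = /(?:https?|ftp):\/\/[a-z0-9-+&@#\/%?=~_|!:,.;]*/i;
const withLookahead = /\b(?![^<]*>)(?:https?|ftp):\/\/[a-z0-9-+&@#\/%?=~_|!:,.;]*/i;

const listItem = '<ul><li>Link: https://www.link.com</li></ul>';
// Hypothetical input where the URL appears only inside an attribute.
const imgTag = '<img src="https://www.link.com/logo.png">';

// Both patterns find the URL in the list item.
console.log(simplified.exec(listItem)[0]);
console.log(withLookahead.exec(listItem)[0]);

// Only the simplified pattern matches the URL inside the tag.
console.log(simplified.test(imgTag));
console.log(withLookahead.test(imgTag));
```

So the shorter patterns are fine when the input never carries URLs in attributes; otherwise keeping the (?![^<]*>) check is probably the safer choice.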

Have a good day,
Antonino

PS - I'm assuming that your examples were meant to be:

<ul><li>Link: https://www.link.com</li></ul>
<ul><li>Link: https://www.link.com <br/></li></ul>

i.e. with https, http or ftp, which makes the second case work with your original regex.

