Regex to find URLs that are not inside a tag

Question

I am racking my brain over this:

I am trying to find URLs in JavaScript with a regex.

Update: I use JavaScript on the server side, so I cannot walk through the DOM.

/(http:\/\/|https:\/\/|)(www\.)([a-zA-Z0-9]+\.[a-zA-Z0-9\-]+|[a-zA-Z0-9\-]+)\.[a-zA-Z\.]{2,6}(\/[a-zA-Z0-9\.\?=\/#%&\+-]+|\/|)/gi

The above sample works well. But I need to change the regex so that URLs are not matched inside href="url" or inside <a ....>url</a>, while still being matched anywhere else, such as <p ...>url</p> or <div ....>text text text url, url, url text text</div>

Can anybody help?

Thank you and cheers, Michael

Answer 1

It would be simpler if you allowed a non-href URL to be the text of an a element. Since you do not, you need to skip every child node of the a elements, in case URL-like text sits in a span, a strong, or any other descendant of an a.

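   function someurls(node){
        // Collect URLs from text nodes, skipping <a> elements and everything inside them.
        // No ^ anchor here, so URLs in the middle of a text node are found too.
        var A= [], tem, rx= /https?:\/\/[^\s]+/g;
        if(node){
            node= node.firstChild;
            while(node && node.tagName== 'A') node= node.nextSibling;
            while(node!= null){
                if(node.nodeType== 3){ // 3 is a text node
                    // match() with the g flag returns an array of all matches, so merge it in
                    if((tem= node.data.match(rx))!= null) A= A.concat(tem);
                }
                else A= A.concat(someurls(node));
                node= node.nextSibling;
                while(node && node.tagName== 'A') node= node.nextSibling;
            }
        }
        return A;
    }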

// alert(someurls(document.body).join('\n'));
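
Since the question's update says the code runs on the server without a DOM, the same idea (ignore anchors and everything inside them) can be roughed out on the raw HTML string. This is only a sketch under assumptions: findUrlsOutsideAnchors is a hypothetical name, the URL pattern is a simplified stand-in rather than the asker's regex, and the markup is assumed to be reasonably well formed (no ">" inside attribute values, no comments or script blocks that contain tags).

```javascript
// Hypothetical helper, not from the original answer: scans an HTML string
// and returns URLs found in text that is not inside an <a> element.
function findUrlsOutsideAnchors(html) {
    var tokenRx = /<[^>]*>|[^<]+/g;          // one tag, or one run of text
    var urlRx = /https?:\/\/[^\s<>"']+/g;    // simplified URL pattern (assumption)
    var urls = [], depth = 0, token, piece, found;

    while ((token = tokenRx.exec(html)) !== null) {
        piece = token[0];
        if (piece.charAt(0) === '<') {
            // Tags are never searched, so href="..." values are skipped for free.
            if (/^<a[\s>]/i.test(piece)) depth++;                        // <a ...>
            else if (/^<\/a\s*>/i.test(piece) && depth > 0) depth--;     // </a>
        } else if (depth === 0) {
            // Text outside any anchor: collect URLs here.
            found = piece.match(urlRx);
            if (found) {
                urls = urls.concat(found.map(function (u) {
                    return u.replace(/[.,;:!?)]+$/, ''); // drop trailing punctuation
                }));
            }
        }
    }
    return urls;
}
```

Calling it on a string such as '<p>see http://example.com/a, and <a href="http://in.href/x">http://in.anchor/y</a></p>' should return only the URL in the paragraph text. Tracking a depth counter instead of a lookahead in one big regex keeps URLs in nested children of an a (span, strong and so on) excluded as well. For messy real-world HTML a proper parser is likely the safer choice.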

Answer 1: 0 2011-11-07 17:24:12
