Extracting URLs from a string

Question

I found a regular expression that is supposed to capture URLs, but it fails to capture some of them.

$("#links").change(function() {

    //var matches = new array();
    var linksStr = $("#links").val();
    var pattern = new RegExp("^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$","g");
    var matches = linksStr.match(pattern);

    for(var i = 0; i < matches.length; i++) {
      alert(matches[i]);
    }

})

It doesn't capture this URL (I need it to):

http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar

But it captures this:

http://www.wupload.com

Answer 1

Several things:

1. The main reason it didn't work is that, when passing a string to RegExp(), you need to escape the backslashes. So this:
```
 "^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$" 
```
should be:
```
 "^(https?:\/\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\/\\w \\.-]*)*\/?$" 
```
2. Next, you said that FF reported "Regular expression too complex". This suggests that linksStr holds several lines of URL candidates. Therefore, you also need to pass the m flag to RegExp().
3. The existing regex blocks legitimate values, e.g. "HTTP://STACKOVERFLOW.COM". So also use the i flag with RegExp().
4. Whitespace always creeps in, especially in multiline values. Use a leading \s* (written \\s* inside the string) and $.trim() to deal with it.
5. Relative links, e.g. /file/63075291/LlMlTL355-EN6-SU8S.rar, are not allowed?
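To make the first point concrete, here is a minimal sketch comparing what the regex engine actually receives with and without the doubled backslash. The variable names are illustrative and do not come from the original post.

```javascript
// In a string literal, "\d" is not a recognised escape, so JavaScript drops
// the backslash and the RegExp constructor only ever sees the letter "d".
const unescaped = new RegExp("^[\da-z]+$");
const escaped   = new RegExp("^[\\da-z]+$");

console.log(unescaped.source); // ^[da-z]+$
console.log(escaped.source);   // ^[\da-z]+$

// Digits are therefore rejected by the unescaped version.
console.log(unescaped.test("63075291")); // false
console.log(escaped.test("63075291"));   // true

// A regex literal avoids the double escaping altogether.
const literal = /^[\da-z]+$/;
console.log(literal.source === escaped.source); // true
```

The same thing happens to \w in the path part of the pattern, which would explain why a URL with a path such as /file/63075291/... was rejected while the bare host name still matched.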

Putting it all together (except for item 5), it becomes:

var linksStr    = "http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar  \n"
                + "  http://XXXupload.co.uk/fun.exe \n "
                + " WWW.Yupload.mil ";
var pattern     = new RegExp (
                    "^\\s*(https?:\/\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\/\\w \\.-]*)*\/?$"
                    , "img"
                );

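// match() returns null when nothing matches, so fall back to an empty array.
var matches     = linksStr.match(pattern) || [];
for (var J = 0, L = matches.length;  J < L;  J++) {
    // Strip the whitespace that the leading \s* and the path class let through.
    console.log ( $.trim (matches[J]) );
}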

Which yields:

http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar
http://XXXupload.co.uk/fun.exe
WWW.Yupload.mil
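As a variation, the same idea can be sketched with a regex literal, which sidesteps the string escaping entirely and needs no jQuery. This is not the answerer's code: extractUrlLines is a hypothetical helper name, and the nested quantifier ([\/\w \.-]*)* has been flattened to a single character class. That should match the same text while avoiding the heavy backtracking that may be behind the "Regular expression too complex" report.

```javascript
// Hypothetical helper: returns every line of `text` that looks like a URL,
// with surrounding whitespace removed.
function extractUrlLines(text) {
  // g: find all matches, i: ignore case, m: let ^ and $ match at line breaks.
  const pattern = /^\s*(https?:\/\/)?([\da-z.-]+)\.([a-z.]{2,6})[\/\w .-]*\/?$/gim;
  // match() returns null when nothing matches, so fall back to an empty array.
  const matches = text.match(pattern) || [];
  return matches.map((m) => m.trim());
}

const sample = "http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar  \n"
             + "  http://XXXupload.co.uk/fun.exe \n "
             + " WWW.Yupload.mil ";

console.log(extractUrlLines(sample));
```

With the sample input from the answer, this returns the same three entries listed above.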

Answer 2

Why not just do: URLS = str.match(/https?:[^\s]+/ig);
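The one-liner in this answer can be wrapped up as follows. This is only a sketch: findUrls is a hypothetical name, and the example.org address in the sample text is made up for illustration.

```javascript
// Hypothetical helper built on the pattern suggested in this answer.
function findUrls(str) {
  // [^\s]+ consumes everything up to the next whitespace character.
  // match() returns null when nothing matches, so fall back to an empty array.
  return str.match(/https?:[^\s]+/gi) || [];
}

const text = "See http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar "
           + "and HTTPS://example.org/a?b=1 for details.";

console.log(findUrls(text));
```

The trade-off is that this approach only finds URLs that start with an explicit http: or https: scheme, and it keeps any punctuation that directly follows a URL, so the results may need trimming afterwards.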

Answer 3

(https?\:\/\/)([a-z\/\.0-9A-Z_\-\%\&\=]*)

This will find any URL in the text.

3 answers

Answer 1: 1, accepted, 2011-08-09 08:28:16
Answer 2: 0, 2011-08-08 16:47:59
Answer 3: 0, 2011-08-08 16:48:31