URL extraction from string
I found a regular expression that is supposed to capture URLs, but it fails to capture some of them.
$("#links").change(function() {
//var matches = new array();
var linksStr = $("#links").val();
var pattern = new RegExp("^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$","g");
var matches = linksStr.match(pattern);
for(var i = 0; i < matches.length; i++) {
alert(matches[i]);
}
})
It doesn't capture this URL (I need it to):
http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar
But it captures this
Several things:
1. The main reason it didn't work is that when you pass a string to RegExp(), you need to escape the backslashes. Inside a string literal, a sequence such as \d collapses to a plain d before RegExp() ever sees it. So this:
"^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$"
Should be:
"^(https?:\/\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\/\\w \\.-]*)*\/?$"
2. Next, you said that FF reported "Regular expression too complex". This suggests that linksStr is several lines of URL candidates. Therefore, you also need to pass the m flag to RegExp().
3. The existing regex is blocking legitimate values, e.g. "HTTP://STACKOVERFLOW.COM". So, also use the i flag with RegExp().
4. Whitespace always creeps in, especially in multiline values. Use a leading \\s* in the pattern string and $.trim() on each match to deal with it.
5. Relative links, e.g. /file/63075291/LlMlTL355-EN6-SU8S.rar, are not allowed?
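To make the escaping and flag points concrete, here is a small sketch in plain JavaScript. The variable names and the example.com address are placeholders chosen for illustration; they are not part of the original question.

```javascript
// Inside a string literal the backslash is an escape character, so "\d"
// reaches RegExp() as a plain "d". Doubling it keeps the regex escape intact.
var unescaped = new RegExp("^\d+$");  // source is ^d+$  (matches "d", "dd", ...)
var escaped = new RegExp("^\\d+$");   // source is ^\d+$ (matches digits)
var literal = /^\d+$/;                // a regex literal needs no doubling

console.log(unescaped.test("123")); // false
console.log(escaped.test("123"));   // true
console.log(literal.test("123"));   // true

// Flags: without m, ^ and $ anchor only at the ends of the whole string;
// without i, "HTTP" does not match "http".
var candidates = "HTTP://STACKOVERFLOW.COM\nhttp://example.com"; // placeholder input
var strict = candidates.match(/^http:\/\/\S+$/g);    // null
var relaxed = candidates.match(/^http:\/\/\S+$/gim); // one entry per line
console.log(strict, relaxed);
```

If the pattern does not have to be built from a string at run time, a regex literal sidesteps the double-escaping problem entirely.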
Putting it all together (except for item 5), it becomes:
var linksStr = "http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar \n"
+ " http://XXXupload.co.uk/fun.exe \n "
+ " WWW.Yupload.mil ";
var pattern = new RegExp (
"^\\s*(https?:\/\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\/\\w \\.-]*)*\/?$"
, "img"
);
var matches = linksStr.match(pattern);
for (var J = 0, L = matches.length; J < L; J++) {
console.log ( $.trim (matches[J]) );
}
Which yields:
http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar
http://XXXupload.co.uk/fun.exe
WWW.Yupload.mil
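The same idea can be wrapped in a small helper that does not depend on jQuery. This is only a sketch under a few assumptions: extractUrls is a hypothetical name, the pattern is written as a regex literal so no double escaping is needed, the native String.prototype.trim stands in for $.trim(), and the nested group ([\/\w \.-]*)* is flattened to a single character-class repetition, which accepts the same strings while avoiding nested repetition.

```javascript
// Hypothetical helper: returns every line of `text` that looks like a URL.
function extractUrls(text) {
  // g: find all matches, i: ignore case, m: ^ and $ anchor at each line.
  var pattern = /^\s*(https?:\/\/)?([\da-z.-]+)\.([a-z.]{2,6})[\/\w .-]*\/?$/gim;
  // match() returns null when nothing matches, so fall back to an empty array.
  var matches = text.match(pattern) || [];
  // Strip the whitespace that the leading \s* and the path class let through.
  return matches.map(function (m) { return m.trim(); });
}

var sample = "http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar \n"
           + " http://XXXupload.co.uk/fun.exe \n "
           + " WWW.Yupload.mil ";
console.log(extractUrls(sample));
console.log(extractUrls("no links here")); // []
```

The null fallback matters because calling .length on the result of a failed match() throws a TypeError.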
Why not just do: URLS = str.match(/https?:[^\s]+/ig);
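A quick sketch of how that one-liner behaves on free-running text. The sample sentence and the example.com address are made up for illustration.

```javascript
// Placeholder input: two URLs embedded in ordinary prose.
var text = "See http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar "
         + "and HTTPS://example.com/a?b=1 for details.";

// Grab "http:" or "https:" followed by everything up to the next whitespace.
var urls = text.match(/https?:[^\s]+/ig) || [];
console.log(urls);
```

This approach needs an explicit scheme, so a bare WWW.Yupload.mil would be skipped, and punctuation that directly follows a URL (a trailing comma or period) ends up in the match.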
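(https?:\/\/)([a-z\/.0-9A-Z_%&=-]*)
This will find http and https URLs in the text. (The hyphen is placed last in the character class so that it is read as a literal hyphen rather than as a range.)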
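As a rough illustration of a pattern of this shape, the sketch below applies it with the g flag. The sample text and the example.org address are assumptions, and the hyphen sits at the end of the character class so that it is treated as a literal character.

```javascript
// Scheme, then any run of the listed URL characters.
var urlPattern = /(https?:\/\/)([a-z\/.0-9A-Z_%&=-]*)/g;

// Placeholder input with two URLs separated by a comma.
var text = "Mirrors: http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar, "
         + "https://example.org/get?id=7&x=1";

var found = text.match(urlPattern) || [];
console.log(found);
```

Because "?" is not in the character class, the second match stops at https://example.org/get; add further characters to the class if query strings need to be captured.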