

Regular expression to find URLs but not include punctuation AFTER the URL

Asked 2011-06-27 14:26:10 · Tags: javascript, regex, url

Example: "My site is http://www.abcd.com , and yours is http://www.def.ghi/jkl . Is Fred's https://www.xyz.com ? Or is it http://www.xxx.com?abc=def ? (I thought his site was http://www.mmm.com ), but clearly it's not."

This should extract:

http://www.abcd.com
http://www.def.ghi/jkl
https://www.xyz.com
http://www.xxx.com?abc=def
http://www.mmm.com

Notes: it should assume that any punctuation following the URL is NOT part of the URL. For example, the comma after http://www.abcd.com , is not part of the URL. This includes trailing question marks, which I realize could actually be part of the URL. Of course, if a question mark is followed by query-string data, it SHOULD be considered part of the URL. Note that URLs might be followed by multiple punctuation marks, as in the case of (Is your url http://abcd.com )?

URLs (and their trailing punctuation, if any) will always be followed by a space or a newline/return character, or they will come at the end of the string being tested.

They will be preceded by a whitespace character or, possibly, an opening bracket or parenthesis, as in "Please visit my site ( http://www.abcd.com )." Or they will come at the beginning of the string.

This regexp should work for http, https and ftp.

This is for an ActionScript project. I believe that ActionScript uses the same regular-expression engine as JavaScript.
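Below is a minimal sketch of the rules above, written as JavaScript. It is an illustration under stated assumptions, not a tested answer. It relies only on lookahead and lazy quantifiers, and I am assuming those behave the same in ActionScript 3's RegExp. Lookbehind is deliberately avoided. The exact set of "trailing punctuation" characters is my own choice. The lazy `\S*?` makes the URL as short as possible, as long as everything left before the next whitespace (or the end of the string) is trailing punctuation. So a `?` followed by `abc=def` stays inside the URL, while a final `?` or `),` does not. The function name `extractUrls` is hypothetical.

```javascript
// Sketch only: extract http/https/ftp URLs, excluding trailing punctuation.
// Assumptions: URLs contain no whitespace, and the trailing-punctuation set
// [.,;:!?)\]'"] is an illustrative choice, not taken from any spec.
function extractUrls(text) {
  // (?:^|[\s(\[])              start of string, or whitespace, "(" or "[" before the URL
  // ((?:https?|ftp):\/\/\S*?)  the URL itself, matched lazily (capture group 1)
  // [.,;:!?)\]'"]*             optional trailing punctuation, kept out of the capture
  // (?=\s|$)                   must be followed by whitespace or the end of the string
  var re = /(?:^|[\s(\[])((?:https?|ftp):\/\/\S*?)[.,;:!?)\]'"]*(?=\s|$)/g;
  var urls = [];
  var m;
  // exec() with the g flag resumes from re.lastIndex on each call
  while ((m = re.exec(text)) !== null) {
    urls.push(m[1]);
  }
  return urls;
}

var sample = "My site is http://www.abcd.com , and yours is http://www.def.ghi/jkl . " +
  "Is Fred's https://www.xyz.com ? Or is it http://www.xxx.com?abc=def ? " +
  "(I thought his site was http://www.mmm.com ), but clearly it's not.";
console.log(extractUrls(sample));
// → [ 'http://www.abcd.com', 'http://www.def.ghi/jkl', 'https://www.xyz.com',
//     'http://www.xxx.com?abc=def', 'http://www.mmm.com' ]
```

The prefix group consumes one character, but the lookahead at the end does not. The whitespace that ends one URL is therefore still available as the prefix of the next URL.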

Thanks!

2 answers

Have a look here: http://www.regexguru.com/2008/11/detecting-urls-in-a-block-of-text/

EDIT: shanethehat and divillysausages also mentioned this link: http://gskinner.com/RegExr/. I hadn't seen it before, and it features online evaluation (in other words, you can tune your regex without firing up your coding IDE, which is awesome). Thanks!

First off, rolling your own regexp to parse URLs is a terrible idea. You'd expect this to be a common enough problem that someone has already written, debugged and tested a library for it, according to the RFCs. There are a ton of edge cases when it comes to parsing URLs:

- international domain names
- actual (.museum) vs. nonexistent (.jpg) top-level domains
- weird punctuation, including parentheses
- punctuation at the end of the URL
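Here is a small JavaScript sketch that makes the parentheses case concrete. The helper names and the balanced-parentheses heuristic are hypothetical illustrations of mine. They are not taken from any of the libraries mentioned. A rule that blindly strips every trailing punctuation character mangles a Wikipedia-style URL whose path ends in `)`. Dropping a `)` only when it is unmatched keeps such a URL intact and still removes a stray closing parenthesis.

```javascript
// Naive rule (hypothetical helper): every trailing punctuation character is outside the URL.
function naiveTrim(url) {
  return url.replace(/[.,;:!?)\]'"]+$/, "");
}

// Heuristic (hypothetical helper): strip trailing punctuation one character at a
// time, but keep a ")" when the URL contains at least as many "(" as ")".
function trimTrailingPunctuation(url) {
  while (url.length > 0) {
    var last = url.charAt(url.length - 1);
    if (last === ")") {
      var opens = url.split("(").length - 1;
      var closes = url.split(")").length - 1;
      if (closes <= opens) break; // balanced, so this ")" belongs to the path
    } else if (".,;:!?]'\"".indexOf(last) === -1) {
      break; // not punctuation: the URL ends here
    }
    url = url.slice(0, -1); // drop one trailing character and look again
  }
  return url;
}

var wiki = "http://en.wikipedia.org/wiki/Python_(programming_language)";
console.log(naiveTrim(wiki));               // http://en.wikipedia.org/wiki/Python_(programming_language
console.log(trimTrailingPunctuation(wiki)); // http://en.wikipedia.org/wiki/Python_(programming_language)
console.log(trimTrailingPunctuation("http://abcd.com)?")); // http://abcd.com
```

Even this heuristic ignores square brackets and internationalized domain names. That is exactly why a well-tested library is preferable.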

I've looked at a ton of libraries, and they all have their downsides. See a comparison of JavaScript URL parsing libraries here.

If you want a regular expression, the one in Component is quite comprehensive.