PHP regex for URL validation; filter_var is too permissive

Question

First, let's define a "URL" according to my requirements.

The only protocols allowed, optionally, are http:// and https://

then a mandatory domain name like stackoverflow.com

then, optionally, the rest of the URL components (path, query, hash, ...)

For reference, here is a list of valid and invalid URLs according to my requirements:

VALID

stackoverflow.com
stackoverflow.com/questions/ask
https://stackoverflow.com/questions/ask
http://www.amazon.com/Computers-Internet-Books/b/ref=bhp_bb0309A_comint2?ie=UTF8&node=5&pf_rd_m=ATVPDKIKX0DER&pf_rd_s=browse&pf_rd_r=0AH7GM29WF81Q72VPFDH&pf_rd_t=101&pf_rd_p=1273387142&pf_rd_i=283155
amazon.com/Computers-Internet-Books/b/ref=bhp_bb0309A_comint2?ie=UTF8&node=5&pf_rd_m=ATVPDKIKX0DER&pf_rd_s=browse&pf_rd_r=0AH7GM29WF81Q72VPFDH&pf_rd_t=101&pf_rd_p=1273387142&pf_rd_i=283155
http://test-site.com (filter_var rejects this! I have domain names with dashes)

INVALID

http://www (PHP's filter_var allows this; yes, I know it is a valid URL)
google
http://www..des (PHP's filter_var allows this)
Any URL with disallowed characters in the domain name

For completeness, here is my PHP version: 5.3.2-1ubuntu4.2

Answer 1

As a starting point you can use this one. It is written for JS, but it is easy to convert to work with PHP's preg_match.

/^(https?\://)?(www\.)?([a-z0-9]([a-z0-9]|(\-[a-z0-9]))*\.)+[a-z]+$/i

For PHP, this one should work:

$reg = '@^(https?\://)?(www\.)?([a-z0-9]([a-z0-9]|(\-[a-z0-9]))*\.)+[a-z]+$@i';

This regexp validates only the domain part, but you can build on it, or split the URL at the first slash '/' (after "://") and validate the domain part and the rest separately.
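The split-and-validate idea could be sketched roughly as follows. This is only an illustration under assumptions: the function name isAcceptableUrl and the constant DOMAIN_PATTERN are made up for this example, the split also happens at '?' and '#' (not only at '/') so that a query directly after the domain is handled, and everything after the domain is accepted without further checks.

```php
<?php
// Domain pattern from the answer, minus the scheme part (handled separately below).
const DOMAIN_PATTERN = '@^(www\.)?([a-z0-9]([a-z0-9]|(\-[a-z0-9]))*\.)+[a-z]+$@i';

/**
 * Hypothetical helper: accepts an optional http:// or https:// prefix,
 * validates the domain, and ignores whatever follows it.
 */
function isAcceptableUrl($url)
{
    // Strip the optional scheme; any other scheme stays in place and fails the domain check.
    $rest = preg_replace('@^https?://@i', '', $url);

    // Split once at the first '/', '?' or '#': the domain is the left-hand piece.
    // Splitting at '?' and '#' is an assumption added for this sketch.
    $parts = preg_split('@[/?#]@', $rest, 2);
    $domain = $parts[0];

    return preg_match(DOMAIN_PATTERN, $domain) === 1;
}

// Usage with some of the URLs from the question.
foreach (array('stackoverflow.com/questions/ask', 'http://test-site.com', 'http://www..des') as $candidate) {
    echo $candidate, ' => ', isAcceptableUrl($candidate) ? 'valid' : 'invalid', "\n";
}
```

The path, query and hash are not inspected here; if they matter, the second piece returned by preg_split could be checked with a separate pattern.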

BTW: it would also validate "http://www.domain.com.com", but this is not an error, because a subdomain URL could look like "http://www.subdomain.domain.com", and that is valid! There is almost no way (or at least no practical way) to validate for a proper TLD with a regex, because you would have to write every possible TLD into your regex ONE BY ONE, like this:

/^(https?\://)?(www\.)?([a-z0-9]([a-z0-9]|(\-[a-z0-9]))*\.)+(com|it|net|uk|de)$/i

(This last one, for instance, would validate only domains ending with .com, .net, .de, .it or .uk, which covers .co.uk.) New TLDs come out all the time, so you would have to adjust your regex every time a new one appears, and that's a pain in the neck!

Answer 2

You could use parse_url to break up the address into its components. While it is explicitly not built to validate a URL, analyzing the resulting components and matching them against your requirements would at least be a start.
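A minimal sketch of that approach, under stated assumptions: hasValidHost is a hypothetical name, the host pattern is borrowed from Answer 1, and URLs carrying a user name or password are rejected only because the question's definition does not mention them. Since parse_url only recognizes a host after "//", the sketch prepends "http://" when no scheme is present.

```php
<?php
/**
 * Hypothetical helper: true when the scheme is http, https or absent
 * and the host is made of dot-separated labels ending in letters.
 */
function hasValidHost($url)
{
    // Without "scheme://", parse_url() would report the whole string as a path.
    if (!preg_match('@^[a-z][a-z0-9+.-]*://@i', $url)) {
        $url = 'http://' . $url;
    }

    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false;
    }

    // Only http and https are allowed by the question's requirements.
    if (!in_array(strtolower($parts['scheme']), array('http', 'https'), true)) {
        return false;
    }

    // Assumption: user info is not part of the asker's definition, so reject it.
    if (isset($parts['user']) || isset($parts['pass'])) {
        return false;
    }

    // Host pattern taken from Answer 1 (labels with inner dashes, alphabetic last label).
    $hostPattern = '@^([a-z0-9]([a-z0-9]|(\-[a-z0-9]))*\.)+[a-z]+$@i';
    return preg_match($hostPattern, $parts['host']) === 1;
}
```

The remaining components (path, query, fragment) are available in the same array returned by parse_url and could be checked in the same way if the requirements call for it.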

Answer 3

It may vary, but in most cases you don't really need to check the validity of a URL.

If it is vital information and you trust your user enough to let them supply it as a URL, you can trust them enough to supply a valid one.

If it isn't vital information, then you just have to check for XSS attempts and display the URL that the user wanted.

You can manually add "http://" if you don't detect one, to avoid navigation problems.
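As a rough illustration of those two steps (adding a missing scheme and guarding the output against XSS), here is a small sketch. The helper names withDefaultScheme and renderLink are hypothetical, and escaping with htmlspecialchars is one common way to handle the output, not necessarily what the answer had in mind.

```php
<?php
/**
 * Hypothetical helper: prefixes "http://" when the input does not
 * already start with http:// or https://.
 */
function withDefaultScheme($url)
{
    $url = trim($url);
    if (!preg_match('@^https?://@i', $url)) {
        $url = 'http://' . $url;
    }
    return $url;
}

/**
 * Hypothetical helper: builds a link with the URL escaped for HTML,
 * so quotes and angle brackets cannot break out of the attribute.
 */
function renderLink($url)
{
    $safe = htmlspecialchars(withDefaultScheme($url), ENT_QUOTES, 'UTF-8');
    return '<a href="' . $safe . '">' . $safe . '</a>';
}

echo renderLink('stackoverflow.com/questions/ask'), "\n";
```

A side effect of always forcing an http(s) prefix is that an input such as "javascript:alert(1)" ends up behind "http://" and is no longer treated as a script URL by the browser.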

I know I'm not giving you an alternative as a solution, but maybe the best way to solve performance and validity problems is simply to avoid unnecessary checks.

3 answers

Answer 1: 3 (accepted) 2010-09-06 20:25:35

Answer 2: 0 2010-09-06 19:31:49

Answer 3: 0 2010-09-06 19:32:18
