PHP regex for URL validation, filter_var is too permissive

First, let's define a "URL" according to my requirements.

The only protocols allowed are http:// and https://, and the protocol is optional

then a mandatory domain name like stackoverflow.com

then, optionally, the rest of the URL components (path, query, hash, ...)

For reference, here is a list of valid and invalid URLs according to my requirements:

VALID

INVALID

  • http://www (PHP's filter_var allows this; yes, I know it is a valid URL)
  • google
  • http://www..des (PHP's filter_var allows this)
  • Any URL with disallowed characters in the domain name

For completeness, here is my PHP version: 5.3.2-1ubuntu4.2

As a starting point you can use this one. It's for JS, but it's easy to convert it to work with PHP's preg_match.

/^(https?:\/\/)?(www\.)?([a-z0-9]([a-z0-9]|(-[a-z0-9]))*\.)+[a-z]+$/i

For PHP, this one should work:

$reg = '@^(https?\://)?(www\.)?([a-z0-9]([a-z0-9]|(\-[a-z0-9]))*\.)+[a-z]+$@i';

Note that this regex validates only the domain part, but you can build on it, or split the URL at the first slash '/' (after "://") and validate the domain part and the rest separately.
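The split-and-validate idea could look roughly like the sketch below. The function name `is_valid_domain_url` is made up for illustration, and the sketch assumes that everything after the first slash is accepted without further checks; only the domain part goes through the regex above.

```php
<?php
// Hypothetical helper (the name is an assumption, not from the answer).
// It validates only the domain part and ignores whatever follows the first "/".
function is_valid_domain_url($url)
{
    $domainRegex = '@^(https?\://)?(www\.)?([a-z0-9]([a-z0-9]|(\-[a-z0-9]))*\.)+[a-z]+$@i';

    // Skip a leading "http://" or "https://" so its slashes are not treated as the split point.
    $searchFrom = preg_match('@^https?\://@i', $url, $m) ? strlen($m[0]) : 0;

    // Cut at the first "/" after the scheme; with no slash, the whole string is the domain part.
    $slash = strpos($url, '/', $searchFrom);
    $domainPart = ($slash === false) ? $url : substr($url, 0, $slash);

    return preg_match($domainRegex, $domainPart) === 1;
}

var_dump(is_valid_domain_url('https://www.stackoverflow.com/questions/1'));
var_dump(is_valid_domain_url('http://www..des'));
```

A URL whose query string follows the domain directly, with no slash in between, would be rejected by this sketch, because the "?" ends up in the domain part.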

BTW: It would also validate "http://www.domain.com.com", but this is not an error, because a subdomain URL could look like "http://www.subdomain.domain.com", and that's valid! And there is almost no way (or at least no practical way) to validate the TLD with a regex, because you would have to write every possible TLD into your regex ONE BY ONE, like this:

/^(https?:\/\/)?(www\.)?([a-z0-9]([a-z0-9]|(-[a-z0-9]))*\.)+(com|it|net|uk|de)$/i

(This last one, for instance, would validate only domains ending in .com, .it, .net, .uk or .de.) New TLDs come out all the time, so you would have to adjust your regex every time a new one appears, which is a pain in the neck!
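If a TLD check is still wanted, one option is to keep the list outside the regex so that only the data changes when a TLD is added. This is just a sketch: `has_allowed_tld` and `$allowedTlds` are hypothetical names, and the list is deliberately incomplete.

```php
<?php
// Assumed, incomplete list for illustration; it would need to be maintained somewhere.
$allowedTlds = array('com', 'it', 'net', 'uk', 'de');

// Hypothetical helper: checks only the last dot-separated label of a host name.
function has_allowed_tld($host, array $allowedTlds)
{
    $dot = strrpos($host, '.');
    if ($dot === false) {
        return false; // no dot at all, e.g. "google"
    }
    $tld = strtolower(substr($host, $dot + 1));
    return in_array($tld, $allowedTlds, true);
}

var_dump(has_allowed_tld('www.example.co.uk', $allowedTlds));
var_dump(has_allowed_tld('example.org', $allowedTlds));
```

The maintenance problem does not go away, but the regex itself no longer needs editing.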

You could use parse_url to break the address up into its components. While it is explicitly not built to validate a URL, analyzing the resulting components and matching them against your requirements would at least be a start.
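A minimal sketch of that approach, assuming the requirements from the question: `check_url_components` is a made-up name, and the host rule is borrowed from the regex in the other answer rather than being anything parse_url provides.

```php
<?php
// Hypothetical helper: splits the URL with parse_url() and checks the host by hand.
function check_url_components($url)
{
    // parse_url() only reports a host when a scheme is present,
    // so add one when the (optional) http/https prefix is missing.
    if (!preg_match('@^https?\://@i', $url)) {
        $url = 'http://' . $url;
    }

    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return false;
    }

    // Host rule (an assumption taken from the regex answer): dot-separated
    // labels of letters, digits and inner hyphens, ending in a letters-only label.
    $hostRegex = '@^([a-z0-9]([a-z0-9]|(\-[a-z0-9]))*\.)+[a-z]+$@i';
    return preg_match($hostRegex, $parts['host']) === 1;
}

var_dump(check_url_components('http://stackoverflow.com/questions?x=1#top'));
var_dump(check_url_components('http://www'));
```

The path, query and fragment are available in `$parts` as well, so further requirements could be checked in the same place.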

It may vary, but in most cases you don't really need to check the validity of a URL at all.

If it's vital information and you trust your users enough to let them supply it as a URL, you can trust them enough to supply a valid one.

If it isn't vital information, then you just have to check for XSS attempts and display the URL that the user wanted.

You can manually add "http://" if you don't detect one, to avoid navigation problems.
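As a rough illustration of those two steps, adding the missing scheme and escaping the value before display, the sketch below uses two hypothetical helpers, `ensure_scheme` and `escape_for_html`. It assumes the URL is printed into an HTML page encoded as UTF-8.

```php
<?php
// Hypothetical helper: prepends "http://" when no http/https scheme is detected.
function ensure_scheme($url)
{
    $url = trim($url);
    if (preg_match('@^https?\://@i', $url)) {
        return $url;
    }
    return 'http://' . $url;
}

// Hypothetical helper: escapes the URL for safe output inside HTML text or attributes.
function escape_for_html($url)
{
    return htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
}

$url = ensure_scheme('stackoverflow.com');
echo '<a href="' . escape_for_html($url) . '">' . escape_for_html($url) . '</a>', "\n";
```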


I know I'm not giving you an alternative solution, but maybe the best way to solve performance and validity problems is simply to avoid unnecessary checks.
