PHP regex for URL validation, filter_var is too permissive
First, let's define a "URL" according to my requirements.
The only protocols optionally allowed are http:// and https://,
then a mandatory domain name like stackoverflow.com,
then optionally the rest of the URL components (path, query, hash, ...).
For reference, here is a list of valid and invalid URLs according to my requirements:
amazon.com/Computers-Internet-Books/b/ref=bhp_bb0309A_comint2?ie=UTF8&node=5&pf_rd_m=ATVPDKIKX0DER&pf_rd_s=browse&pf_rd_r=0AH7GM29WF81Q72VPFDH&pf_rd_t=101&pf_rd_p=1273387142&pf_rd_i=283155
http://test-site.com (valid URL; filter_var rejects this, but I have domain names with dashes)
For completeness, here is my PHP version: 5.3.2-1ubuntu4.2
As a starting point you can use this one. It's for JS, but it's easy to convert it to work with PHP's preg_match.
/^(https?\://)?(www\.)?([a-z0-9]([a-z0-9]|(\-[a-z0-9]))*\.)+[a-z]+$/i
For PHP, this one should work:
$reg = '@^(https?\://)?(www\.)?([a-z0-9]([a-z0-9]|(\-[a-z0-9]))*\.)+[a-z]+$@i';
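A quick way to see what this pattern accepts is to run it through preg_match against a few inputs. The following is only a sketch: the sample strings are hypothetical and were picked for illustration, not taken from the question.

```php
<?php
// Pattern from the answer above; '@' is the delimiter, so '/' needs no escaping.
$reg = '@^(https?\://)?(www\.)?([a-z0-9]([a-z0-9]|(\-[a-z0-9]))*\.)+[a-z]+$@i';

// Hypothetical sample inputs, chosen only for illustration.
$samples = array(
    'stackoverflow.com',
    'http://test-site.com',
    'https://www.example.org',
    'http://-bad-.com',
    'ftp://example.com',
);

foreach ($samples as $candidate) {
    // preg_match returns 1 on a match, 0 on no match, false on error.
    $ok = preg_match($reg, $candidate) === 1;
    echo $candidate, ' => ', ($ok ? 'match' : 'no match'), "\n";
}
```

The first three inputs should match; the last two should not, because a label may not start with a dash and only http:// and https:// are accepted as a scheme.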
This regexp validates only the domain part anyway, but you can build on it, or split the URL at the first slash '/' (after "://") and validate the domain part and the rest separately.
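The split described here could look roughly like the sketch below. The helper name split_at_first_slash and the sample URL are assumptions made for illustration. The helper only separates the two parts, and validating the rest (path, query, hash) is left open.

```php
<?php
// Domain-only pattern from the answer above.
$reg = '@^(https?\://)?(www\.)?([a-z0-9]([a-z0-9]|(\-[a-z0-9]))*\.)+[a-z]+$@i';

/**
 * Hypothetical helper: splits a URL into "scheme + domain" and the rest.
 * The split happens at the first '/' that follows an optional leading
 * "http://" or "https://".
 */
function split_at_first_slash($url)
{
    // Skip a leading scheme, if any, so its slashes are not taken as the split point.
    $searchFrom = 0;
    if (preg_match('@^https?://@i', $url, $m) === 1) {
        $searchFrom = strlen($m[0]);
    }

    $slash = strpos($url, '/', $searchFrom);
    if ($slash === false) {
        // No path at all: everything is the domain part.
        return array($url, '');
    }
    return array(substr($url, 0, $slash), substr($url, $slash));
}

list($domainPart, $rest) = split_at_first_slash('http://test-site.com/path/page?x=1#top');

// Validate only the domain part with the pattern; $rest needs its own check.
$domainOk = preg_match($reg, $domainPart) === 1;
```

Note that this sketch splits only at a slash, so an input such as "example.com?x=1" (a query with no path) would still be passed to the domain pattern as a whole and be rejected.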
BTW: it would also validate "http://www.domain.com.com", but this is not an error, because a subdomain URL could look like "http://www.subdomain.domain.com", and that is valid!
There is almost no way (or at least no operationally easy way) to validate for a proper domain TLD with a regex, because you would have to write all possible TLDs inline into your regex ONE BY ONE, like this:
/^(https?\://)?(www\.)?([a-z0-9]([a-z0-9]|(\-[a-z0-9]))*\.)+(com|it|net|uk|de)$/i
(This last one, for instance, would validate only domains ending in .com, .it, .net, .uk (which includes .co.uk) or .de.) New TLDs come out all the time, so you would have to adjust your regex every time a new TLD comes out, and that's a pain in the neck!
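If you go down this road anyway, one option is to keep the TLDs in an array and build the alternation from it, so that at least the regex itself does not have to be edited by hand. This is only a sketch: the variable names are hypothetical and the list simply mirrors the five TLDs from the pattern above. It does not remove the maintenance burden, it only moves it into the list.

```php
<?php
// Hypothetical allow-list; the entries mirror the TLDs in the pattern above.
$allowedTlds = array('com', 'it', 'net', 'uk', 'de');

// preg_quote escapes regex metacharacters (and the '@' delimiter) in each entry.
$quoted = array();
foreach ($allowedTlds as $tld) {
    $quoted[] = preg_quote($tld, '@');
}

// Same domain pattern as before, with the TLD alternation built from the list.
$reg = '@^(https?\://)?(www\.)?([a-z0-9]([a-z0-9]|(\-[a-z0-9]))*\.)+('
     . implode('|', $quoted)
     . ')$@i';

var_dump(preg_match($reg, 'http://test-site.com') === 1);
var_dump(preg_match($reg, 'example.org') === 1);
```

With this list, "http://test-site.com" is accepted and "example.org" is rejected until 'org' is added to the array.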
It may vary, but in most cases you don't really need to check the validity of a URL.
If it's vital information and you trust your user enough to let him give it through a URL, you can trust him enough to give a valid URL.
If it isn't vital information, then you just have to check for XSS attempts and display the URL that the user wanted.
You can manually add "http://" if you don't detect one, to avoid navigation problems.
I know I'm not giving you an alternative as a solution, but maybe the best way to solve performance and validity problems is just to avoid unnecessary checks.
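As a rough sketch of that approach: prepend "http://" when no scheme is detected, and escape the value before printing it into HTML. The helper name ensure_http_scheme and the sample input are assumptions for illustration, and htmlspecialchars is only one possible way to handle the XSS side.

```php
<?php
/**
 * Hypothetical helper: prepends "http://" when the value does not
 * already start with http:// or https://.
 */
function ensure_http_scheme($url)
{
    $url = trim($url);
    if (preg_match('@^https?://@i', $url) === 1) {
        return $url;
    }
    return 'http://' . $url;
}

// Hypothetical user-supplied value.
$userInput = 'test-site.com/page?a=1&b="x"';
$href = ensure_http_scheme($userInput);

// htmlspecialchars escapes &, <, > and quotes, so the value can be
// placed inside an HTML attribute or text node.
$safe = htmlspecialchars($href, ENT_QUOTES, 'UTF-8');
echo '<a href="' . $safe . '">' . $safe . '</a>', "\n";
```

Because anything without an http:// or https:// prefix gets "http://" prepended, an input such as "javascript:alert(1)" also ends up as an http URL rather than a script link.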