PHP URL validation
I know there are countless threads asking this question, but I have not been able to find one that helps me with this.
I am trying to parse a list of around 10,000,000 URLs, make sure each one is valid per the following criteria, and then get the root domain. The list contains just about everything you can imagine, including entries like these (shown with the expected result):
bit.ly/test [VALID] [return - bit.ly]
example.com/apples?test=1&id=4 [VALID] [return - example.com]
host101.wow404.apples.test.com/cert/blah [VALID] [return - test.com]
101.121.44.xxx [INVALID] [return false]
localhost/noway [INVALID] [return false]
www.awesome.com [VALID] [return - awesome.com]
i am so awesome [INVALID] [return false]
http://404.mynewsite.com/visits/page/view/1/ [VALID] [return - mynewsite.com]
www1.151.com/searchresults [VALID] [return - 151.com]
Does anyone have any suggestions for this?
^(?:https?://)?(?:[a-z0-9-]+\.)*((?:[a-z0-9-]+\.)[a-z]+)
Explanation
^ # start-of-line
(?: # begin non-capturing group
https? # "http" or "https"
:// # "://"
)? # end non-capturing group, make optional
(?: # start non-capturing group
[a-z0-9-]+\. # a name part (numbers, ASCII letters, dashes) & a dot
)* # end non-capturing group, match as often as possible
( # begin group 1 (this will be the domain name)
(?: # start non-capturing group
[a-z0-9-]+\. # a name part, same as above
) # end non-capturing group
[a-z]+ # the TLD
) # end group 1
http://rubular.com/r/g6s9bQpNnC
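As a minimal sketch of how the pattern above could be used from PHP: the function name rootDomain is hypothetical, the `~` delimiter is chosen only because the pattern contains slashes, and the `i` flag is an assumption so that upper-case input also matches.

```php
<?php
// Hypothetical helper: returns the captured domain (group 1) or false.
// The pattern is the one explained above; "~" is used as the delimiter
// so the "/" characters in "://" do not need escaping.
function rootDomain($url)
{
    $pattern = '~^(?:https?://)?(?:[a-z0-9-]+\.)*((?:[a-z0-9-]+\.)[a-z]+)~i';

    if (preg_match($pattern, trim($url), $m) === 1) {
        return strtolower($m[1]);
    }
    return false;
}

var_dump(rootDomain('www1.151.com/searchresults'));
var_dump(rootDomain('localhost/noway'));
```

The pattern on its own is not a complete validator. An input such as 101.121.44.xxx still matches (group 1 would be 44.xxx), and multi-part suffixes such as .co.uk are reduced to the last two labels, so extra checks would be needed for those cases.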
I would start with the built-in validator:
filter_var($inputUrl, FILTER_VALIDATE_URL);
Then add further checks for the special cases you do not want to accept. This should simplify things a bit.
As for getting the host:
parse_url($inputUrl, PHP_URL_HOST);
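One thing to watch for with a list like the one in the question: FILTER_VALIDATE_URL rejects input that has no scheme, and parse_url() does not report a host for a bare string such as example.com/apples. The sketch below puts the two calls together under a few assumptions that are not part of the answer: the function name hostFromInput is hypothetical, a missing scheme is treated as http, and hosts without a dot or that are plain IP addresses are rejected as the "special cases".

```php
<?php
// Hypothetical helper: returns the lower-cased host, or false if the
// input is rejected.
function hostFromInput($inputUrl)
{
    $candidate = trim($inputUrl);

    // Assumption: input without a scheme is treated as http.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $candidate)) {
        $candidate = 'http://' . $candidate;
    }

    if (filter_var($candidate, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    $host = parse_url($candidate, PHP_URL_HOST);

    // Special cases: no host, single-label hosts such as "localhost",
    // and literal IP addresses.
    if (!is_string($host) || strpos($host, '.') === false) {
        return false;
    }
    if (filter_var($host, FILTER_VALIDATE_IP) !== false) {
        return false;
    }

    return strtolower($host);
}

var_dump(hostFromInput('example.com/apples?test=1&id=4'));
var_dump(hostFromInput('localhost/noway'));
```

This returns the full host (for example 404.mynewsite.com), not the root domain, so reducing it to the registrable domain is a separate step.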
^(([a-zA-Z]+(\.[a-zA-Z]+)+)|([0-9]{1,3}(\.[0-9]{1,3}){3}))/.*$
Edit
In PHP that would be (with $myUrl holding a single URL string):
preg_match('~^(([a-zA-Z]+(\.[a-zA-Z]+)+)|([0-9]{1,3}(\.[0-9]{1,3}){3}))/.*$~', $myUrl, $matches)
What you need would be in $matches[1].
$website = test_input($_POST["website"]);
if (!preg_match("/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i", $website))
{
    $websiteErr = "Invalid URL";
}