

Validating URLs with and without a protocol using filter_var

I am attempting to validate URLs using PHP's filter_var() function. Per http://php.net/manual/en/filter.filters.validate.php:

Validates value as URL (according to http://www.faqs.org/rfcs/rfc2396), optionally with required components. Beware a valid URL may not specify the HTTP protocol http:// so further validation may be required to determine the URL uses an expected protocol, e.g. ssh:// or mailto:. Note that the function will only find ASCII URLs to be valid; internationalized domain names (containing non-ASCII characters) will fail.
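Following the manual's hint that "further validation may be required", here is a minimal sketch of a scheme whitelist applied after filter_var(). The helper name hasAllowedScheme() and the default list ['http', 'https'] are illustrative assumptions, not part of PHP.

```php
<?php
// Hypothetical helper: accept a URL only if filter_var() considers it valid
// AND its scheme is in an explicit whitelist (compared case-insensitively).
function hasAllowedScheme(string $url, array $allowed = ['http', 'https']): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false; // Not a syntactically valid URL at all.
    }
    $scheme = parse_url($url, PHP_URL_SCHEME);
    return is_string($scheme) && in_array(strtolower($scheme), $allowed, true);
}

var_dump(hasAllowedScheme('https://stackoverflow.com/'));  // bool(true)
var_dump(hasAllowedScheme('ftp://example.com/file.txt'));  // bool(false)
var_dump(hasAllowedScheme('stackoverflow.com/'));          // bool(false)
```

The whitelist is passed in rather than hard-coded so the same check can allow, say, 'ftp' where that is expected.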

Regarding "Beware a valid URL may not specify the HTTP protocol", my tests below indicate that an HTTP protocol is required (URL 'stackoverflow.com/' is NOT considered valid). How am I misinterpreting the documentation?

Also, how can URLs such as https://https://stackoverflow.com/ be prevented from validating as true?

PS. Any comments on my approach to sanitizing the protocol would be appreciated.

<?php
function filterURL($url) {
    echo("URL '{$url}' is ".(filter_var($url, FILTER_VALIDATE_URL)?'':' NOT ').'considered valid.<br>');
}
function sanitizeURL($url) {
    return (strtolower(substr($url,0,7))=='http://' || strtolower(substr($url,0,8))=='https://')?$url:'http://'.$url;
}

filterURL('http://stackoverflow.com/');
filterURL('https://stackoverflow.com/');
filterURL('//stackoverflow.com/');
filterURL('stackoverflow.com/');
filterURL(sanitizeURL('http://stackoverflow.com/'));
filterURL(sanitizeURL('https://stackoverflow.com/'));
filterURL(sanitizeURL('stackoverflow.com/'));

filterURL('https://https://stackoverflow.com/');
?>

OUTPUT:

URL 'http://stackoverflow.com/' is considered valid.
URL 'https://stackoverflow.com/' is considered valid.
URL '//stackoverflow.com/' is NOT considered valid.
URL 'stackoverflow.com/' is NOT considered valid.
URL 'http://stackoverflow.com/' is considered valid.
URL 'https://stackoverflow.com/' is considered valid.
URL 'http://stackoverflow.com/' is considered valid.
URL 'https://https://stackoverflow.com/' is considered valid.

FILTER_VALIDATE_URL relies on the same URL parser as parse_url(), which unfortunately parses 'https://https://stackoverflow.com/' as a valid URL (and it really is a valid one according to the URI RFC):

var_dump(parse_url('https://https://stackoverflow.com/'));

array(3) { 
  ["scheme"]=> string(5) "https" 
  ["host"]=> string(5) "https"
  ["path"]=> string(20) "//stackoverflow.com/" 
}

You could change your sanitizeURL function to:

function sanitizeURL($url) {
  return (parse_url($url, PHP_URL_SCHEME)) ? $url : 'http://' . $url;
}
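One gap in the parse_url()-based sanitizeURL() above: a scheme-relative input such as '//stackoverflow.com/' has no scheme, so it becomes 'http:////stackoverflow.com/' rather than the intended 'http://stackoverflow.com/'. A minimal sketch of a variant that prepends only 'http:' in that case; the name sanitizeURLWithDefault() and the 'http' default are assumptions for illustration.

```php
<?php
// Hypothetical variant of sanitizeURL(): prepend a default scheme only when
// the URL has none, and handle scheme-relative URLs ("//host/path") separately
// so they do not end up with four slashes.
function sanitizeURLWithDefault(string $url, string $defaultScheme = 'http'): string
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if ($scheme !== null) {
        // Either a scheme is present, or parse_url() returned false because the
        // URL is seriously malformed; leave it for validation to reject.
        return $url;
    }
    if (strncmp($url, '//', 2) === 0) {
        return $defaultScheme . ':' . $url;   // "//host/" -> "http://host/"
    }
    return $defaultScheme . '://' . $url;     // "host/"   -> "http://host/"
}

echo sanitizeURLWithDefault('stackoverflow.com/'), "\n";         // http://stackoverflow.com/
echo sanitizeURLWithDefault('//stackoverflow.com/'), "\n";       // http://stackoverflow.com/
echo sanitizeURLWithDefault('https://stackoverflow.com/'), "\n"; // https://stackoverflow.com/
```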

but you still have to check that the host name is neither http nor https:

function filterURL($url) {
  $host = parse_url($url, PHP_URL_HOST);
  $isValid = filter_var($url, FILTER_VALIDATE_URL) !== false
    && $host !== 'http' && $host !== 'https';
  echo("URL '{$url}' is " . ($isValid ? '' : ' NOT ') . 'considered valid.<br>');
}
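The host check above compares against lowercase 'http' and 'https' only, and parse_url() does not lowercase the host, so an input like 'https://HTTPS://stackoverflow.com/' could slip through if filter_var() accepts it. A minimal sketch that lowercases the host first; the name isValidWebURL() is hypothetical, and unlike the answer's version it also rejects URLs without any host (e.g. mailto:), on the assumption that only web URLs are wanted.

```php
<?php
// Hypothetical stricter check based on the answer's idea: reject URLs whose
// host is literally "http" or "https" (in any letter case), which is what
// parse_url() reports for a duplicated scheme such as "https://https://...".
function isValidWebURL(string $url): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // No host component at all (e.g. "mailto:...").
    }
    return !in_array(strtolower($host), ['http', 'https'], true);
}

var_dump(isValidWebURL('https://stackoverflow.com/'));         // bool(true)
var_dump(isValidWebURL('https://https://stackoverflow.com/')); // bool(false)
var_dump(isValidWebURL('https://HTTPS://stackoverflow.com/')); // bool(false)
```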

You can remove the http:// or add it, after checking whether it is already present.

<?php
$url = "http://www.nigeriatest.com";

// Remove all illegal characters from a url
$url = filter_var($url, FILTER_SANITIZE_URL);

// Validate url
if (filter_var($url, FILTER_VALIDATE_URL) !== false) {
    echo("$url is a valid URL");
} else {
    echo("$url is not a valid URL");
}
?>
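Note that FILTER_SANITIZE_URL only strips characters that are not allowed in URLs; it does not add a missing scheme, so sanitizing alone will not make 'stackoverflow.com/' pass FILTER_VALIDATE_URL. A small sketch illustrating this; the sample inputs are arbitrary.

```php
<?php
// FILTER_SANITIZE_URL removes disallowed characters (such as spaces)
// but leaves a scheme-less URL scheme-less.
$inputs = ['http://www.example.com/some page', 'stackoverflow.com/'];

foreach ($inputs as $input) {
    $sanitized = filter_var($input, FILTER_SANITIZE_URL);
    $isValid = filter_var($sanitized, FILTER_VALIDATE_URL) !== false;
    echo "'{$input}' -> '{$sanitized}' is " . ($isValid ? '' : 'NOT ') . "valid\n";
}
```

The first input loses its space and validates as 'http://www.example.com/somepage'; the second comes out unchanged and is still NOT valid, so a scheme has to be added separately.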

How am I misinterpreting the documentation?

The specification doesn't say anything about not having a protocol: it simply states that the protocol might not be HTTP.

You chopped off the important part of the sentence in your quote:

Beware a valid URL may not specify the HTTP protocol http:// so further validation may be required to determine the URL uses an expected protocol

A protocol is expected; it may or may not be HTTP.
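A quick way to see this: feed filter_var() a few URLs with non-HTTP schemes and a few with no scheme. The sample URLs are arbitrary; the point is only that some scheme must be present, not that it must be http or https.

```php
<?php
// A scheme is required, but it does not have to be http or https.
$candidates = [
    'ftp://example.com/file.txt', // non-HTTP scheme with a host
    'ssh://example.com',          // non-HTTP scheme with a host
    'stackoverflow.com/',         // no scheme at all
    '//stackoverflow.com/',       // scheme-relative, still no scheme
];

foreach ($candidates as $url) {
    $ok = filter_var($url, FILTER_VALIDATE_URL) !== false;
    printf("%-28s %s\n", $url, $ok ? 'valid' : 'NOT valid');
}
```

The first two are reported as valid and the last two as NOT valid, matching the question's results for the scheme-less inputs.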

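Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site or the original source. For any questions, contact yoyou2525@163.com.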
