
PHP filter_var URL

To validate a URL path from user input, I'm using the PHP filter_var function. The input contains only the path (/path/path/script.php).

Before validating, I prepend the host to the path. While playing around and testing the input validation, I noticed some strange behavior in the URL filter.

Code:

$url = "http://www.domain.nl/http://www.google.nl/modules/authorize/test/normal.php";
var_dump(filter_var($url, FILTER_VALIDATE_URL, FILTER_FLAG_HOST_REQUIRED)); //valid

Can someone explain why this is a valid URL? Thanks!

The short answer: PHP's FILTER_VALIDATE_URL checks the URL only against RFC 2396, and your URL, although weird, is valid according to that standard.

Long answer:

The filter you are using is declared to be compliant with the RFC, so let's check that standard (RFC 2396).

The regular expression listed there (Appendix B) for parsing a URI is:

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
 12            3  4          5       6  7        8 9

Where:

scheme    = $2
authority = $4
path      = $5
query     = $7
fragment  = $9

As we can see, the ":" character is reserved only in the context of scheme and from that point onwards ":" is fair game (this is supported by the text of the standard). For example, it is used freely in the http: scheme to denote a port. A slash can also appear in any place and nothing prohibits the URL from having a "//" somewhere in the middle. So "http://" in the middle should be valid.
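To make the group numbering concrete, here is a minimal sketch that applies the same regexp to a made-up URI containing a port, a colon and a double slash in the path. The helper name split_uri and the example.com URI are assumptions for illustration only; they come neither from the RFC nor from PHP.

```php
<?php
// Hypothetical helper: split a URI with the regexp from RFC 2396, Appendix B.
// "~" is used as the delimiter so the slashes need no escaping.
function split_uri(string $uri): array
{
    $pattern = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';
    preg_match($pattern, $uri, $m);

    // Trailing groups that did not take part in the match are missing
    // from $m, so default them to an empty string.
    return [
        'scheme'    => $m[2] ?? '',
        'authority' => $m[4] ?? '',
        'path'      => $m[5] ?? '',
        'query'     => $m[7] ?? '',
        'fragment'  => $m[9] ?? '',
    ];
}

// Made-up URI: port in the authority, ":" and "//" inside the path.
print_r(split_uri('http://example.com:8080/a:b//c?x=1#top'));
```

The port stays with the authority (example.com:8080), while the later colon and the double slash end up in the path (/a:b//c) without any complaint.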

Let's look at your URL and try to match it to this regexp:

$url = "http://www.domain.nl/http://www.google.nl/modules/authorize/test/normal.php";
//Escaped a couple slashes to make things work, still the same regexp
$result_rfc = preg_match('/^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/',$url);
echo '<p>'.$result_rfc.'</p>';

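The test returns 1, so the URL matches. Strictly speaking, this regexp is meant to split a URI into its components rather than to reject bad input, so the more telling result is where the pieces land: the scheme is "http", the authority is "www.domain.nl", and the second "http://" falls entirely inside the path component ($5). This is to be expected, because, as we have seen, the rules do not declare URLs with something like "http://" in the middle to be invalid. PHP simply mirrors this behaviour with FILTER_VALIDATE_URL.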
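The same thing can be observed with PHP's own functions. The sketch below runs filter_var() and parse_url() on the URL from the question. FILTER_FLAG_HOST_REQUIRED is left out on purpose: FILTER_VALIDATE_URL already implies it, and the constant was deprecated in PHP 7.3 and removed in PHP 8.0.

```php
<?php
$url = 'http://www.domain.nl/http://www.google.nl/modules/authorize/test/normal.php';

// filter_var() returns the URL string itself when it passes, false otherwise.
$valid = filter_var($url, FILTER_VALIDATE_URL) !== false;

// parse_url() splits the URL into components; everything after the
// first host, including the second "http://", is part of 'path'.
$parts = parse_url($url);

var_dump($valid);
var_dump($parts['host']);
var_dump($parts['path']);
```

The host comes out as www.domain.nl and the rest, starting with /http://www.google.nl, is treated as an ordinary path.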

If you want a more rigorous test, you will need to write the required code yourself. For example, you can prevent "://" from appearing more than once:

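$url = "http://www.domain.nl/http://www.google.nl/modules/authorize/test/normal.php";
//RFC 2396 check, same regexp as above (preg_match returns 1 on a match)
$result_rfc = preg_match('/^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/',$url);
//Extra non-RFC rule: "://" must appear exactly once in the whole URL.
//Note that this also rejects URLs carrying another URL in the query string.
if (substr_count($url,'://') !== 1) {
    $result_non_rfc = false;
} else {
    $result_non_rfc = ($result_rfc === 1);
}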

You can also try and adjust the regular expression itself.
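Since the user input is only the path, another option is to check that path against a whitelist before the host is added. The following is a minimal sketch of that idea, not a definitive implementation: the helper name is_safe_path and the set of allowed characters are assumptions and should be adapted to the paths your application really serves.

```php
<?php
// Hypothetical helper: accept only paths built from "/segment" parts, where
// each segment consists of letters, digits, ".", "_" or "-".
// The character whitelist is an assumption; adjust it as needed.
function is_safe_path(string $path): bool
{
    // \z anchors at the very end of the string, so a trailing
    // newline is rejected as well.
    return preg_match('~^(/[A-Za-z0-9._-]+)+\z~', $path) === 1;
}

var_dump(is_safe_path('/path/path/script.php'));                 // bool(true)
var_dump(is_safe_path('/http://www.google.nl/test/normal.php')); // bool(false)
```

Because ":" is not in the whitelist, an embedded "http://" can no longer slip through. Segments such as "." and ".." still pass this pattern, so they would need a separate check if they matter for your use case.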
