
URL Validation in PHP

This topic has been discussed a lot here on Stack Overflow, but none of the answers I found produce the result I need. Before inserting a URL into the database, I want to check that the value really is a URL. PHP's built-in FILTER_VALIDATE_URL filter accepts the value even if we just provide httpp://exampl

but I need the value to pass only if it has a real domain such as example.net or example.com. Let's try an example:

Case 1:

$url = "http://example";
if (filter_var($url, FILTER_VALIDATE_URL) !== false) {
    return true;
}

This returns true, but the domain isn't valid.

Case 2:

$url = "http://google.com";
if (filter_var($url, FILTER_VALIDATE_URL) !== false) {
    return true;
}

This returns true, and that's fine.

Is there any possible solution for Case 1? Please help.

PS: I tried cURL and it works, but the response is too slow (more than 5 seconds). Any solid solution would be greatly appreciated.

I've coded a quick script that may help you achieve what you need:

<?php
//error_reporting(E_ALL);
//ini_set('display_errors', 1);
$url = "http://www.google.com";


if(validateUrl($url)){
    echo "VALID";
}else{
    echo "INVALID";
}

function validateUrl($url){

//first we validate the url using a regex

if (!preg_match('%^(?:(?:https?)://)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]-*)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]-*)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,}))\.?)(?::\d{2,5})?(?:[/?#]\S*)?$%uiS', $url)) {

    return false;
}


// If the URL passes the regex, request it with cURL and expect a 200 response to treat it as valid.

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, true);         // include the response headers
curl_setopt($ch, CURLOPT_NOBODY, true);         // skip the body (faster)
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds
curl_exec($ch);
$httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE); // integer status code of the last response
curl_close($ch);

// CURLINFO_HTTP_CODE is an integer, so compare against 200 rather than "200".
return $httpcode === 200;


}

http://example is a valid URL if you have a computer called example on your local network.
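If single-label hosts like that should be rejected anyway, one option is a purely syntactic check on the host after FILTER_VALIDATE_URL has passed. The following is a minimal sketch, not a definitive implementation: the function name hasDottedHost is hypothetical, and the rule it applies (http or https scheme, at least one dot, a final label of two or more letters) is an assumption that also rejects IP addresses and punycode top-level domains such as xn--p1ai. It does not prove that the domain exists.

```php
<?php
// Hypothetical helper: accepts only http(s) URLs whose host contains a dot
// and ends in an alphabetic label of two or more letters.
// Purely syntactic; it performs no network access.
function hasDottedHost(string $url): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    $host   = (string) parse_url($url, PHP_URL_HOST);

    if ($scheme !== 'http' && $scheme !== 'https') {
        return false;
    }

    // At least one dot, and the last label looks like a TLD (letters only, 2+ chars).
    return preg_match('/^[^.]+(\.[^.]+)*\.[a-z]{2,}$/i', $host) === 1;
}

var_dump(hasDottedHost('http://example'));    // bool(false)
var_dump(hasDottedHost('http://google.com')); // bool(true)
```

Because it never touches the network, this check is fast enough to run before every insert, with a slower existence check applied only where it matters.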

The only solution for what you want (especially considering that there are lots of new top level domains) is to connect and see if you get 200 OK.

CURL is probably the best solution here.

This superuser question might help to just get the response code from a url.

However, you will never get 100% accuracy.
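If a full HTTP request is too slow, a DNS lookup may be a lighter approximation. The sketch below assumes that "the host has an A or AAAA record" is good enough for your purposes; it does not show that a web server answers there, and a local resolver may still resolve names on your own network. The function name hostResolves and its injectable $lookup parameter are hypothetical and exist only so the example can run offline; by default it calls PHP's checkdnsrr().

```php
<?php
// Hypothetical helper: true when the URL's host has an A or AAAA record.
// $lookup is an injectable resolver (an assumption made for testability);
// when omitted, PHP's built-in checkdnsrr() is used.
function hostResolves(string $url, ?callable $lookup = null): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // no host component could be extracted
    }

    $lookup = $lookup ?? static function (string $name, string $type): bool {
        return checkdnsrr($name, $type);
    };

    return $lookup($host, 'A') || $lookup($host, 'AAAA');
}

// Usage with a stub resolver, so the example does not depend on the network:
$stub = static function (string $name, string $type): bool {
    return $name === 'google.com' && $type === 'A';
};

var_dump(hostResolves('http://google.com', $stub)); // bool(true)
var_dump(hostResolves('http://example', $stub));    // bool(false)
```

In real use you would call hostResolves($url) without the second argument and treat the result as a hint rather than a guarantee.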
