
Validate non-latin URLs

My client has asked me to validate user-entered URLs that contain non-Latin characters. For example:

  • http://uk.wikipedia.org/wiki/Фотосинтез
  • http://презитент.рф

Does anyone have a regexp to validate such URLs?

Or is there an easy way in PHP to URL-encode the non-Latin part of a URL, e.g. http://uk.wikipedia.org/wiki/Фотосинтез -> http://uk.wikipedia.org/wiki/%D0%A4%D0%BE%D1%82%D0%BE%D1%81%D0%B8%D0%BD%D1%82%D0%B5%D0%B7, and vice versa?

Does it make any sense?

Many thanks for any help.

php.net warns that parse_url "is not meant to validate the given URL, it only breaks it up into the above listed parts." If that's acceptable, it appears to (more or less) work with non-Latin characters:

~ visitor$ cat parse.php 
<?php
$parsed = parse_url( 'http://uk.wikipedia.org/wiki/Фотосинтез' );
print_r( $parsed );
?>

~ visitor$ php parse.php 
Array
(
    [scheme] => http
    [host] => uk.wikipedia.org
    [path] => /wiki/Фо?_о?_ин?_ез
)
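
For the encode/decode half of the question, here is a minimal sketch that is not part of the original answer. The helper name encode_non_ascii is hypothetical, and the sketch assumes the input is UTF-8 and that the non-Latin characters sit in the path or query. It percent-encodes only the non-ASCII bytes with rawurlencode(), checks the result with filter_var() (which, per the PHP manual, only accepts ASCII URLs), and reverses the encoding with rawurldecode(). An internationalized host such as презитент.рф would need Punycode conversion instead (for example idn_to_ascii() from the intl extension), which is not shown here.

```php
<?php
// Hypothetical helper, not from the original post: percent-encode every run
// of non-ASCII bytes and leave the ASCII parts of the URL untouched.
function encode_non_ascii(string $url): string
{
    // No /u modifier on purpose: the pattern matches raw bytes >= 0x80,
    // i.e. the bytes that make up multi-byte UTF-8 characters.
    return preg_replace_callback(
        '/[^\x00-\x7F]+/',
        function (array $m): string {
            return rawurlencode($m[0]);
        },
        $url
    );
}

// Assumes this source file is saved as UTF-8.
$url     = 'http://uk.wikipedia.org/wiki/Фотосинтез';
$encoded = encode_non_ascii($url);
echo $encoded, PHP_EOL;
// http://uk.wikipedia.org/wiki/%D0%A4%D0%BE%D1%82%D0%BE%D1%81%D0%B8%D0%BD%D1%82%D0%B5%D0%B7

// Validate the ASCII-only, percent-encoded form.
var_dump(filter_var($encoded, FILTER_VALIDATE_URL) !== false);

// rawurldecode() turns the escapes back into the original characters.
var_dump(rawurldecode($encoded) === $url);
```

Like parse_url, this does not prove that the URL points anywhere; it only checks that the encoded string has the shape of a URL.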

