
UTF-8 character input fails PHP regex match

  <?php
        if(isset($_GET['textvalue'])){
            $string = $_GET['textvalue']; // preg_match returns false
            //$string = '한자漢字メ'; // preg_match returns 1
            $stringArray = preg_match('/^[\p{L}]{2,30}$/u', $string);
        }

    ?>



<!DOCTYPE html>
<html>
    <body>
        <form method="GET">
            <input type="text" name="textvalue">
            <input type="submit">
        </form>
    </body>
</html>

I'm trying to validate the value from the input with a regex.
Unfortunately, every time I submit the characters through the form, preg_match returns false. But if I use the hard-coded string in the commented-out line, it matches (returns 1).

What is going on, and how do I fix it?

If anyone else runs into this problem, I've found the fix. You just need to add this meta tag to the page's head:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

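I'm not sure of the exact reason, but as far as I can tell the browser encodes submitted form data using the page's character encoding, so without the tag above the value can reach PHP in a non-UTF-8 encoding. When preg_match then reads it, it sees different bytes from what was typed in, and with the /u modifier it returns false.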
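To see the failure directly, here is a minimal sketch. It assumes the form value arrived in a legacy encoding; the EUC-KR-style bytes below are an illustrative assumption, not captured from the original form, and checkLetters is a hypothetical helper name. It runs the pattern from the question and uses preg_last_error() to tell "not valid UTF-8" apart from an ordinary non-match.

```php
<?php
// Hypothetical helper: runs the question's pattern and reports why it failed.
function checkLetters(string $input): string
{
    $result = preg_match('/^[\p{L}]{2,30}$/u', $input);
    if ($result === false) {
        // With the /u modifier, a subject that is not valid UTF-8 is an error,
        // not a simple non-match.
        return preg_last_error() === PREG_BAD_UTF8_ERROR ? 'invalid UTF-8' : 'regex error';
    }
    return $result === 1 ? 'match' : 'no match';
}

$utf8   = "\u{D55C}\u{C790}";   // the first two characters of the sample string, as UTF-8
$legacy = "\xC7\xD1\xC0\xDA";   // assumed: the same text in a legacy (EUC-KR-style) encoding

echo checkLetters($utf8), "\n";    // match
echo checkLetters($legacy), "\n";  // invalid UTF-8
echo checkLetters("a1"), "\n";     // no match (the digit is not a letter)
```

Checking preg_last_error() like this while debugging makes it clear whether the pattern or the encoding is at fault.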

That's why it works when you use the hard-coded string: the HTML form is not involved.

Note: even if you try to inspect the value by echoing it out, the browser displays it as the original characters, so the problem is not visible on the page. Weird.

Example:

<?php
if(isset($_GET['textvalue'])){
    $string = $_GET['textvalue']; // arrives as UTF-8 now, so preg_match returns 1 for valid input
    //$string = '한자漢字メ'; // preg_match returns 1
    $stringArray = preg_match('/^[\p{L}]{2,30}$/u', $string);
}
?>
<!DOCTYPE html>
<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    </head>
    <body>
        <form method="GET">
            <input type="text" name="textvalue">
            <input type="submit">
        </form>
    </body>
</html>
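As a complement to the meta tag, and not part of the original fix, the charset can also be declared in the HTTP response header from PHP, and the handler can reject anything that is not valid UTF-8 instead of silently failing. The sketch below assumes PHP 7.1 or later; readLetters is a hypothetical helper name.

```php
<?php
// Assumption: declaring the charset in the HTTP header serves the same purpose
// as the meta tag. It must run before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Hypothetical helper: returns the submitted value only if it is valid UTF-8
// and consists of 2 to 30 letters; otherwise returns null.
function readLetters(array $query, string $key): ?string
{
    $value = $query[$key] ?? null;
    if (!is_string($value)) {
        return null; // missing, or an array such as ?textvalue[]=x
    }
    // preg_match returns false for invalid UTF-8 and 0 for a non-match;
    // the strict comparison with 1 rejects both.
    return preg_match('/^[\p{L}]{2,30}$/u', $value) === 1 ? $value : null;
}

$name = readLetters($_GET, 'textvalue');
```

Comparing the result strictly against 1 avoids treating false (bad encoding) and 0 (no match) as the same case by accident elsewhere in the code.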
