
Going mad over encoding bugs with JSON and the differences between JS and PHP

I have a real mess on my hands with encoding-related bugs.

I have a DB in latin1 (which is close to Windows-1252, I believe), a user frontend page in Windows-1252, and an AJAX backend in Windows-1252. These can't be changed at the moment.

Yet, because JSON expects UTF-8 data, I'm running into tons of trouble with German umlauts.

I'm currently retrieving some escaped example data from the DB on the frontend [{"\u00f6\u00e4\u00fc\u00df"}] and using

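foreach ($example_array_of_objects as $k => &$v) {
    foreach ($v as $k2 => $v2) {
        // utf8_decode() converts UTF-8 to ISO-8859-1 (deprecated since PHP 8.2)
        $v[$k2] = utf8_decode($v2);
    }
}
unset($v); // break the reference left over from the by-reference foreach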

which results in correct display of the data in input form fields on the frontend.

However, this is where I'm stuck. PHP's json_encode escapes umlauts to these \u sequences, but in JavaScript, JSON.stringify just doesn't. When I JSON.stringify the input field data and send it to the AJAX script, I get only garbage from a print_r response:

öäüß

encodeURIComponent doesn't do the same type of escaping as PHP does. This is infuriating.

How can I transform "öäüß" to \u00f6\u00e4\u00fc\u00df in JS (or how can I synchronize the way data is handled between JS/PHP/MySQL somehow)?

You cannot really modify how JSON.stringify works. Providing a replacer function as the 2nd argument would force you to encode values manually, which is unpleasant. Your best bet is to use UTF-8 in the frontend (JavaScript code) and convert from/to CP1252 only in your PHP code.
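If the escaped form is still wanted on the JS side, one option is to post-process the output of JSON.stringify instead of using a replacer. The following is only a minimal sketch; `stringifyAscii` is a hypothetical helper name, not a built-in. It rewrites every non-ASCII UTF-16 code unit as a \uXXXX escape, which is roughly what PHP's json_encode does by default.

```javascript
// Hypothetical helper (not part of any library): serialize to JSON, then
// replace every non-ASCII UTF-16 code unit with a \uXXXX escape.
function stringifyAscii(value) {
  return JSON.stringify(value).replace(/[\u0080-\uffff]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

const payload = stringifyAscii([{ text: "öäüß" }]);
console.log(payload); // [{"text":"\u00f6\u00e4\u00fc\u00df"}]

// The escaped text is still valid JSON and decodes to the same string.
console.log(JSON.parse(payload)[0].text === "öäüß"); // true
```

Because the result is pure ASCII, it arrives unchanged no matter which single-byte charset the page or the request uses.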

When sending data to the frontend you should use these flags:

json_encode($array, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES | JSON_NUMERIC_CHECK);

which will ensure the cleanest possible UTF-8 output.
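For what it's worth, the escaped and unescaped forms are the same data as far as a JSON parser is concerned, so the choice of flag only changes the bytes on the wire. A small sketch in JavaScript (the two literals are made-up sample payloads, not output captured from the asker's system):

```javascript
// Two JSON texts for the same value: one with \u escapes (json_encode's
// default style), one with raw characters (the JSON_UNESCAPED_UNICODE style).
const escaped = '["\\u00f6\\u00e4\\u00fc\\u00df"]';
const raw = '["öäüß"]';

const sameText = JSON.parse(escaped)[0] === JSON.parse(raw)[0];
console.log(sameText); // true
```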

To populate your $array, use mb_convert_encoding($original_data_from_DB, 'UTF-8', 'CP1252'), and to get your data back after json_decode, use mb_convert_encoding($data_from_java_script, 'CP1252', 'UTF-8').
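To illustrate why the conversion step matters, here is a rough Node.js sketch of what happens when UTF-8 bytes are read as a single-byte encoding. It assumes Node's Buffer API; note that Buffer's 'latin1' is ISO-8859-1, which differs from CP1252 only in the 0x80-0x9F range, so this is an approximation of the asker's setup rather than a reproduction of it.

```javascript
// Node.js only: Buffer is not available in browsers.
const original = "öäüß";

// Each of these four characters takes two bytes in UTF-8.
const utf8Bytes = Buffer.from(original, "utf8");
console.log(utf8Bytes.length); // 8

// Reading those bytes as a single-byte charset gives one character per byte,
// which is the typical "garbage" seen when encodings are mixed up.
const misread = utf8Bytes.toString("latin1");
console.log(misread.length); // 8

// Undoing the mistake: turn the characters back into bytes, decode as UTF-8.
const repaired = Buffer.from(misread, "latin1").toString("utf8");
console.log(repaired === original); // true
```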

I faced this type of issue once, though not with PHP, and solved it with encodeURIComponent. If encodeURIComponent is not working for you, try Base64-encoding the data on one side and decoding it on the other, using btoa and atob in JavaScript.
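A possible shape for the Base64 route, as a sketch only: btoa works on "binary strings" (one character per byte), so the text is first turned into UTF-8 bytes. The function names `toBase64Utf8` and `fromBase64Utf8` are hypothetical, and the snippet assumes an environment with global TextEncoder, TextDecoder, btoa and atob (current browsers, or Node 16+).

```javascript
// Hypothetical helpers: Base64 over the UTF-8 bytes of a string.
function toBase64Utf8(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b); // one char per byte
  return btoa(binary);
}

function fromBase64Utf8(b64) {
  const binary = atob(b64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes); // decodes as UTF-8 by default
}

const encoded = toBase64Utf8(JSON.stringify({ text: "öäüß" }));
const decoded = JSON.parse(fromBase64Utf8(encoded));
console.log(decoded.text === "öäüß"); // true
```

On the PHP side, base64_decode would then yield UTF-8 bytes, which still need converting to CP1252 before they go into the latin1 database.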

I managed to do it by letting PHP handle the majority of it now:

JS sends to AJAX:

mydata = JSON.stringify(data);

AJAX Backend:

//decode JS way of JSON with additional slashes etc.
$tmp = json_decode(stripslashes(html_entity_decode($_POST['mydata'])), true);
//re-encode the PHP way
$json = json_encode($tmp);
//save to DB
[...]

User Frontend (Form):

//Retrieval from DB
$mydata = json_decode($db_row['mydata'], true);
//loop through, replace " with &quot; for input fields, decode utf8
foreach ($mydata as $k => &$v) {
    foreach ($v as $k2 => $v2) {
        $v[$k2] = utf8_decode(preg_replace('~"~', '&quot;', $v2));
    }
}
unset($v); // break the reference left over from the by-reference foreach

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 