Problem with UTF-16LE characters when reading a CSV with PHP

I have a PHP script that reads a CSV file encoded as UTF-16LE. The problem is that on some lines the array PHP builds from the CSV row is cut short because of some Greek characters. An example is below (each row should have 7 elements, but this one has only 2). How can I solve this problem?

Array ( [0] => 205198 [1] => Label 4.2 Βάση για Σ▒ )

My code is below:

$array = file_get_contents($this->listUrl);
$array = mb_convert_encoding($array, 'UTF8', 'UTF-16LE');   // Convert the file to UTF8
$array = preg_split("/\R/", $array);                        // Split it by line breaks
$array = array_map(function ($v) {
    return str_getcsv($v, ";");
}, $array);

[Edit] I used the code below:

$array = str_getcsv($array, "\n");
foreach ($array as &$Row) {
    $Row = str_getcsv($Row, ";");
}

My best guess is:

You need mb_split, since you are working with multibyte strings in order to support Greek.

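Some theory:

UTF-8 uses 1 byte for ASCII characters and 2 to 4 bytes for everything else (Greek letters take 2 bytes).

UTF-16 uses 2 bytes for most characters, including Greek, and 4 bytes (a surrogate pair) for characters outside the Basic Multilingual Plane.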
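The byte widths can be checked directly. The snippet below is only an illustration: the three sample characters are my own choice, not taken from the question, and it assumes the mbstring extension is loaded.

```php
<?php
// Byte lengths of the same characters in UTF-8 and UTF-16LE.
// "\u{...}" escapes (PHP 7+) always produce UTF-8, whatever the file's own encoding.
$samples = [
    'A'     => 'A',          // ASCII letter
    'Sigma' => "\u{03A3}",   // Greek capital sigma
    'Emoji' => "\u{1F600}",  // a character outside the Basic Multilingual Plane
];

$widths = [];
foreach ($samples as $name => $char) {
    $widths[$name] = [
        'utf8'  => strlen($char),  // strlen() counts bytes, not characters
        'utf16' => strlen(mb_convert_encoding($char, 'UTF-16LE', 'UTF-8')),
    ];
}

print_r($widths);
```

Because strlen() counts bytes, it makes the difference between the two encodings visible: the ASCII letter grows when converted to UTF-16LE, while the Greek letter takes the same space in both.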

Some action:

From the PHP manual: "mb_split — Split multibyte string using regular expression".

There are also similar functions, such as mb_ereg_replace.

Example:

$array = file_get_contents($this->listUrl);
$array = mb_convert_encoding($array, 'UTF-8', 'UTF-16LE');   // Convert the file to UTF-8
mb_regex_encoding('UTF-8');                                  // mb_split matches using this encoding
$array = mb_split('\r\n|\r|\n', $array);                     // Split on line breaks; mb_split patterns take no / delimiters
$array = array_map(function ($v) {
    return str_getcsv($v, ";");
}, $array);
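If you would rather keep preg_split, another thing worth trying is the "u" modifier. Without it PCRE works on single bytes, and as far as I know \R then also treats the byte 0x85 as a line break; that byte is the second half of some Greek letters in UTF-8 (υ is CF 85), which would explain a row being cut in the middle of a word. Here is a minimal sketch under that assumption. The function name parseUtf16leCsv and the sample row are made up for illustration and do not come from the question.

```php
<?php
// Hypothetical helper (not from the original post): converts a UTF-16LE CSV
// payload to UTF-8 and parses it into rows of fields.
function parseUtf16leCsv(string $raw, string $delimiter = ';'): array
{
    $utf8 = mb_convert_encoding($raw, 'UTF-16LE' === '' ? '' : 'UTF-8', 'UTF-16LE');

    // Drop a leading byte order mark, if the file has one.
    $utf8 = preg_replace('/^\x{FEFF}/u', '', $utf8);

    // The "u" modifier makes PCRE read the subject as UTF-8, so \R is matched
    // against whole characters instead of single bytes.
    $lines = preg_split('/\R/u', $utf8, -1, PREG_SPLIT_NO_EMPTY);

    return array_map(
        function ($line) use ($delimiter) {
            // Enclosure and escape are passed explicitly (same values as the defaults).
            return str_getcsv($line, $delimiter, '"', '\\');
        },
        $lines
    );
}

// Invented sample data: two rows, the first containing Greek text.
$sample = mb_convert_encoding("205198;Label 4.2 Βάση για Συ;x\r\n1;b;c", 'UTF-16LE', 'UTF-8');
$rows = parseUtf16leCsv($sample);
print_r($rows);
```

In the question's code this would mean replacing the preg_split line with preg_split('/\R/u', $array) and leaving the rest as it is. Note that PREG_SPLIT_NO_EMPTY also drops blank lines, which may or may not be what you want.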

Have fun!
