Problem with UTF-16LE characters when reading a CSV with PHP

Question

I have a PHP script that reads a CSV file encoded as UTF-16LE. On some lines, the array that PHP builds from the CSV row is cut short because of certain Greek characters. An example is below (the array should have 7 elements, but this one has only 2). How can I solve this problem?

Array ( [0] => 205198 [1] => Label 4.2 Βάση για Σ▒ )

My code is below:

$array = file_get_contents($this->listUrl);
$array = mb_convert_encoding($array, 'UTF8', 'UTF-16LE');   // Convert the file to UTF8
$array = preg_split("/\R/", $array);                        // Split it by line breaks
$array = array_map(function ($v) {
    return str_getcsv($v, ";");
}, $array);

[Edit] I used the code below:

$array = str_getcsv($array, "\n");
foreach ($array as &$Row) {
    $Row = str_getcsv($Row, ";");
}

Answer 1

My best bet is this:

You need mb_split, since you are working with multibyte strings in order to support Greek.

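Some theory:

UTF-8: the familiar ASCII characters take 1 byte each; every other character takes 2 to 4 bytes (Greek letters take 2).

UTF-16: covers all Unicode characters using 2 bytes each, or 4 bytes for characters outside the Basic Multilingual Plane.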
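To make the byte widths concrete, here is a small sketch that measures a few sample characters in both encodings. The sample characters and the $widths array are illustrative choices, not taken from the question, and the code assumes the mbstring extension is enabled.

```php
<?php
// Sample characters written as code point escapes (PHP 7+),
// so the encoding of this source file does not matter.
$samples = [
    'A (ASCII)'       => "A",
    'Sigma (Greek)'   => "\u{03A3}",
    'Emoji (non-BMP)' => "\u{1F600}",
];

$widths = [];
foreach ($samples as $label => $char) {
    $widths[$label] = [
        // strlen() counts bytes, not characters.
        'utf8'  => strlen($char),
        'utf16' => strlen(mb_convert_encoding($char, 'UTF-16LE', 'UTF-8')),
    ];
    printf(
        "%-16s UTF-8: %d byte(s), UTF-16LE: %d byte(s)\n",
        $label,
        $widths[$label]['utf8'],
        $widths[$label]['utf16']
    );
}
```

The point for this question is that after conversion to UTF-8 a Greek letter is two bytes, so any function that works byte by byte can cut a character in half.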

Some action:

From the PHP manual entry for mb_split: "Split multibyte string using regular expression".

There are also similar functions, such as mb_ereg_replace.

Example:

$array = file_get_contents($this->listUrl);
$array = mb_convert_encoding($array, 'UTF-8', 'UTF-16LE');   // Convert the file to UTF-8
$array = mb_split('\r\n|\r|\n', $array);                     // Split on line breaks (mb_split patterns take no /.../ delimiters)
$array = array_map(function ($v) {
    return str_getcsv($v, ";");
}, $array);

Have fun!
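As an alternative to mb_split, the original preg_split call can be kept if the pattern gets the u modifier. This is offered as a plausible explanation rather than a confirmed diagnosis: without u, PCRE reads the subject byte by byte, and in that mode \R can also match the single byte 0x85, which occurs inside the UTF-8 encoding of some Greek letters (for example υ, U+03C5, is 0xCF 0x85). The helper name parseUtf16LeCsv, the BOM handling, and the sample data are assumptions made for illustration and are not part of the original code.

```php
<?php
// Hypothetical helper: converts a UTF-16LE CSV string into an array of rows.
function parseUtf16LeCsv(string $raw, string $delimiter = ';'): array
{
    $utf8 = mb_convert_encoding($raw, 'UTF-8', 'UTF-16LE');

    // A leading byte order mark survives the conversion as U+FEFF; drop it.
    if (strncmp($utf8, "\u{FEFF}", 3) === 0) {
        $utf8 = substr($utf8, 3);
    }

    // The u modifier makes PCRE treat the subject as UTF-8,
    // so \R only matches real line breaks, never part of a character.
    $lines = preg_split('/\R/u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($lines === false) {
        throw new RuntimeException('Input is not valid UTF-8 after conversion.');
    }

    return array_map(function ($line) use ($delimiter) {
        // Enclosure and escape are passed explicitly (these are the defaults).
        return str_getcsv($line, $delimiter, '"', "\\");
    }, $lines);
}

// Illustrative input: two rows, the first ending in Greek text, with a BOM.
$greek  = "\u{0392}\u{03AC}\u{03C3}\u{03B7} \u{03A3}\u{03C5}";
$sample = "205198;Label 4.2;" . $greek . "\r\n205199;Label 4.3;x\r\n";
$raw    = "\xFF\xFE" . mb_convert_encoding($sample, 'UTF-16LE', 'UTF-8');

$rows = parseUtf16LeCsv($raw);
print_r($rows);
```

In the script from the question, the equivalent call would be parseUtf16LeCsv(file_get_contents($this->listUrl)).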
