PHP doesn't parse CSV correctly (file is in UTF-16LE)

Question

I am trying to parse a CSV file using PHP.
The file uses commas as delimiters and double quotes around fields that contain commas, for example:

foo,"bar, baz",foo2

The issue I am facing is that fields containing commas get split at the comma. I get:

"2
rue du ..."

Instead of: 2, rue du ...

Encoding:
The file doesn't seem to be in UTF-8. It has weird characters at the beginning (apparently not a BOM; they look like this when converted from ASCII to UTF-8: ÿþ) and it doesn't display accents.

My code editor (Atom) says the encoding is UTF-16 LE.
Using mb_detect_encoding() on the CSV lines returns ASCII.

But the conversion fails:

mb_convert_encoding() converts from ASCII, but returns Asian characters when converting from UTF-16LE.
iconv() returns Notice: iconv(): Wrong charset, conversion from UTF-16LE / ASCII to UTF8 is not allowed.

Parsing:
I tried to parse with this one-liner (see those 2 comments) using str_getcsv():

$csv = array_map('str_getcsv', file($file['tmp_name']));

I then tried with fgetcsv():

$f = fopen($file['tmp_name'], 'r');
while (($l = fgetcsv($f)) !== false) {
    $arr[] = $l;
}
fclose($f);

Either way, my address field ends up in 2 parts. But when I try this code sample, the fields are parsed correctly:

$str = 'foo,"bar, baz",foo2,azerty,"ban, bal",doe';
$data = str_getcsv($str);
echo '<pre>' . print_r($data, true) . '</pre>';

To sum up, my questions are:

What are the characters at the beginning of the file?
How can I be sure about the encoding? (Atom reads the file as UTF-16 LE and doesn't display weird characters at the beginning.)
What makes the CSV parsing functions fail?
If I should rely on something else to parse the lines of the CSV, what could I use?

Answer 1

I finally solved it myself:

I submitted the file to online encoding detection websites, which returned UTF16LE. After reading up on UTF16LE, I found that such files start with a BOM (Byte Order Mark), which accounts for the ÿþ characters at the beginning of the file.
My previous attempts used file(), which returns an array of the file's lines, and fopen(), which returns a resource, but in both cases the data was still parsed line by line.
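
The line-by-line failure can be reproduced with a small sketch. In UTF-16LE every ASCII character is followed by a NUL byte, so the byte right after a comma is NUL rather than the opening double quote, and the byte-oriented CSV functions most likely stop treating the field as quoted. The line break is also two bytes (0x0A 0x00), so file() cuts each line one byte early and the next line starts misaligned, which would explain the Asian characters after conversion. The sample string and variable names below are illustrative only.

```php
<?php
// Illustrative sample line; not taken from the real file.
$line  = 'foo,"bar, baz",foo2';
$utf16 = mb_convert_encoding($line, 'UTF-16LE', 'UTF-8');

// Bytes for the three characters  ,"b  once encoded as UTF-16LE:
// each one is followed by a 00 byte.
echo bin2hex(substr($utf16, 6, 6)), "\n"; // 2c0022006200

// Parsing the raw UTF-16LE bytes: the quote is no longer the first byte
// of its field, so the inner comma is treated as a delimiter.
$broken = str_getcsv($utf16, ',', '"', '\\');

// Parsing the same text as UTF-8 keeps "bar, baz" in one field.
$fixed = str_getcsv($line, ',', '"', '\\');

echo count($broken), ' fields vs ', count($fixed), " fields\n";
```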

The fix that came to mind was to convert the whole file at once instead of converting each line separately. Here is a working solution:

$f = file_get_contents($file['tmp_name']);          // Get the whole file as string
$f = mb_convert_encoding($f, 'UTF8', 'UTF-16LE');   // Convert the file to UTF8
$f = preg_split("/\R/", $f);                        // Split it by line breaks
$f = array_map('str_getcsv', $f);                   // Parse lines as CSV data

My address fields are no longer split at internal commas.
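
A slightly more defensive variant of the same idea is sketched below, under a few assumptions that are not part of the original solution: the function name parseUtf16Csv() is hypothetical, the encoding is chosen from the BOM with UTF-16LE as the fallback, and the converted text is fed to fgetcsv() through an in-memory stream instead of being split with preg_split(). Reading through fgetcsv() keeps quoted fields that contain line breaks intact, and removing the BOM keeps it out of the first field.

```php
<?php
// Hypothetical helper: converts a UTF-16 CSV string to UTF-8 and parses it.
function parseUtf16Csv(string $raw): array
{
    // Choose the source encoding from the BOM; UTF-16LE is an assumed default.
    $encoding = 'UTF-16LE';
    if (strncmp($raw, "\xFF\xFE", 2) === 0) {
        $raw = substr($raw, 2);              // drop the UTF-16LE BOM
    } elseif (strncmp($raw, "\xFE\xFF", 2) === 0) {
        $encoding = 'UTF-16BE';
        $raw = substr($raw, 2);              // drop the UTF-16BE BOM
    }

    // Convert the whole content at once, as in the solution above.
    $utf8 = mb_convert_encoding($raw, 'UTF-8', $encoding);

    // Expose the converted text as a stream so fgetcsv() can read records,
    // including quoted fields that span several lines.
    $stream = fopen('php://memory', 'r+');
    fwrite($stream, $utf8);
    rewind($stream);

    $rows = [];
    while (($row = fgetcsv($stream, 0, ',', '"', '\\')) !== false) {
        if ($row !== [null]) {               // fgetcsv() returns [null] for blank lines
            $rows[] = $row;
        }
    }
    fclose($stream);

    return $rows;
}
```

With the upload from the question, it could be called as $rows = parseUtf16Csv(file_get_contents($file['tmp_name']));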
