Cannot write UTF-16LE encoded CSV file with Text::CSV_XS Perl module
I want to write a CSV file encoded in UTF-16LE. However, the output in the file gets messed up. There are strange Chinese-looking characters: 挀攀氀氀⸀㬀挀攀氀氀⸀㈀㬀ഀ.
This looks like the off-by-one-byte problem described in "Creating UTF-16 newline characters in Python for Windows Notepad".
Other threads about Perl and Text::CSV_XS didn't help.
This is how I try it:
#!perl
use strict;
use warnings;
use utf8;
use Text::CSV_XS;
binmode STDOUT, ":utf8";
my $csv = Text::CSV_XS->new({
    binary     => 1,
    sep_char   => ";",
    quote_char => undef,
    eol        => $/,
});
open my $in, '<:encoding(UTF-16LE)', 'in.csv' or die "in.csv: $!";
open my $out, '>:encoding(UTF-16LE)', 'out.csv' or die "out.csv: $!";
while (my $row = $csv->getline($in)) {
    $_ =~ s/ä/æ/ for @$row; # something will be done to the data...
    $csv->print($out, $row);
}
close $in;
close $out;
in.csv contains some test data and is encoded in UTF-16LE:
header1;header2;
cell1.1;cell1.2;
äöü2.1;ab"c2.2;
The result looks like this:
header1;header2;挀攀氀氀⸀㬀挀攀氀氀⸀㈀㬀ഀ
æöü2.1;abc2.2;
Switching to UTF-8 as the output format is not an option (it works fine, by the way).
So, how do I write valid UTF-16LE encoded CSV files using Text::CSV_XS?
Perl adds :crlf by default on Windows. It's added first, before your :encoding is added.
That means LF⇔CRLF conversion will be performed before decoding on reads, and after encoding on writes. This is backwards.
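It ends up working with UTF-8 despite being done backwards because all of the following conditions are met: LF is encoded as the single byte 0A, CR is encoded as the single byte 0D, and neither byte ever appears as part of the encoding of any other character.
None of those conditions holds true for UTF-16le, where LF is encoded as the two bytes 0A 00 and the bytes 0A and 0D can also occur inside other characters.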
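To see why the order matters, here is a minimal sketch that imitates both layer orders on a short string using only the core Encode module. The helpers crlf_on_bytes and hex_dump are hypothetical names made up for this illustration; crlf_on_bytes only approximates what a :crlf layer does when it sits below the encoding layer and therefore sees encoded bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: imitate a :crlf layer that runs on already-encoded
# bytes, i.e. the "backwards" order described above.
sub crlf_on_bytes {
    my ($bytes) = @_;
    (my $out = $bytes) =~ s/\x0A/\x0D\x0A/g;
    return $out;
}

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_dump { join ' ', map { sprintf '%02X', ord } split //, $_[0] }

my $text = "a\nb";

# Wrong order: encode first, then convert LF to CRLF on the bytes.
my $wrong = crlf_on_bytes(encode('UTF-16LE', $text));

# Right order: convert LF to CRLF on characters, then encode.
(my $crlf_text = $text) =~ s/\n/\r\n/g;
my $right = encode('UTF-16LE', $crlf_text);

print "wrong: ", hex_dump($wrong), "\n";   # wrong: 61 00 0D 0A 00 62 00
print "right: ", hex_dump($right), "\n";   # right: 61 00 0D 00 0A 00 62 00
```

In the wrong order a single 0D byte is inserted, so every following character is shifted by one byte. A reader pairing the bytes as UTF-16LE then sees 00 63 instead of 63 00, which is U+6300 (挀) rather than "c", matching the garbled output in the question.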
Fix:
open(my $fh_in, '<:raw:encoding(UTF-16LE):crlf', $qfn_in)
    or die "Can't open \"$qfn_in\": $!\n";
open(my $fh_out, '>:raw:encoding(UTF-16LE):crlf', $qfn_out)
    or die "Can't create \"$qfn_out\": $!\n";
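As a quick check of the layer stack above, the following sketch writes one line through '>:raw:encoding(UTF-16LE):crlf', inspects the raw bytes, and reads the line back through the matching input layers. It uses a temporary file from the core File::Temp module as a stand-in for out.csv and plain print instead of Text::CSV_XS; both are assumptions made to keep the example self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Temporary file standing in for out.csv (assumption for this sketch).
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# :raw drops any default :crlf, :encoding(UTF-16LE) handles the encoding,
# and the trailing :crlf converts newlines on characters, not on bytes.
open(my $out, '>:raw:encoding(UTF-16LE):crlf', $path) or die "$path: $!";
print {$out} "cell1.1;cell1.2;\n";
close $out or die "$path: $!";

# Look at the bytes that actually landed on disk.
open(my $raw, '<:raw', $path) or die "$path: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

# Read the line back through the matching input layers.
open(my $in, '<:raw:encoding(UTF-16LE):crlf', $path) or die "$path: $!";
my $line = <$in>;
close $in;

printf "%d bytes, line ending: %s\n",
    length $bytes,
    join ' ', map { sprintf '%02X', ord } split //, substr($bytes, -4);
```

Handles opened this way can be passed to $csv->getline and $csv->print exactly as in the question's script.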