
Using Perl to read French characters from an Excel spreadsheet

I am using Spreadsheet::ParseExcel to parse an Excel spreadsheet file as follows:

my $FileName = "../excel.xls";
my $parser   = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse($FileName);

And I am reading values from the cells like this:

$product = $worksheeto->get_cell( $row, 0 )->value();

The problem is that when a cell contains a French character, for instance à, it shows ò instead.

To be sure that there is no error in the parsing, I used:

print unpack('H*', $product) . "\n";

When I paste that hex output into any online hex-to-string converter, I do get the à.
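As a rough aid for reading that hex dump, the sketch below prints the bytes that à (U+00E0) becomes in two common encodings, so the output of unpack('H*', ...) can be compared against them. The helper name hex_bytes is hypothetical, and the choice of ISO-8859-1 and UTF-8 is an assumption about which encodings are worth checking first; it only uses the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+00E0 is LATIN SMALL LETTER A WITH GRAVE (à).
my $char = "\x{E0}";

# Hypothetical helper: hex dump of $text after encoding it as $encoding.
sub hex_bytes {
    my ($encoding, $text) = @_;
    return unpack('H*', encode($encoding, $text));
}

my $latin1_hex = hex_bytes('ISO-8859-1', $char);   # one byte
my $utf8_hex   = hex_bytes('UTF-8',      $char);   # two bytes

print "ISO-8859-1: $latin1_hex\n";   # prints e0
print "UTF-8:      $utf8_hex\n";     # prints c3a0
```

If the dump of the cell value shows e0 where the à should be, the data is probably a single-byte encoding or an undecoded character string; c3a0 suggests UTF-8 bytes.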

I also tried:

use utf8;
binmode(STDOUT, ":utf8");

but instead of à I get

Is there a way to get the correct characters?

Try parsing the file with a formatter, for example Spreadsheet::ParseExcel::FmtUnicode:

use Spreadsheet::ParseExcel;
use Spreadsheet::ParseExcel::FmtUnicode;
#use Spreadsheet::ParseExcel::FmtJapan;

my $FileName = '../excel.xls';
my $parser   = Spreadsheet::ParseExcel->new();             
my $formatter = Spreadsheet::ParseExcel::FmtUnicode->new();
my $workbook = $parser->parse($FileName,$formatter);

Also try FmtJapan, since the documentation says: "The Spreadsheet::ParseExcel::FmtJapan formatter also supports Unicode. If you encounter any encoding problems with the default formatter try that instead."

UPDATE: I tried it myself on an xls file with Greek characters, but it did not work with either FmtUnicode or FmtJapan. I then found this perlmonks post, used the My::Excel::FmtUTF8 module provided there, and it worked when printing the value of a cell with $cell->value().

I have tried what you describe and it works correctly here, once I enable UTF-8 output. I would guess that you either have an unusual Excel file (you should post an example somewhere) or that your terminal is badly configured.
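To make "enable UTF-8 output" concrete, here is a minimal sketch of what the output layer does. It assumes the cell value is already a Perl character string; the literal below is only a stand-in for a real $cell->value(). The text is first written through a UTF-8 layer into an in-memory buffer so the resulting bytes can be inspected, and then the same layer is applied to STDOUT.

```perl
use strict;
use warnings;

# Stand-in for a decoded cell value: a character string containing ó (U+00F3).
my $text = "Descripci\x{F3}n";

# Write through a UTF-8 layer into an in-memory buffer, so the bytes that
# would reach the terminal can be examined.
my $buffer = '';
open(my $out, '>:encoding(UTF-8)', \$buffer) or die "Cannot open buffer: $!";
print {$out} $text;
close($out);

# Hex dump of the encoded bytes; the ó appears as the two bytes c3 b3.
print unpack('H*', $buffer), "\n";

# For real output, the same layer is applied to STDOUT.
binmode(STDOUT, ':encoding(UTF-8)');
print "Value = $text\n";
```

If the terminal still shows the wrong character after this, the terminal is most likely not interpreting its input as UTF-8.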

Dealing with character set issues is hard, because your terminal can be confusing you. So it is always a good idea to pipe the output into 'od -c' to see what you are actually getting. In my script I get this text from a spreadsheet I had lying around:

Value       = Descripción

And when I pipe it through od:

0000000   V   a   l   u   e                               =       D   e
0000020   s   c   r   i   p   c   i 303 263   n  \n

I can see that the ó is two bytes long (octal 303 263), which suggests it is UTF-8. To make sure, you can ask iconv to convert from the expected output charset to whatever you are using in your terminal:

iconv -f utf-8

If the input is not proper UTF-8, it will bark at you and/or output even weirder garbage.
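The same check can be sketched inside the Perl script itself, as a rough counterpart to the iconv test. The helper name is_valid_utf8 is hypothetical; it relies on the core Encode module's strict 'UTF-8' decoder, which dies on malformed input when given Encode::FB_CROAK.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: true when $bytes is well-formed UTF-8, false otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # With FB_CROAK, decode() dies on malformed input; eval traps that.
    my $ok = eval { decode('UTF-8', $bytes, Encode::FB_CROAK); 1 };
    return $ok ? 1 : 0;
}

# ó as the UTF-8 byte pair C3 B3, as in the od output above.
print is_valid_utf8("Descripci\xC3\xB3n") ? "valid\n" : "invalid\n";

# ó as the single Latin-1 byte F3, which is not valid UTF-8 here.
print is_valid_utf8("Descripci\xF3n") ? "valid\n" : "invalid\n";
```

Note that passing this check only shows the bytes are well-formed UTF-8; it does not prove that the text was originally meant to be UTF-8.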


 