Fetching Asian (Japanese / Chinese) characters from Excel file into TSV format using Perl's Spreadsheet::ParseExcel

Friends, I am preparing a TSV file from an Excel file that contains Chinese (special) characters, for example: The Seonjeongneung ... Jeonghyeon (貞顯王后, 1462–1530) .....

I have tried Spreadsheet::ParseExcel and Spreadsheet::ParseExcel::FmtJapan from CPAN, but without success. The characters appear as ?? when the TSV file is opened in Vim.

I also tried "binmode STDOUT, ':utf8';" and "binmode STDOUT, ':encoding(cp932)';".

Please help me find a way to extract the information from the Excel sheets and get it into TSV format.

PS: Excel allows saving directly as TSV, but the output was garbled there as well.

I just exported your sample text perfectly from OpenOffice Calc, simply by choosing the "Save as .csv" option and selecting UTF-8 as the format. I'd be very surprised if Excel can't do the same. Have you considered the possibility that Vim or your console doesn't support Chinese characters correctly, or that it is set to use a font that doesn't include Chinese characters? To check for this kind of error, open your .csv or .tsv file in your web browser. Web browsers go to great lengths to display a file correctly, including switching fonts as necessary.
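If you would rather check the bytes themselves than rely on any viewer, here is a minimal Perl sketch using only the core Encode module. The helper name describe_bytes and the script name check_tsv.pl are made up for illustration. It only tells you whether the file decodes as strict UTF-8 and whether any Han characters survive in it. It does not prove where the characters were lost.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: describe what a chunk of raw file bytes contains.
sub describe_bytes {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK value may modify its input
    my $text = eval { decode('UTF-8', $copy, FB_CROAK) };
    return 'not valid UTF-8' unless defined $text;
    return 'valid UTF-8 containing Han characters' if $text =~ /\p{Han}/;
    return 'valid UTF-8 with literal question marks but no Han characters'
        if $text =~ /\?/;
    return 'valid UTF-8 without Han characters';
}

# Usage (script name is an assumption): perl check_tsv.pl output.tsv
if (@ARGV) {
    my $path = $ARGV[0];
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $bytes = do { local $/; <$fh> } // '';
    close $fh;
    print "$path: ", describe_bytes($bytes), "\n";
}
```

If this reports Han characters, the file is fine and the problem is in Vim or the terminal. If it reports literal question marks, the characters were already replaced when the file was written, and changing the viewer will not help.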

If you want, send me the file you need to export and I'll check whether there's anything unusual about it. It could be in one of the native Chinese encodings (GB or Big5) instead of Unicode.
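To test the native-encoding theory yourself, a rough sketch along these lines may help. It tries a few candidate encodings with the core Encode module and reports which ones decode the bytes cleanly into text containing Han characters. The helper name candidate_encodings and the choice of candidates (UTF-8, cp936 for GB/GBK, big5-eten for Big5) are my assumptions, so adjust the list to whatever you suspect.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Candidate encodings to try (an assumed list; extend as needed).
my @CANDIDATES = ('UTF-8', 'cp936', 'big5-eten');

# Hypothetical helper: return the names of all candidate encodings under
# which the bytes decode without error AND yield at least one Han character.
sub candidate_encodings {
    my ($bytes) = @_;
    my @ok;
    for my $name (@CANDIDATES) {
        my $copy = $bytes;    # decode() with a CHECK value may modify its input
        my $text = eval { decode($name, $copy, FB_CROAK) };
        push @ok, $name if defined $text && $text =~ /\p{Han}/;
    }
    return @ok;
}

# Usage (script name is an assumption): perl guess_encoding.pl output.tsv
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $bytes = do { local $/; <$fh> } // '';
    close $fh;
    my @ok = candidate_encodings($bytes);
    print @ok ? "Plausible encodings: @ok\n" : "No candidate produced Han text\n";
}
```

Treat the result as a hint only. The GB and Big5 byte ranges overlap, so more than one candidate can succeed on the same bytes, and you still need to look at the decoded text to see which reading makes sense.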
