
Character encoding in Excel spreadsheet (and what Java charset to use to decode it)

I am using the JExcel library to read Excel spreadsheets. Each cell in the spreadsheet may contain localization strings in any of roughly 44 languages (English, Portuguese, French, Chinese, etc.). Today I don't tell the API anything about the encoding it's supposed to use. It handles the Chinese fine, but it always garbles Portuguese and German. Somehow the default encoding (MacRoman on my dev box, UTF-8 in production) fails to properly interpret the strings it pulls out of the Excel workbook. There has to be something wrong with how JExcel is interpreting the character encoding of the file.

That being said...

Are all the strings in an Excel workbook encoded with the same character set?

Is there workbook metadata I can query to find out what this character set is (I haven't found it yet)?

If I run all the cells through something like jchardet (http://jchardet.sourceforge.net/), is it likely to be able to divine the character encoding for the whole workbook? (This is pretty much predicated on the answer to the first question being "yes, all strings in a given workbook are encoded with the same character set.")
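As a rough illustration of what charset guessing can and cannot tell you, here is a minimal sketch that uses only the JDK (not jchardet). The class and method names (`StrictDecodeCheck`, `decodesCleanly`) are made up for this example. It asks a strict decoder whether a byte sequence is valid in a given charset. Note that this can only rule candidates out: single-byte charsets such as ISO-8859-1 accept every byte sequence, so a "true" result never proves the guess is right.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecodeCheck {

    /** Returns true if the bytes decode in the given charset with no malformed or unmappable input. */
    static boolean decodesCleanly(byte[] raw, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(raw));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The bytes for "ção" when each character is stored as a single byte.
        byte[] raw = {(byte) 0xE7, (byte) 0xE3, 0x6F};

        System.out.println("valid UTF-8?        " + decodesCleanly(raw, StandardCharsets.UTF_8));
        System.out.println("valid windows-1252? " + decodesCleanly(raw, Charset.forName("windows-1252")));
    }
}
```

A check like this is only useful if every string in the workbook really does share one encoding, which is exactly what the first question asks.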

So many questions, so little time.

Well, I didn't get an answer directly, but Matt's discovery of a spec points the way towards an actual answer: http://sc.openoffice.org/excelfileformat.pdf

In the meantime, my problem went away by just setting the encoding to always be "Cp1252". I'm not sure exactly why, but I'm not looking a gift horse in the mouth, so to speak, and am moving on.

    // Tell JExcel to use Cp1252 instead of the platform default encoding.
    WorkbookSettings workbookSettings = new WorkbookSettings();
    workbookSettings.setEncoding("Cp1252");
    Workbook workbook = Workbook.getWorkbook(theFile, workbookSettings);

I'll call this one answered.
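One plausible explanation, offered as an assumption rather than something confirmed in this thread: if the workbook stores accented Latin text as one byte per character while storing Chinese as 16-bit characters, then only the single-byte strings depend on which charset is used to decode them. That would be consistent with Chinese working and Portuguese and German breaking. The JDK-only sketch below (the class name `DefaultCharsetDemo` and its `decode` helper are invented for illustration, and it does not use JExcel) shows how the same bytes come out differently depending on the charset.

```java
import java.nio.charset.Charset;

public class DefaultCharsetDemo {

    /** Decodes the bytes with the named charset; malformed input becomes U+FFFD. */
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The bytes for "ção" when each character is stored as a single byte.
        byte[] raw = {(byte) 0xE7, (byte) 0xE3, 0x6F};

        // Decoded with Cp1252, the bytes map back to the intended characters.
        System.out.println("Cp1252:  " + decode(raw, "Cp1252"));

        // Decoded as UTF-8, the first two bytes are malformed and get replaced.
        System.out.println("UTF-8:   " + decode(raw, "UTF-8"));

        // With no charset given, the result depends on the machine running the code.
        System.out.println("default (" + Charset.defaultCharset() + "): " + new String(raw));
    }
}
```

Because the last line depends on the platform default, a dev box and a production server can disagree on the same file, which matches the MacRoman versus UTF-8 situation described in the question.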

I have a problem where, when reading cell values from the Excel file, some values appear with "?" in place of accented letters. Would that code resolve this issue? I'm developing on Windows, so I can't test as quickly as I could on Linux (which is the OS of the server I'm deploying to).
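For what it's worth, a literal "?" in Java often comes from the encoding side rather than the decoding side: when a string is converted to bytes in a charset that cannot represent a character, `String.getBytes` substitutes that charset's replacement byte, which is `?` for US-ASCII. Whether that is what is happening here is an assumption; the sketch below only demonstrates the mechanism, and `QuestionMarkDemo` and `roundTrip` are illustrative names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {

    /** Encodes the text to bytes in the target charset and decodes it back again. */
    static String roundTrip(String text, Charset target) {
        return new String(text.getBytes(target), target);
    }

    public static void main(String[] args) {
        String text = "S\u00e3o Paulo"; // "São Paulo"

        // US-ASCII cannot represent U+00E3, so the accented letter is lost.
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII)); // S?o Paulo

        // UTF-8 can represent every character, so the text survives unchanged.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```

If the values only look wrong when printed or logged, it may be worth checking the output encoding (console, log file, response stream) on the Linux server before changing how the workbook is read.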
