Character encoding issue while reading an Excel file in a Java Web App

In a Java web app, I'm using the JExcel API to read Excel files uploaded by clients.

I'm doing something like this:

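import java.io.ByteArrayInputStream;
import java.io.InputStream;
import jxl.Workbook;
import jxl.WorkbookSettings;

byte[] excelFile = ... // contents of the uploaded file
InputStream inputStream = new ByteArrayInputStream(excelFile);

WorkbookSettings ws = new WorkbookSettings();
ws.setEncoding("CP1252"); // workbook character encoding (Windows-1252 here)

Workbook w = Workbook.getWorkbook(inputStream, ws);
...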

Struts gives me the Excel file as a byte array (I use the FormFile#getFileData() method).

It works OK on Windows. However, it is quite different on Linux: cells are parsed correctly and their content is interpreted correctly (even when it contains non-ASCII characters such as 'à' or 'ê'), but sheet names are not. I get bad characters such as '?' or ' '.
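The thread does not establish where the '?' comes from, so this is only one generic thing to rule out: the JVM default charset often differs between Windows and Linux (on Linux it follows the locale, and before JDK 18 a plain C/POSIX locale typically yields US-ASCII). Any String-to-byte conversion that does not name a charset, such as logging or writing a response whose encoding was never set, uses that default, and characters it cannot encode are replaced with '?'. A minimal sketch; the class and method names are illustrative, not part of any library:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetCheck {

    // Reports the JVM default charset, which every String <-> byte[]
    // conversion that does not name a charset explicitly falls back to.
    static String defaultCharsetReport() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", Charset.defaultCharset()=" + Charset.defaultCharset().name();
    }

    // Encodes and decodes text with the given charset. String.getBytes(Charset)
    // replaces characters the charset cannot encode with its replacement byte,
    // which is '?' for US-ASCII.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        System.out.println(defaultCharsetReport());
        // 'à' and 'ê' have no US-ASCII representation: returns "Feuille ? ?"
        System.out.println(roundTrip("Feuille à ê", StandardCharsets.US_ASCII));
        // UTF-8 can represent them: returns the text unchanged
        System.out.println(roundTrip("Feuille à ê", StandardCharsets.UTF_8));
    }
}
```

If the report shows US-ASCII (possibly printed as ANSI_X3.4-1968) on the Linux server but windows-1252 or UTF-8 on Windows, that difference alone could produce '?' in output even when the parsed strings themselves are correct.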

I forced the workbook encoding to UTF-8:

ws.setEncoding("UTF-8");

but it has no effect.

I also converted the Excel file to UTF-8, and nothing changes. I really don't understand why it does not work, especially for sheet names, since the whole chain is in UTF-8 (I have a Servlet filter that forces the HTTP request encoding to UTF-8 too).
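A likely reason that neither setEncoding("UTF-8") nor saving the file "as UTF-8" changes anything: in the BIFF8 .xls format (Excel 97 and later), text such as a sheet name is not stored as UTF-8 at all. According to Microsoft's [MS-XLS] specification, the sheet name in a BoundSheet8 record is a ShortXLUnicodeString: a 1-byte character count, a flags byte whose lowest bit (fHighByte) says whether the characters are UTF-16LE or "compressed" to one byte each (the low byte of a UTF-16 code unit, which is the same as ISO-8859-1). So on a BIFF8 file there is no UTF-8 data for an encoding setting to apply to. The simplified sketch below decodes only that layout; it illustrates the format, is not JXL's code, and the class name is hypothetical:

```java
import java.nio.charset.StandardCharsets;

public class Biff8Strings {

    // Simplified decoder for the BIFF8 ShortXLUnicodeString layout:
    //   byte 0    : cch, number of characters (unsigned)
    //   byte 1    : flags; bit 0 (fHighByte) = 1 means UTF-16LE, 0 means "compressed"
    //   bytes 2.. : cch bytes (compressed) or 2 * cch bytes (UTF-16LE)
    // Illustration only: no bounds checks, no CONTINUE-record handling.
    static String decode(byte[] data, int offset) {
        int cch = data[offset] & 0xFF;
        boolean highByte = (data[offset + 1] & 0x01) != 0;
        int start = offset + 2;
        if (highByte) {
            // Two bytes per character, little-endian UTF-16
            return new String(data, start, cch * 2, StandardCharsets.UTF_16LE);
        }
        // One byte per character: the low byte of a UTF-16 code unit whose
        // high byte is zero, which is exactly ISO-8859-1
        return new String(data, start, cch, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "Feuil" + U+00E0 (a-grave), stored compressed: 6 chars, flags 0
        byte[] compressed = {6, 0, 'F', 'e', 'u', 'i', 'l', (byte) 0xE0};
        System.out.println(decode(compressed, 0));

        // "A" + U+2026 (horizontal ellipsis) needs UTF-16LE: 2 chars, flags 1
        byte[] wide = {2, 1, 0x41, 0x00, 0x26, 0x20};
        System.out.println(decode(wide, 0));
    }
}
```

A workbook-level encoding can only matter for byte-per-character strings in older formats (BIFF5/BIFF7, Excel 95), which depend on the file's code page.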

I had a similar problem, but with another Java Excel API. The problem is that Excel tries to be smart and replaces some characters for you. In my case, for example, Excel replaced three dots '...' with a single character representing three dots from its own character set, which is not standard UTF-8. My framework didn't recognize it, and I got a similar undefined character ( ') to the one you are getting now. To fix this I had to manually edit all the Excel spreadsheets, and then it worked OK. The big problem I had was finding out which characters were involved. I am not sure if this is an option for you, though.
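Since the hardest part described here is finding out which characters Excel substituted, a small diagnostic that lists every non-ASCII character with its code point and Unicode name can help. The '...' substitution is consistent with Excel's AutoCorrect turning three dots into the single character U+2026 HORIZONTAL ELLIPSIS, which is byte 0x85 in Windows-1252 but a C1 control character (U+0085) if the same byte is read as ISO-8859-1. A sketch using only the JDK; the class and method names are hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class NonAsciiFinder {

    // Lists every non-ASCII code point in the text with its index, U+ value
    // and Unicode name, e.g. to spot characters AutoCorrect substituted.
    static List<String> describeNonAscii(String text) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp > 0x7F) {
                String name = Character.getName(cp); // null only for unassigned code points
                found.add(String.format(Locale.ROOT, "index %d: U+%04X %s",
                        i, cp, name == null ? "<unassigned>" : name));
            }
            i += Character.charCount(cp);
        }
        return found;
    }

    public static void main(String[] args) {
        byte[] raw = {'T', 'o', 't', 'a', 'l', (byte) 0x85};
        // The same trailing byte decoded two different ways
        String asCp1252 = new String(raw, Charset.forName("windows-1252"));
        String asLatin1 = new String(raw, StandardCharsets.ISO_8859_1);
        System.out.println(describeNonAscii(asCp1252)); // [index 5: U+2026 HORIZONTAL ELLIPSIS]
        System.out.println(describeNonAscii(asLatin1)); // reports U+0085, a control character
    }
}
```

Running the finder over each sheet name and cell value narrows the manual cleanup down to the cells that actually contain substituted characters.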

It seems to be a bug in the JXL version I was using. Indeed, if I upgrade the JAR to the latest version, the problem does not occur.
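Because the fix was simply upgrading the JAR, it may be worth confirming which copy of JXL the web app actually loads: an old jxl.jar left alongside the new one (for example a second copy in WEB-INF/lib or in the container's shared lib directory) can end up being the one used. A generic JDK-only sketch, with a hypothetical helper name, that reports where a class was loaded from and the Implementation-Version from its JAR manifest, if one is declared:

```java
import java.security.CodeSource;
import java.security.ProtectionDomain;

public class ClassOrigin {

    // Describes where a class was loaded from and, if its JAR manifest declares
    // one, the package Implementation-Version. Either may be missing
    // (e.g. JDK classes have no code source location).
    static String describe(Class<?> cls) {
        ProtectionDomain domain = cls.getProtectionDomain();
        CodeSource source = domain == null ? null : domain.getCodeSource();
        String location = (source == null || source.getLocation() == null)
                ? "<bootstrap or unknown>"
                : source.getLocation().toString();

        Package pkg = cls.getPackage();
        String version = (pkg == null || pkg.getImplementationVersion() == null)
                ? "<not declared>"
                : pkg.getImplementationVersion();

        return cls.getName() + " loaded from " + location
                + ", Implementation-Version: " + version;
    }

    public static void main(String[] args) {
        // Inside the web app you would pass a JXL class instead,
        // e.g. describe(jxl.Workbook.class), and log the result.
        System.out.println(describe(String.class));
        System.out.println(describe(ClassOrigin.class));
    }
}
```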

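Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.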

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM