Java CP1252 to UTF8

I have a spreadsheet (.xls) with car plate numbers that come out as windows-1252 text, but the numbers were originally entered in Cyrillic and encoded as UTF-8. To show what I mean: У992НВ in Cyrillic looks the same as Y992HB in Latin (the first letters differ). So I take those numbers and convert them:

    if (cell.getCellType() == CellType.STRING) {
        String cellValue = cell.getStringCellValue();
        try {
            // Re-encode the text as windows-1252 bytes, then decode those bytes as UTF-8
            byte[] b = cellValue.getBytes("windows-1252");
            String cellValue2 = new String(b, StandardCharsets.UTF_8);
            cell.setCellValue(cellValue2);
        } catch (UnsupportedEncodingException ex) {
            // ignored
        }
    }

But the output is wrong. The input data read as windows-1252 is "Т313ÐК777" and the output is "Т313 К777", because the middle character is unreadable. What am I doing wrong?

  1. Java's byte is not a byte, so byte-by-byte decoding didn't work.
  2. I parsed the symbols' numeric code values and tried to decode them by matching the values against UTF-8. Some values were equivalent only to UTF-8 Latin-1. I found a Python package that decodes broken UTF-8, and it works. BUT: it doesn't work with Jython 2.7, because the maintainer dropped support for Python 2.7.
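One possible explanation, offered as a guess rather than a confirmed diagnosis: Н (U+041D) is the two bytes D0 9D in UTF-8, and 0x9D has no character assigned in windows-1252. If the cell holds that byte as the control character U+009D, then getBytes("windows-1252") cannot encode it and typically substitutes '?', which would break exactly the middle letter. Below is a minimal sketch of a more forgiving round trip in plain Java. The class name MojibakeFix and the method fix are made up for illustration, and the pass-through of the five unassigned bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) is an assumption about how the text was mangled. If the file already contains U+FFFD or '?' at that position, the original byte is gone and this cannot recover it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Apache POI or the JDK.
final class MojibakeFix {
    // Reverse table: character -> the single windows-1252 byte it came from.
    private static final Map<Character, Byte> CHAR_TO_BYTE = new HashMap<>();

    static {
        Charset cp1252 = Charset.forName("windows-1252");
        for (int b = 0; b < 256; b++) {
            char c = new String(new byte[] {(byte) b}, cp1252).charAt(0);
            if (c != '\uFFFD') {            // skip bytes the decoder cannot map
                CHAR_TO_BYTE.put(c, (byte) b);
            }
        }
        // Assumption: bytes with no windows-1252 character were passed through
        // as the C1 control characters with the same numeric value.
        for (int b : new int[] {0x81, 0x8D, 0x8F, 0x90, 0x9D}) {
            CHAR_TO_BYTE.put((char) b, (byte) b);
        }
    }

    // Returns the repaired text, or the input unchanged if it does not look
    // like UTF-8 that was misread as windows-1252.
    static String fix(String s) {
        byte[] bytes = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            Byte b = CHAR_TO_BYTE.get(s.charAt(i));
            if (b == null) {
                return s;                   // e.g. real Cyrillic: leave as is
            }
            bytes[i] = b;
        }
        try {
            // Strict decoding: malformed UTF-8 raises instead of yielding U+FFFD.
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return s;                       // not valid UTF-8 after all
        }
    }
}
```

In the loop from the question this would be used as cell.setCellValue(MojibakeFix.fix(cell.getStringCellValue())). Returning the input unchanged on any failure is a deliberate choice here, so that plates which are already correct (plain Latin or real Cyrillic) are not damaged.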

Thanks for your help.
