Java CP1252 to UTF8

Question

I have a spreadsheet (.xls) with car plate numbers in windows-1252 encoding, BUT originally those numbers were entered in Cyrillic, in UTF-8. What I mean: e.g. У992НВ in Cyrillic looks the same as Y992HB in Latin (the first letters differ). So I take those numbers and convert them:

    if (cell.getCellType() == CellType.STRING) {
        String cellValue = cell.getStringCellValue();
        try {
            // Re-encode the garbled text as windows-1252 bytes, then decode those bytes as UTF-8
            byte[] b = cellValue.getBytes("windows-1252");
            String cellValue2 = new String(b, StandardCharsets.UTF_8);
            cell.setCellValue(cellValue2);
        } catch (UnsupportedEncodingException ex) {
            // ignored
        }
    }

But the output is wrong. The input data in windows-1252 is " Ð¢313ÐÐš777 " and the output is Т313 К777, because the middle character is unreadable. What am I doing wrong?

Answer 1

Java's byte is a signed 8-bit type, not an unsigned byte, and my byte-by-byte decoding didn't work.
I parsed the symbols' hex values and tried to decode them by matching the values against UTF-8. Some values matched only under Latin-1. I found a Python package that decodes broken UTF-8, and it works. BUT: it doesn't work with Jython 2.7, because the maintainer dropped support for Python 2.7.
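
For anyone who wants to stay in plain Java, here is a rough sketch of the same idea, not a tested fix for the original spreadsheet. It assumes the unreadable character survived in the cell as a C1 control character such as U+009D (windows-1252 leaves bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, and they often end up as the control character with the same value). If the character was already replaced or dropped when the file was saved, it cannot be recovered this way. The class name `MojibakeFix` and the method `fix` are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeFix {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Rebuilds the original bytes of UTF-8 text that was mis-decoded as windows-1252. */
    static String fix(String garbled) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(garbled.length());
        for (int i = 0; i < garbled.length(); i++) {
            char c = garbled.charAt(i);
            if (c <= 0xFF) {
                // ASCII, Latin-1 and C1 controls (U+0080..U+009F) map one-to-one onto bytes.
                // Assumption: undefined windows-1252 bytes arrived as these control characters.
                bytes.write(c);
            } else {
                // Characters such as U+0161 sit in the 0x80..0x9F range of windows-1252.
                byte[] b = String.valueOf(c).getBytes(CP1252);
                bytes.write(b, 0, b.length);
            }
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical input: the garbled plate with U+009D as the unreadable character.
        String garbled = "\u00D0\u00A2313\u00D0\u009D\u00D0\u0161777";
        System.out.println(fix(garbled));
    }
}
```

With that hypothetical input the recovered bytes are D0 A2, D0 9D and D0 9A around the digits, which decode to Т313НК777. Plain ASCII input passes through unchanged.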

Thanks for your help.

0 2018-10-19 09:54:58