Convert InputStream from ISO-8859-1 to UTF-8

I have a file in ISO-8859-1 containing German umlauts and I need to unmarshal it using JAXB. Before that, however, I need the content in UTF-8.

@Override
public List<Usage> convert(InputStream input) {
    try {
        InputStream inputWithNamespace = addNamespaceIfMissing(input);
        inputWithNamespace = convertFileToUtf(inputWithNamespace);
        ORDR order = xmlUnmarshaller.unmarshall(inputWithNamespace, ORDR.class);
        ...

I get the "file" as an InputStream. My idea was to read the file's content as UTF-8 and create another InputStream from it. This is what I've tried:

private InputStream convertFileToUtf(InputStream inputStream) throws IOException {
    byte[] bytesInIso = ByteStreams.toByteArray(inputStream);
    String stringIso = new String(bytesInIso);
    byte[] bytesInUtf = new String(bytesInIso, ISO_8859_1).getBytes(UTF_8);
    String stringUtf = new String(bytesInUtf);
    return new ByteArrayInputStream(bytesInUtf);
}

I have those two Strings to check the contents, but even just reading the ISO file gives question marks (?) where the umlauts are, and converting that to UTF_8 gives strange characters like 1/2 and so on.

UPDATE

byte[] bytesInIso = ByteStreams.toByteArray(inputWithNamespace);
String contentInIso = new String(bytesInIso);

byte[] bytesInUtf = new String(bytesInIso, ISO_8859_1).getBytes(UTF_8);
String contentInUtf = new String(bytesInUtf);  

Printing contentInIso shows question marks instead of the umlauts, and contentInUtf has characters like "�" in their place.

@Override
public List<Usage> convert(InputStream input) {
    try {
        InputStream inputWithNamespace = addNamespaceIfMissing(input);

        byte[] bytesInIso = ByteStreams.toByteArray(inputWithNamespace);
        String contentInIso = new String(bytesInIso);

        byte[] bytesInUtf = new String(bytesInIso, ISO_8859_1).getBytes(UTF_8);
        String contentInUtf = new String(bytesInUtf);

        ORDR order = xmlUnmarshaller.unmarshall(inputWithNamespace, ORDR.class);

This convert method is called by another method, processUsageFile:

private void processUsageFile(File usageFile) {
    try (FileInputStream fileInputStream = new FileInputStream(usageFile)) {
        usageImporterService.importUsages(usageFile.getName(), fileInputStream, getUsageTypeValidated(usageFile.getName()));
        log.info("Usage file {} imported successfully. Moving to archive directory", usageFile.getName());

If I take the code I have written under UPDATE and put it immediately after the try, the first String, contentInIso, has question marks but contentInUtf has the umlauts. Then, on entering convert, JAXB throws an exception saying that the file has a premature end of line.

Regarding the behaviour you are getting:

String stringIso = new String(bytesInIso);

In this step, you construct a new String by decoding the specified array of bytes using the platform's default charset.
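The difference is easy to reproduce in isolation. The following is only a minimal sketch: the class name, the method names and the sample word "für" are made up for illustration, and UTF-8 stands in for whatever your platform default actually is.

```java
import java.nio.charset.StandardCharsets;

class CharsetDecodingDemo {

    // Decodes the bytes as ISO-8859-1, regardless of the platform default.
    static String decodeAsIso(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Decodes the same bytes as UTF-8. This mirrors what new String(bytes)
    // does on a platform whose default charset is UTF-8 (an assumption here).
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "für" encoded in ISO-8859-1: the umlaut is the single byte 0xFC.
        byte[] isoBytes = {'f', (byte) 0xFC, 'r'};

        // Correct charset: the umlaut survives.
        System.out.println(decodeAsIso(isoBytes).equals("f\u00FCr"));

        // Wrong charset: 0xFC is not valid UTF-8, so the decoder substitutes
        // the replacement character U+FFFD for it.
        System.out.println(decodeAsUtf8(isoBytes).indexOf('\uFFFD') >= 0);
    }
}
```

Which of the two results a plain `new String(bytes)` gives depends on the platform's default charset. On Java 18 and later that default is UTF-8 unless configured otherwise; on older versions it depends on the operating system and locale settings.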

Since the platform default is probably not ISO_8859_1, I think the String you are looking at becomes garbled here.
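Passing the charset explicitly wherever bytes are decoded avoids this. As a rough sketch that uses only the JDK (`InputStream.readAllBytes` needs Java 9 or later; Guava's `ByteStreams.toByteArray` from your code would do the same job), with `IsoToUtf8` and `convertIsoToUtf8` being hypothetical names:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class IsoToUtf8 {

    // Reads the whole stream, decodes it as ISO-8859-1 and re-encodes it as UTF-8.
    // The source stream is fully consumed; callers must use the returned stream.
    static InputStream convertIsoToUtf8(InputStream isoStream) throws IOException {
        byte[] isoBytes = isoStream.readAllBytes(); // Java 9+
        String text = new String(isoBytes, StandardCharsets.ISO_8859_1);
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // "Grüße" as it would be stored in an ISO-8859-1 file.
        byte[] isoBytes = "Gr\u00FC\u00DFe".getBytes(StandardCharsets.ISO_8859_1);

        InputStream utf8 = convertIsoToUtf8(new ByteArrayInputStream(isoBytes));

        // To inspect the result, decode it with the charset it was encoded in.
        String roundTrip = new String(utf8.readAllBytes(), StandardCharsets.UTF_8);
        System.out.println(roundTrip.equals("Gr\u00FC\u00DFe")); // prints "true"
    }
}
```

Note that reading a stream to the end consumes it. In the code under UPDATE, `inputWithNamespace` has already been read by `ByteStreams.toByteArray` before it is handed to `unmarshall`, so the parser finds nothing left to read, which would probably explain the premature-end error. Pass the newly created `ByteArrayInputStream` to the unmarshaller instead.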
