Java: how do I handle the BOM in a UTF-8 file?

Question

I am using Java to read a file. This is my code:

      public static String[] fajlbeolvasa(String s) throws IOException
      {
        ArrayList<String> list = new ArrayList<>();
        BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(s), "UTF8"));

        while(true)
        {
          String line = reader.readLine();
          if (line == null)
          {
            break;
          }
          list.add(line);
        }
        // The method is declared to return String[], so convert the list before returning.
        return list.toArray(new String[0]);
      }

However, when I read the file, the output comes out garbled. For example: "Farkasgyep\\305\\261". Perhaps the BOM is the problem. How can I solve this in Java? Any help would be appreciated.

Answer 1

You can check for a BOM in the following way. It treats the file as a byte[], so it should work with your file without problems:

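// Needs: java.io.ByteArrayInputStream, java.io.IOException, java.nio.ByteBuffer,
// and org.apache.commons.codec.binary.Hex (Apache Commons Codec).

// Hex form of the UTF-8 BOM bytes EF BB BF; declared at class level so both methods can see it.
private static final String BOM_HEX_ENCODE = "efbbbf";

private static boolean isBOMPresent(byte[] content){
    boolean result = false;

    byte[] bom = new byte[3];
    try (ByteArrayInputStream is = new ByteArrayInputStream(content)) {
        int bytesRead = is.read(bom);

        // Only compare when all three BOM bytes were actually read.
        if(bytesRead == bom.length) {
            String stringContent = new String(Hex.encodeHex(bom));
            if (BOM_HEX_ENCODE.equalsIgnoreCase(stringContent)) {
                result = true;
            }
        }
    } catch (IOException e) {
        // Reading from an in-memory array should not fail; treat a failure as "no BOM".
        result = false;
    }

    return result;
}

Then, if you need to remove it, you can use this:

public static byte[] removeBOM(byte[] fileWithBOM) {
    if (isBOMPresent(fileWithBOM)) {
        ByteBuffer bb = ByteBuffer.wrap(fileWithBOM);

        // Consume the three BOM bytes so the buffer position moves past them.
        byte[] bom = new byte[3];
        bb.get(bom, 0, bom.length);

        // Copy everything that follows the BOM.
        byte[] contentAfterFirst3Bytes = new byte[fileWithBOM.length - 3];
        bb.get(contentAfterFirst3Bytes, 0, contentAfterFirst3Bytes.length);

        return contentAfterFirst3Bytes;
    } else {
        return fileWithBOM;
    }
}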
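
If the goal is only to read the file line by line, as in the question, another option is to let the reader decode the file and then drop the leading U+FEFF character, which is what the three BOM bytes decode to in UTF-8. The following is a minimal sketch under that assumption. The names BomAwareReader, stripBom and readLines are made up for illustration, and it uses only the JDK:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the class and method names are illustrative only.
final class BomAwareReader {

    // Decoding the UTF-8 BOM bytes EF BB BF yields the single character U+FEFF.
    private static final char BOM_CHAR = '\uFEFF';

    // Drops one leading U+FEFF if present; otherwise returns the input unchanged.
    static String stripBom(String text) {
        if (!text.isEmpty() && text.charAt(0) == BOM_CHAR) {
            return text.substring(1);
        }
        return text;
    }

    // Reads all lines of a UTF-8 file and strips a BOM from the first line only.
    static String[] readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                lines.add(first ? stripBom(line) : line);
                first = false;
            }
        }
        return lines.toArray(new String[0]);
    }
}
```

Calling BomAwareReader.readLines(Paths.get(s)) would stand in for the loop in the question. Only the first line is checked, because a BOM can appear only at the very start of the file.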
