How to detect if a file is not UTF-8 encoded?

In Java, how can I test whether a file's encoding is definitely not UTF-8?

I want to validate that the contents are well-formed UTF-8.

I also need to validate that the file does not start with a byte order mark (BOM).

If you just need to test the file, without actually retaining its contents:

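import java.io.Reader;
import java.io.Writer;
import java.nio.charset.CharacterCodingException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Snippet: run inside a method that declares `throws IOException`.
Path path = Paths.get("/home/dave/somefile.txt");
try (Reader reader = Files.newBufferedReader(path)) {
    int c = reader.read();  // first decoded char, or -1 if the file is empty
    if (c == 0xfeff) {
        System.out.println("File starts with a byte order mark.");
    } else if (c >= 0) {
        // Decode the rest of the file and discard the output.
        reader.transferTo(Writer.nullWriter());
    }
} catch (CharacterCodingException e) {
    System.out.println("Not a UTF-8 file.");
}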
  • Files.newBufferedReader always uses UTF-8 if no charset is provided, and it throws an exception on malformed input instead of silently substituting replacement characters.
  • 0xfeff is the byte order mark code point. Java's UTF-8 decoder does not strip it, so it shows up as the first character read.
  • reader.transferTo(Writer.nullWriter()) processes the file and immediately discards it. It requires Java 11, because Writer.nullWriter() was added in Java 11 (Reader.transferTo itself was added in Java 10).
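
The same two checks can also be made on the raw bytes, which keeps the BOM test and the well-formedness test independent of each other. The following is only a minimal sketch, assuming the file is small enough to read into memory; the class name Utf8Check and the helpers startsWithBom and isWellFormedUtf8 are hypothetical names chosen for illustration, not part of the original answer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8Check {
    /** Hypothetical helper: true if the bytes begin with the UTF-8 BOM (EF BB BF). */
    public static boolean startsWithBom(byte[] bytes) {
        return bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF;
    }

    /** Hypothetical helper: true if the bytes decode as well-formed UTF-8. */
    public static boolean isWellFormedUtf8(byte[] bytes) {
        try {
            // REPORT makes the decoder throw instead of substituting U+FFFD.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java Utf8Check <file>");
            return;
        }
        // Assumption: the file fits in memory.
        byte[] bytes = Files.readAllBytes(Path.of(args[0]));
        System.out.println("Starts with BOM: " + startsWithBom(bytes));
        System.out.println("Well-formed UTF-8: " + isWellFormedUtf8(bytes));
    }
}
```

Unlike the streaming version above, this reports both properties for every file, including whether a file that starts with a BOM is otherwise valid, at the cost of holding the whole file in memory.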
