How to detect if a file is not UTF-8 encoded?
In Java, how can I test that a file's encoding is definitely not UTF-8?
I want to be able to validate whether the contents are well-formed UTF-8.
Furthermore, I also need to validate that the file does not start with a byte order mark (BOM).
If you just need to test the file, without actually retaining its contents:
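import java.io.Reader;
import java.io.Writer;
import java.nio.charset.CharacterCodingException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

Path path = Paths.get("/home/dave/somefile.txt");
// Files.newBufferedReader(Path) decodes as UTF-8 and throws on malformed input.
try (Reader reader = Files.newBufferedReader(path)) {
    int c = reader.read();
    if (c == 0xfeff) {
        System.out.println("File starts with a byte order mark.");
    } else if (c >= 0) {
        // Decode the rest of the file, discarding the characters (Java 11+).
        reader.transferTo(Writer.nullWriter());
    }
} catch (CharacterCodingException e) {
    // Other IOExceptions still propagate to the caller.
    System.out.println("Not a UTF-8 file.");
}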
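The same check can also be run on bytes that are already in memory, for example when the contents will be used afterwards. The following is only a sketch under that assumption: the class name Utf8Check, the Result enum, and the check method are hypothetical names chosen for illustration. It uses a CharsetDecoder configured with CodingErrorAction.REPORT, so malformed input raises a CharacterCodingException instead of being silently replaced.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original answer.
class Utf8Check {
    enum Result { VALID, HAS_BOM, MALFORMED }

    static Result check(byte[] bytes) {
        // The UTF-8 encoding of U+FEFF (the BOM) is the byte sequence EF BB BF.
        if (bytes.length >= 3
                && (bytes[0] & 0xff) == 0xEF
                && (bytes[1] & 0xff) == 0xBB
                && (bytes[2] & 0xff) == 0xBF) {
            return Result.HAS_BOM;
        }
        try {
            // REPORT makes the decoder throw instead of substituting U+FFFD.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return Result.VALID;
        } catch (CharacterCodingException e) {
            return Result.MALFORMED;
        }
    }
}
```

A call such as Utf8Check.check(Files.readAllBytes(path)) would apply it to a file. Unlike the streaming version, this loads the whole file into memory, so it is probably a poor fit for very large files.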