I'm reading a text file in Java. Here's my code:
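import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public static String[] fajlbeolvasa(String s) throws IOException
{
    List<String> list = new ArrayList<>();
    // try-with-resources closes the reader even if an exception is thrown
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(new FileInputStream(s), StandardCharsets.UTF_8)))
    {
        String line;
        // readLine() returns null at the end of the file
        while ((line = reader.readLine()) != null)
        {
            list.add(line);
        }
    }
    return list.toArray(new String[0]);
}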
However, when I read the file, the output is garbled. For example: "Farkasgyep\\305\\261". Maybe something is wrong with the BOM. How can I solve this problem in Java? I'd be grateful for any help.
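The escape sequences in that example look like octal byte values (an assumption about how the output was printed). If so, \305\261 is the byte pair 0xC5 0xB1, which is the UTF-8 encoding of "ű" (U+0171), so the intended text would be "Farkasgyepű". A minimal check, where the class and method names are made up for illustration:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, only to show what the escaped bytes decode to
final class OctalEscapeCheck {
    private OctalEscapeCheck() {}

    static String decodeExample() {
        // Java integer literals with a leading 0 are octal: 0305 = 0xC5, 0261 = 0xB1
        byte[] tail = {(byte) 0305, (byte) 0261};
        // Decoded as UTF-8, the bytes C5 B1 form U+0171 (LATIN SMALL LETTER U WITH DOUBLE ACUTE)
        return "Farkasgyep" + new String(tail, StandardCharsets.UTF_8);
    }
}
```

Note that printing the result shows "Farkasgyepű" only if the console itself uses UTF-8.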
You can check for a BOM as follows. This approach treats the file as a byte[], so you shouldn't have problems using it with your file:
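import java.util.Arrays;
import org.apache.commons.codec.binary.Hex; // Apache Commons Codec

// Hex encoding of the UTF-8 BOM (bytes EF BB BF); a class-level constant
// so that both isBOMPresent and removeBOM can use it
private static final String BOM_HEX_ENCODE = "efbbbf";

private static boolean isBOMPresent(byte[] content) {
    // Content shorter than 3 bytes cannot start with a BOM
    if (content == null || content.length < 3) {
        return false;
    }
    // Hex-encode the first 3 bytes and compare them with the BOM
    byte[] bom = Arrays.copyOf(content, 3);
    String stringContent = new String(Hex.encodeHex(bom));
    return BOM_HEX_ENCODE.equalsIgnoreCase(stringContent);
}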
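If you would rather not depend on Apache Commons Codec just for this check, the same test can be done by comparing the first three bytes directly. A sketch, with the hypothetical names BomCheck and startsWithUtf8Bom:

```java
import java.util.Arrays;

// Hypothetical dependency-free variant of isBOMPresent
final class BomCheck {
    // The UTF-8 encoding of U+FEFF, the byte order mark
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    private BomCheck() {}

    static boolean startsWithUtf8Bom(byte[] content) {
        // Too short (or null) means no BOM; otherwise compare the first 3 bytes
        return content != null
                && content.length >= UTF8_BOM.length
                && Arrays.equals(Arrays.copyOf(content, UTF8_BOM.length), UTF8_BOM);
    }
}
```

To get the byte[] from a file path, Files.readAllBytes(Paths.get(s)) from java.nio.file works, e.g. BomCheck.startsWithUtf8Bom(Files.readAllBytes(Paths.get(s))).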
Then, if you need to remove it, you can use this:
import java.util.Arrays;

public static byte[] removeBOM(byte[] fileWithBOM) {
    if (isBOMPresent(fileWithBOM)) {
        // Return a copy without the first 3 bytes (the BOM)
        return Arrays.copyOfRange(fileWithBOM, 3, fileWithBOM.length);
    }
    // No BOM: return the content unchanged
    return fileWithBOM;
}
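A different technique that stays closer to the line-by-line reading in the question: decode the file as UTF-8 first and then drop a leading U+FEFF character, since Java's UTF-8 decoder does not strip the BOM but returns it as that character. A sketch, where BomSafeReader and readLinesWithoutBom are hypothetical names:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical alternative to fajlbeolvasa that tolerates a UTF-8 BOM
final class BomSafeReader {
    private BomSafeReader() {}

    static String[] readLinesWithoutBom(String path) throws IOException {
        // Decode the whole file as UTF-8 and split it into lines
        List<String> lines = Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8);
        String[] result = lines.toArray(new String[0]);
        // A BOM can only appear at the very start of the file, i.e. of the first line
        if (result.length > 0 && result[0].startsWith("\uFEFF")) {
            result[0] = result[0].substring(1);
        }
        return result;
    }
}
```

The design choice here is to let Files.readAllLines handle decoding and line splitting and only patch the first line. It throws an IOException if the file contains bytes that are not valid UTF-8.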