Java; Trying to convert a String which contains ISO-8859-1 encoding to UTF-8 but file is UTF-8
Convert UTF-8 file to ISO-8859-1
Source:
C:\\temp\\test.csv
"Русслэнд";"Ελλάς";"Réunion"
Expected result:
C:\\temp\\test.properties
"\u0420\u0443\u0441\u0441\u043b\u044d\u043d\u0434";"\u0395\u03bb\u03bb\u03ac\u03c2";"R\u00e9union"
Current result:
C:\\temp\\test.properties
"????????", "?????","R궮ion"
Code:
try {
    File file = new File("C:\\temp\\test.csv");
    FileInputStream is = new FileInputStream(file);
    InputStreamReader r = new InputStreamReader(is, Charset.forName("UTF-8"));
    FileOutputStream os = new FileOutputStream("C:\\temp\\test.properties");
    OutputStreamWriter ow = new OutputStreamWriter(os, "ISO-8859-1");
    char[] buffer = new char[1024];
    int x;
    // read() may return fewer chars than the buffer holds; only -1 means end of file.
    while ((x = r.read(buffer)) != -1) {
        ow.write(buffer, 0, x);
    }
    ow.flush();
    ow.close();
    r.close();
} catch (IOException e) {
    e.printStackTrace();
}
How do I convert a large UTF-8 .csv file to ISO-8859-1 in Java 1.6? I want to read a given file, convert it, and save it.
private byte[] convertToISO(File file, Charset enc) {
    // enc = Charset.forName("UTF-8");
    try {
        FileInputStream is = new FileInputStream(file);
        InputStreamReader r = new InputStreamReader(is, enc);
        char[] buffer = new char[1024];
        StringWriter w = new StringWriter();
        int x;
        // read() may return fewer chars than the buffer holds; only -1 means end of file.
        while ((x = r.read(buffer)) != -1) {
            w.write(buffer, 0, x);
        }
        w.flush();
        String res = w.toString();
        r.close();
        return res.getBytes("ISO-8859-1");
    } catch (IOException e) {
        System.err.println("Failed to read file: " + file.getPath());
        e.printStackTrace();
        return null;
    }
}
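Since the file is large, it may be worth avoiding the intermediate String altogether. Here is a minimal sketch of a streaming variant that should also compile on Java 1.6; the class name Utf8ToLatin1 and both method names are made up for illustration. Note that any character without an ISO-8859-1 equivalent still comes out as '?'.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

// Hypothetical helper, not part of the original post.
final class Utf8ToLatin1 {

    /** Decodes in as UTF-8 and writes it to out as ISO-8859-1, chunk by chunk. */
    static void transcode(InputStream in, OutputStream out) throws IOException {
        Reader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
        Writer writer = new BufferedWriter(new OutputStreamWriter(out, "ISO-8859-1"));
        try {
            char[] buffer = new char[8192];
            int n;
            // Only -1 signals end of stream; shorter reads are normal.
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
            writer.flush();
        } finally {
            try {
                reader.close();
            } finally {
                writer.close();
            }
        }
    }

    /** In-memory variant, handy for trying the conversion on small inputs. */
    static byte[] transcode(byte[] utf8) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            transcode(new ByteArrayInputStream(utf8), out);
        } catch (IOException e) {
            // Not expected for in-memory streams.
            throw new IllegalStateException(e);
        }
        return out.toByteArray();
    }
}
```

For files, the first method can be called with a FileInputStream and a FileOutputStream; memory use then stays constant regardless of file size.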
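I assume you are trying to print the result to the console. The JDK/JRE prints to the console using the JVM's default charset, which depends on the platform and locale; it is often UTF-8, but not always.
To use the ISO-8859-1 charset, you can pass -Dfile.encoding=ISO-8859-1 as a JVM argument.
Alternatively, you can configure the console encoding in your IDE.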
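To check what the JVM is actually using, and to print with an explicit encoding regardless of the default, something along these lines may help. This is only a sketch; the class ConsoleEncodingDemo and the method printedBytes are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

// Hypothetical demo class, not from the original answer.
final class ConsoleEncodingDemo {

    /** Returns the bytes a PrintStream with the given encoding emits for text. */
    static byte[] printedBytes(String text, String encoding) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream ps = new PrintStream(sink, true, encoding);
            ps.print(text);
            ps.close();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encoding, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The JVM default charset, used by APIs when no charset is given.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Wrap System.out so the output encoding no longer depends on the default.
        PrintStream latin1Out = new PrintStream(System.out, true, "ISO-8859-1");
        latin1Out.println("R\u00e9union");
    }
}
```

Whether the text then displays correctly still depends on the encoding the terminal or IDE console expects.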
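You are not trying to convert from UTF-8 to ISO-8859-1; you are trying to escape Unicode characters into an ASCII stream. That is different from simply re-encoding.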
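The difference is easy to see on the sample data: ISO-8859-1 has a byte for "é" but none for Cyrillic or Greek letters, so plain re-encoding replaces them with '?'. A small sketch to illustrate (the class ReencodeCheck and its methods are hypothetical names, not from the question):

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical demo class showing what re-encoding to ISO-8859-1 loses.
final class ReencodeCheck {

    private static final Charset LATIN1 = Charset.forName("ISO-8859-1");

    /** True if every character of text has an ISO-8859-1 representation. */
    static boolean fitsLatin1(String text) {
        CharsetEncoder encoder = LATIN1.newEncoder();
        return encoder.canEncode(text);
    }

    /** Encodes to ISO-8859-1 and back; unmappable characters turn into '?'. */
    static String roundTrip(String text) {
        return new String(text.getBytes(LATIN1), LATIN1);
    }
}
```

With this, fitsLatin1 returns true for "Réunion" and false for "Ελλάς", and roundTrip turns "Ελλάς" into five question marks, which matches the output shown in the question.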
Here is a writer class that escapes Unicode characters on the fly as they are written to the output stream:
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class OutputEscapingStreamWriter extends OutputStreamWriter {

    private static final String HEX_DIGITS = "0123456789abcdef";

    public OutputEscapingStreamWriter(OutputStream out, Charset cs) {
        super(out, cs);
    }

    public OutputEscapingStreamWriter(OutputStream out) {
        super(out);
    }

    public OutputEscapingStreamWriter(OutputStream out, String cs) throws UnsupportedEncodingException {
        super(out, cs);
    }

    public OutputEscapingStreamWriter(OutputStream out, CharsetEncoder cs) {
        super(out, cs);
    }

    // ASCII passes through unchanged; every other UTF-16 code unit is escaped.
    @Override
    public void write(int c) throws IOException {
        if (c < 128) {
            super.write(c);
        } else {
            // The escape text is pure ASCII, so it is written char by char via write(int).
            super.write(toHexString(c));
        }
    }

    @Override
    public void write(String str, int off, int len) throws IOException {
        for (int i = off; i < (off + len); i++) {
            write(str.charAt(i));
        }
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        for (int i = off; i < (off + len); i++) {
            write(cbuf[i]);
        }
    }

    // Builds a backslash-u escape with four lowercase hex digits.
    private String toHexString(int c) {
        StringBuilder sb = new StringBuilder("\\u");
        sb.append(HEX_DIGITS.charAt((c & 0xF000) >> 12));
        sb.append(HEX_DIGITS.charAt((c & 0x0F00) >> 8));
        sb.append(HEX_DIGITS.charAt((c & 0x00F0) >> 4));
        sb.append(HEX_DIGITS.charAt(c & 0x000F));
        return sb.toString();
    }
}
To use it on a file, just open a FileOutputStream and wrap it in an OutputEscapingStreamWriter, like this:
OutputEscapingStreamWriter out = new OutputEscapingStreamWriter(new FileOutputStream("file.txt"));
A quick and dirty unit test that proves it produces the output you expect:
@Test
public void testConversion() throws Exception {
    ByteArrayOutputStream output = new ByteArrayOutputStream();
    OutputEscapingStreamWriter wrapper = new OutputEscapingStreamWriter(output);
    wrapper.write("\"Русслэнд\";\"Ελλάς\";\"Réunion\"");
    wrapper.flush();
    wrapper.close();
    String result = output.toString();
    assertEquals("\"\\u0420\\u0443\\u0441\\u0441\\u043b\\u044d\\u043d\\u0434\";\"\\u0395\\u03bb\\u03bb\\u03ac\\u03c2\";\"R\\u00e9union\"",
            result);
}
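For the file-to-file case from the question, the input still has to be read as UTF-8 before the escaping happens. Below is a rough end-to-end sketch. To keep it self-contained it inlines the escaping rather than using the writer class from this answer, and the class name CsvToEscapedAscii and its methods are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper, not part of the original answer.
final class CsvToEscapedAscii {

    /** Copies reader to writer, escaping every non-ASCII UTF-16 code unit. */
    static void copyEscaped(Reader reader, Writer writer) throws IOException {
        int c;
        while ((c = reader.read()) != -1) {
            if (c < 128) {
                writer.write(c);
            } else {
                // Four lowercase hex digits after a literal backslash and 'u'.
                writer.write(String.format("\\u%04x", c));
            }
        }
    }

    /** In-memory variant, convenient for checking the output. */
    static String escape(String text) {
        StringWriter out = new StringWriter();
        try {
            copyEscaped(new StringReader(text), out);
        } catch (IOException e) {
            // Not expected for in-memory streams.
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    /** Reads sourcePath as UTF-8 and writes the escaped text as US-ASCII. */
    static void convertFile(String sourcePath, String targetPath) throws IOException {
        Reader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(sourcePath), "UTF-8"));
        try {
            Writer out = new BufferedWriter(
                    new OutputStreamWriter(new FileOutputStream(targetPath), "US-ASCII"));
            try {
                copyEscaped(in, out);
            } finally {
                out.close();
            }
        } finally {
            in.close();
        }
    }
}
```

With the paths from the question this would be called as convertFile("C:\\temp\\test.csv", "C:\\temp\\test.properties"). Because only ASCII is ever written, the result reads the same whether it is later opened as ASCII, ISO-8859-1, or UTF-8.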