java: how to convert a file to utf8

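I have a file that contains some non-UTF-8 characters (it is encoded as something like "ISO-8859-1"), so I want to convert that file to UTF-8, or at least read it as UTF-8. How can I do that?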

The code looks something like this:

File file = new File("some_file_with_non_utf8_characters.txt");

/* some code to convert the file to a UTF-8 file */

...

EDIT: added an encoding example

The following code converts a file from srcEncoding to tgtEncoding:

public static void transform(File source, String srcEncoding, File target, String tgtEncoding) throws IOException {
    BufferedReader br = null;
    BufferedWriter bw = null;
    try{
        br = new BufferedReader(new InputStreamReader(new FileInputStream(source),srcEncoding));
        bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(target), tgtEncoding));
        char[] buffer = new char[16384];
        int read;
        while ((read = br.read(buffer)) != -1)
            bw.write(buffer, 0, read);
    } finally {
        try {
            if (br != null)
                br.close();
        } finally {
            if (bw != null)
                bw.close();
        }
    }
}

- EDIT -

Using try-with-resources (Java 7):

public static void transform(File source, String srcEncoding, File target, String tgtEncoding) throws IOException {
    try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(source), srcEncoding));
         BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(target), tgtEncoding))) {
        char[] buffer = new char[16384];
        int read;
        while ((read = br.read(buffer)) != -1)
            bw.write(buffer, 0, read);
    }
}

String charset = "ISO-8859-1"; // or whatever encoding the file actually uses
BufferedReader in = new BufferedReader(
    new InputStreamReader(new FileInputStream(file), charset));
String line;
while ((line = in.readLine()) != null) {
    ....
}

At that point you have the text decoded. You can write it out in whatever encoding you like (for example UTF-8) with the symmetric Writer / OutputStream approach.
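A minimal sketch of that symmetric approach, reading lines with the source charset and writing them back out as UTF-8. The class name `LineTranscoder` and the method `rewriteAsUtf8` are hypothetical names chosen for illustration, not part of the original answer. Note that `readLine()` drops line terminators, so this version normalizes every line ending to `\n` and adds one after the last line.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class LineTranscoder {

    // Hypothetical helper: decode `source` with `sourceCharset`,
    // then write every line to `target` encoded as UTF-8.
    static void rewriteAsUtf8(Path source, Charset sourceCharset, Path target) throws IOException {
        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(Files.newInputStream(source), sourceCharset));
             BufferedWriter out = new BufferedWriter(
                 new OutputStreamWriter(Files.newOutputStream(target), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.write('\n'); // readLine() strips the terminator, so add one back
            }
        }
    }
}
```

If the original line endings must be preserved exactly, copying through a `char[]` buffer instead of reading line by line avoids touching them.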

You need to know the encoding of the input file. For example, if the file is in Latin-1, you would do something like this:

        FileInputStream fis = new FileInputStream("test.in");
        InputStreamReader isr = new InputStreamReader(fis, "ISO-8859-1");
        Reader in = new BufferedReader(isr);
        FileOutputStream fos = new FileOutputStream("test.out");
        OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8");
        Writer out = new BufferedWriter(osw);

        int ch;
        while ((ch = in.read()) > -1) {
            out.write(ch);
        }

        out.close();
        in.close();
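For files small enough to fit in memory, the same Latin-1 to UTF-8 conversion could also be sketched with `java.nio.file.Files` (Java 7 and later). This is an alternative to the stream-based loop, not the method from the answer; `WholeFileTranscoder` and `transcode` are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class WholeFileTranscoder {

    // Hypothetical helper: loads the whole file, decodes it with `from`,
    // and writes it back encoded with `to`. Not suitable for very large files.
    static void transcode(Path source, Charset from, Path target, Charset to) throws IOException {
        byte[] raw = Files.readAllBytes(source);
        String text = new String(raw, from);     // bytes -> characters
        Files.write(target, text.getBytes(to));  // characters -> bytes
    }

    public static void main(String[] args) throws IOException {
        // File names taken from the snippet above.
        transcode(Paths.get("test.in"), StandardCharsets.ISO_8859_1,
                  Paths.get("test.out"), StandardCharsets.UTF_8);
    }
}
```

Line endings are copied unchanged here because the text is never split into lines.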

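Do you just want to read it as UTF-8? What I suggested recently for a similar question was to start the JVM with -Dfile.encoding=UTF-8 and then read/print as usual. I don't know whether that applies to your case.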

With that option:

System.out.println("á é í ó ú");

prints the characters correctly. Otherwise it prints ? symbols.
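If changing the JVM-wide default is not an option, one alternative is to wrap the output stream in a `PrintStream` with an explicit encoding. The sketch below assumes the terminal itself interprets its input as UTF-8; `Utf8Console` and `utf8Stream` are hypothetical names. Unicode escapes are used so the example does not depend on the encoding of the source file.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

class Utf8Console {

    // Hypothetical helper: wraps any OutputStream so that text is
    // always encoded as UTF-8, regardless of the JVM default charset.
    static PrintStream utf8Stream(OutputStream sink) throws UnsupportedEncodingException {
        return new PrintStream(sink, true, "UTF-8"); // true = auto-flush
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Shows which default the JVM picked up (affected by -Dfile.encoding).
        System.out.println("default charset: " + Charset.defaultCharset());

        PrintStream out = utf8Stream(System.out);
        out.println("\u00e1 \u00e9 \u00ed \u00f3 \u00fa"); // á é í ó ú
    }
}
```

On JDK 18 and later the default charset is already UTF-8, so the flag mainly matters on older runtimes.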
