
How to detect wrong encoding

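The program writes 2 files.
In the right file, the string is "IL RITROVO AL 1° PIANO".
In the wrong file, the string is "IL RITROVO AL 1NUL PIANO" (a NUL byte where the "°" should be).
In the second case the "°" character has been interpreted wrongly; how can I detect this before writing?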

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter; 

public class WrongWriter {
    static File wrongFile = new File("C:/Users/utente/Desktop/wrongFile.txt");
    static File rightFile = new File("C:/Users/utente/Desktop/rightFile.txt");


    public static void main(String[] args) throws IOException {

        byte[] wrongBytes = new byte[]{
                73, 76, 32, 82, 73, 84, 82, 79, 86, 79, 32, 65, 76, 32, 49, 0, 32, 80, 73, 65, 78, 79, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32
                };


        write(wrongFile, wrongBytes) ;

        byte[] rightBytes = "IL RITROVO AL 1° PIANO".getBytes();

        write(rightFile, rightBytes) ;
    }



    static void write(File file, byte[] bytes) throws IOException{
        OutputStreamWriter stream = null; //10227
        stream =  new OutputStreamWriter( new FileOutputStream( file )  , "ISO-8859-15"); 
        stream.write( new String(  bytes ) ); 
        stream.flush();
        stream.close();

    }

}

String/char/Writer/Reader are Unicode text in Java. (This sets Java apart from many other languages.) Java text can always contain any mix of scripts.
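To make the text/binary split concrete, here is a small sketch (the class and method names are illustrative, not from the original answer): the same String turns into a different number of bytes depending on the charset used to encode it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TextVsBytes {

    // Encodes the text with the given charset and returns the resulting byte count.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String s = "1\u00B0"; // two chars: '1' (U+0031) and the degree sign (U+00B0)
        System.out.println(encodedLength(s, StandardCharsets.ISO_8859_1)); // 2: one byte per char
        System.out.println(encodedLength(s, StandardCharsets.UTF_8));      // 3: the degree sign needs 2 bytes
        System.out.println(encodedLength(s, StandardCharsets.UTF_16BE));   // 4: two bytes per char, no BOM
    }
}
```

The String itself never changes; only the byte representation depends on the encoding chosen at the boundary.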

byte[]/InputStream/OutputStream are binary data in Java. To be interpreted as text, they must be given an encoding.
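The reverse direction, as a minimal sketch with hypothetical names: the same two bytes read as ISO-8859-1 give "1°", while read as UTF-8 the lone 0xB0 byte is invalid and gets replaced.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BytesNeedAnEncoding {

    // Interprets the same bytes as text using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        // new String(byte[], Charset) does not throw: malformed or unmappable input
        // is replaced with the charset's replacement string (U+FFFD for UTF-8).
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] bytes = {0x31, (byte) 0xB0}; // "1°" as encoded by ISO-8859-1 or ISO-8859-15
        System.out.println(decode(bytes, StandardCharsets.ISO_8859_1).equals("1\u00B0")); // true
        System.out.println(decode(bytes, StandardCharsets.UTF_8).equals("1\uFFFD"));      // true
    }
}
```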

因此,您可以执行以下操作:

OutputStreamWriter stream = new OutputStreamWriter(new FileOutputStream(file), "ISO-8859-15");
stream.write("IL RITROVO AL 1° PIANO"); // a Writer has write(String); it has no print()
stream.close();

The OutputStreamWriter class bridges the two: it converts the Unicode text into bytes in the given encoding.
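A minimal sketch of that bridge, writing into an in-memory ByteArrayOutputStream instead of a file so the resulting bytes can be inspected (the class and method names are illustrative). It assumes the runtime provides ISO-8859-15; standard JDKs ship it, although Java only guarantees a smaller set of charsets.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

public class WriterBridge {

    // Pushes text through an OutputStreamWriter and returns the bytes it produced.
    static byte[] encodeViaWriter(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text); // chars go in, bytes in the chosen charset come out
        } catch (IOException e) {
            // not expected for an in-memory stream, but Writer declares IOException
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encodeViaWriter("IL RITROVO AL 1\u00B0 PIANO", Charset.forName("ISO-8859-15"));
        System.out.println(bytes.length);                // 22: one byte per character
        System.out.printf("0x%02X%n", bytes[15] & 0xFF); // 0xB0: the degree sign
    }
}
```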

In general, the conversion looks like this:

byte[] inISO15 = "IL RITROVO AL 1° PIANO".getBytes("ISO-8859-15");
String s = new String(inISO15, "ISO-8859-15");
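Such a round trip is only lossless when the charset can represent every character. A small sketch (names are illustrative) using the euro sign, one of the characters where ISO-8859-15 (Latin-9) differs from ISO-8859-1:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTrip {
    static final Charset LATIN_9 = Charset.forName("ISO-8859-15"); // also known as Latin-9

    // Encodes and then decodes with the same charset.
    static String roundTrip(String s, Charset cs) {
        byte[] encoded = s.getBytes(cs); // unmappable chars become the charset's replacement byte
        return new String(encoded, cs);
    }

    public static void main(String[] args) {
        String euro = "\u20AC";
        System.out.println(Arrays.toString(euro.getBytes(LATIN_9)));                   // [-92], i.e. 0xA4
        System.out.println(Arrays.toString(euro.getBytes(StandardCharsets.ISO_8859_1))); // [63], i.e. '?'
        System.out.println(roundTrip(euro, LATIN_9).equals(euro));                     // true
        System.out.println(roundTrip(euro, StandardCharsets.ISO_8859_1).equals(euro)); // false
    }
}
```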

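You used a Writer to write what were really bytes (new String(bytes) decodes them with the platform's default charset), which bypasses the intended conversion. Writing bytes should be done like this: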

stream.write(inISO15); // here stream is an OutputStream, not a Writer

But it is better not to use a Writer here at all, and to use a FileOutputStream or BufferedOutputStream directly.
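A minimal sketch of writing the already-encoded bytes straight to a file through a BufferedOutputStream. To keep it self-contained it writes to a temporary file instead of the Desktop paths in the question; writeAndReadBack is a hypothetical helper added only to show the bytes come back unchanged.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class WriteBytesDirectly {

    // Writes already-encoded bytes as-is: no Writer, so no second charset conversion.
    static void writeBytes(Path file, byte[] bytes) throws IOException {
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(file.toFile()))) {
            out.write(bytes);
        }
    }

    // Writes the bytes to a temporary file, reads them back, and deletes the file.
    static byte[] writeAndReadBack(byte[] bytes) {
        try {
            Path file = Files.createTempFile("rightFile", ".txt");
            try {
                writeBytes(file, bytes);
                return Files.readAllBytes(file);
            } finally {
                Files.delete(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] inISO15 = "IL RITROVO AL 1\u00B0 PIANO".getBytes(Charset.forName("ISO-8859-15"));
        System.out.println(Arrays.equals(inISO15, writeAndReadBack(inISO15))); // true
    }
}
```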

Thanks, that is what I was looking for, but I was not able to use CharsetDecoder.
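For reference, a sketch of how a CharsetDecoder can be set to report bad input instead of replacing it (decodesCleanly is a hypothetical name). It also suggests why a decoder alone may not help here: it only rejects byte sequences that are invalid for the charset, and a NUL byte is a valid character in UTF-8 and ISO-8859-15 alike.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DecoderCheck {

    // Returns true if the bytes decode without errors in the given charset.
    // REPORT makes the decoder throw instead of silently substituting a replacement.
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] withNul = {0x31, 0x00, 0x20};       // "1", NUL, " " as in wrongBytes
        byte[] latinDegree = {0x31, (byte) 0xB0};  // "1°" in ISO-8859-1/-15
        System.out.println(decodesCleanly(latinDegree, StandardCharsets.UTF_8));     // false: lone 0xB0
        System.out.println(decodesCleanly(withNul, StandardCharsets.UTF_8));         // true: NUL is valid
        System.out.println(decodesCleanly(withNul, Charset.forName("ISO-8859-15"))); // true
    }
}
```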

package dummy;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter; 
import java.io.UnsupportedEncodingException;

public class WrongWriter {
    static File wrongFile = new File("C:/Users/utente/Desktop/wrongFile.txt");
    static File rightFile = new File("C:/Users/utente/Desktop/rightFile.txt");


    public static void main(String[] args) throws IOException {

        byte[] wrongBytes = new byte[]{
                73, 76, 32, 82, 73, 84, 82, 79, 86, 79, 32, 65, 76, 32, 49, 0, 32, 80, 73, 65, 78, 79, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32
                };

        if (CharacterChecker.isISO_8859_1(wrongBytes)) {
            write(wrongFile, wrongBytes) ;          
        } else{
            System.out.println("Bad input");
        }

        byte[] rightBytes = "IL RITROVO AL 1° PIANO".getBytes("ISO-8859-15");

        write(rightFile, rightBytes) ;
    }



    static void write(File file, byte[] bytes) throws IOException{
        OutputStreamWriter stream = null; //10227
        stream =  new OutputStreamWriter( new FileOutputStream( file )  , "ISO-8859-15"); 
        stream.write( new String(  bytes,  "ISO-8859-15" ) ); 
        stream.flush();
        stream.close();

    }

}
class CharacterChecker {

    // True for the bytes this check rejects:
    //  - ASCII control characters 0x00-0x1F (this includes the NUL in wrongBytes)
    //  - 0xA4, 0xA6, 0xA8, 0xB4, 0xB8, 0xBC, 0xBD, 0xBE: the eight positions where
    //    ISO-8859-1 and ISO-8859-15 assign different characters
    // Java bytes are signed, so 0xA4 is -92, 0xBC..0xBE are -68..-66, and so on.
    private static boolean isSuspicious(byte b) {
        return (b >= 0 && b < 32)
                || (b < -65 && b > -69) // 0xBC, 0xBD, 0xBE
                || b == -72             // 0xB8
                || b == -76             // 0xB4
                || b == -88             // 0xA8
                || b == -90             // 0xA6
                || b == -92;            // 0xA4
    }

    public static boolean isISO_8859_1(byte[] bytes) {
        for (byte b : bytes) {
            if (isSuspicious(b)) {
                return false;
            }
        }
        return true;
    }

    public static boolean isISO_8859_1(String s) throws UnsupportedEncodingException {
        // Note: characters ISO-8859-1 cannot represent are replaced by '?' here,
        // so they are not caught by this check.
        return isISO_8859_1(s.getBytes("ISO-8859-1"));
    }

    // Replaces every rejected byte with the given character (which must be in ISO-8859-1).
    public static String replaceNotISO_8859_1_characters(String s, char character) throws UnsupportedEncodingException {
        byte substitute = Character.toString(character).getBytes("ISO-8859-1")[0];
        byte[] bytes = s.getBytes("ISO-8859-1");
        for (int i = 0; i < bytes.length; i++) {
            if (isSuspicious(bytes[i])) {
                bytes[i] = substitute;
            }
        }
        return new String(bytes, "ISO-8859-1");
    }
}
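An alternative sketch that works on the decoded String instead of hard-coding byte values: reject control characters, then ask the target charset's encoder whether it can represent the text. The name isWritable is hypothetical, and like the byte check above it also rejects tabs and line breaks.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class TextCheck {

    // True if the text has no control characters and the target charset can encode all of it.
    static boolean isWritable(String text, Charset target) {
        for (int i = 0; i < text.length(); i++) {
            if (Character.isISOControl(text.charAt(i))) { // U+0000-U+001F and U+007F-U+009F
                return false;
            }
        }
        CharsetEncoder encoder = target.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        Charset latin9 = Charset.forName("ISO-8859-15");
        System.out.println(isWritable("IL RITROVO AL 1\u00B0 PIANO", latin9));     // true
        System.out.println(isWritable("IL RITROVO AL 1\0 PIANO", latin9));         // false: NUL
        System.out.println(isWritable("\u20AC", latin9));                          // true: 0xA4 in Latin-9
        System.out.println(isWritable("\u20AC", StandardCharsets.ISO_8859_1));     // false
    }
}
```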


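Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site or the original post. For any questions, contact: yoyou2525@163.com.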

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM