How to detect wrong encoding
This program writes to two files.
In the right file the string is "IL RITROVO AL 1° PIANO".
In the wrong file the string is "IL RITROVO AL 1NUL PIANO".
In the second case, the "°" character has the wrong encoding; how can I detect this case before I write it?
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;

public class WrongWriter {

    static File wrongFile = new File("C:/Users/utente/Desktop/wrongFile.txt");
    static File rightFile = new File("C:/Users/utente/Desktop/rightFile.txt");

    public static void main(String[] args) throws IOException {
        byte[] wrongBytes = new byte[]{
            73, 76, 32, 82, 73, 84, 82, 79, 86, 79, 32, 65, 76, 32, 49, 0, 32, 80, 73, 65, 78, 79, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32
        };
        write(wrongFile, wrongBytes);
        byte[] rightBytes = "IL RITROVO AL 1° PIANO".getBytes();
        write(rightFile, rightBytes);
    }

    static void write(File file, byte[] bytes) throws IOException {
        OutputStreamWriter stream = null; //10227
        stream = new OutputStreamWriter(new FileOutputStream(file), "ISO-8859-15");
        stream.write(new String(bytes));
        stream.flush();
        stream.close();
    }
}
String/char/Writer/Reader are Unicode text in Java. (This sets Java apart from many other languages.) Java text can always contain any mix of scripts.
byte[]/InputStream/OutputStream are binary data in Java. To be interpreted as text, they must be given an encoding.
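To illustrate why the encoding has to be stated, here is a minimal sketch that decodes the same single byte under two charsets. The class name `DecodeDemo` and the helper `decode` are made up for this example; the byte 0xA4 was picked because it is one of the positions where ISO-8859-1 and ISO-8859-15 disagree.

```java
import java.nio.charset.Charset;

final class DecodeDemo {
    // Hypothetical helper: interpret raw bytes as text under an explicitly named charset.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte[] raw = { (byte) 0xA4 };
        String asLatin1 = decode(raw, "ISO-8859-1");   // currency sign, U+00A4
        String asLatin9 = decode(raw, "ISO-8859-15");  // euro sign, U+20AC
        // Same byte, different text: the bytes alone do not say which one was meant.
        System.out.println(asLatin1.equals(asLatin9));
        System.out.println(Integer.toHexString(asLatin1.charAt(0)));
        System.out.println(Integer.toHexString(asLatin9.charAt(0)));
    }
}
```

The byte array carries no record of its charset, so the caller has to supply it every time bytes become a String.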
So you can do:
OutputStreamWriter stream = null; //10227
stream = new OutputStreamWriter( new FileOutputStream(file), "ISO-8859-15");
stream.write("IL RITROVO AL 1° PIANO"); // OutputStreamWriter has write(String), not print
stream.close();
The class OutputStreamWriter bridges this and writes the Unicode text as bytes in that encoding.
In general the conversions are:
byte[] inISO15 = "IL RITROVO AL 1° PIANO".getBytes("ISO-8859-15");
String s = new String(inISO15, "ISO-8859-15");
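One caveat with these conversions, shown in the sketch below: `String.getBytes` does not fail on a character the target charset cannot represent, it silently substitutes the charset's replacement byte (a `?` for the ISO-8859 family). The class name `LossyEncodeDemo` and its methods are hypothetical and only serve the illustration.

```java
import java.nio.charset.Charset;

final class LossyEncodeDemo {
    private static final Charset LATIN9 = Charset.forName("ISO-8859-15");

    // Hypothetical helper: encode text to ISO-8859-15 bytes the "silent" way.
    static byte[] encodeLatin9(String text) {
        return text.getBytes(LATIN9);
    }

    // Hypothetical helper: true if encoding and decoding gives back the same text.
    static boolean survivesRoundTrip(String text) {
        return new String(encodeLatin9(text), LATIN9).equals(text);
    }

    public static void main(String[] args) {
        // The degree sign exists in ISO-8859-15, so nothing is lost.
        System.out.println(survivesRoundTrip("IL RITROVO AL 1\u00B0 PIANO"));
        // U+4E2D (a CJK character) does not exist there and comes back as '?'.
        System.out.println(survivesRoundTrip("1\u4E2D PIANO"));
    }
}
```

A failed round trip is therefore one cheap way to notice that text will not be written as intended.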
You used the OutputStream functionality of writing bytes, bypassing the conversion. That should be done as:
stream.write(inISO15);
But then it is better not to use a Writer, and instead to use the FileOutputStream directly, or a BufferedOutputStream.
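A minimal sketch of that byte-oriented variant follows. The class `RawByteWriter` and the method `writeBytes` are names invented here, and the demo writes to a temporary file rather than the Desktop paths from the question.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;

final class RawByteWriter {
    // Writes already-encoded bytes unchanged: no Writer, no charset conversion.
    static void writeBytes(File file, byte[] encoded) throws IOException {
        // try-with-resources flushes and closes the stream even if write() throws.
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(file))) {
            out.write(encoded);
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("rightFile", ".txt"); // assumption: temp file for the demo
        byte[] inISO15 = "IL RITROVO AL 1\u00B0 PIANO".getBytes(Charset.forName("ISO-8859-15"));
        writeBytes(file, inISO15);
        System.out.println("Wrote " + file.length() + " bytes to " + file);
    }
}
```

Since no decoding happens on this path, whatever is in the array, including a stray 0 byte, ends up in the file as-is.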
Thanks, but this is what I wanted; I was not able to do it with CharsetDecoder.
package dummy;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;

public class WrongWriter {

    static File wrongFile = new File("C:/Users/utente/Desktop/wrongFile.txt");
    static File rightFile = new File("C:/Users/utente/Desktop/rightFile.txt");

    public static void main(String[] args) throws IOException {
        byte[] wrongBytes = new byte[]{
            73, 76, 32, 82, 73, 84, 82, 79, 86, 79, 32, 65, 76, 32, 49, 0, 32, 80, 73, 65, 78, 79, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32
        };
        // The 0 byte at index 15 fails the check, so nothing is written for wrongFile.
        if (CharacterChecker.isISO_8859_1(wrongBytes)) {
            write(wrongFile, wrongBytes);
        } else {
            System.out.println("Bad input");
        }
        byte[] rightBytes = "IL RITROVO AL 1° PIANO".getBytes("ISO-8859-15");
        write(rightFile, rightBytes);
    }

    static void write(File file, byte[] bytes) throws IOException {
        OutputStreamWriter stream = null; //10227
        stream = new OutputStreamWriter(new FileOutputStream(file), "ISO-8859-15");
        stream.write(new String(bytes, "ISO-8859-15"));
        stream.flush();
        stream.close();
    }
}

class CharacterChecker {

    // Despite the name, this rejects the control range 0..31 plus the eight byte
    // positions where ISO-8859-15 differs from ISO-8859-1.
    static public boolean isISO_8859_1(byte[] bytes) throws UnsupportedEncodingException {
        for (int i = 0; i < bytes.length; i++) {
            if (
                (bytes[i] < 32 && bytes[i] >= 0)        // control characters, including NUL
                || (bytes[i] < -65 && bytes[i] > -69)   // 0xBC, 0xBD, 0xBE
                || bytes[i] == -72                      // 0xB8
                || bytes[i] == -76                      // 0xB4
                || bytes[i] == -88                      // 0xA8
                || bytes[i] == -90                      // 0xA6
                || bytes[i] == -92                      // 0xA4
            ) {
                return false;
            }
        }
        return true;
    }

    static public boolean isISO_8859_1(String s) throws UnsupportedEncodingException {
        byte[] bytes = s.getBytes("ISO-8859-1");
        return isISO_8859_1(bytes);
    }

    // Replaces every byte rejected by the check above with the given character.
    static public String replaceNotISO_8859_1_characters(String s, char character) throws UnsupportedEncodingException {
        String cString = Character.toString(character);
        byte sobs = cString.getBytes("ISO-8859-1")[0];
        byte[] bytes = s.getBytes("ISO-8859-1");
        for (int i = 0; i < bytes.length; i++) {
            if (
                (bytes[i] < 32 && bytes[i] >= 0)
                || (bytes[i] < -65 && bytes[i] > -69)
                || bytes[i] == -72
                || bytes[i] == -76
                || bytes[i] == -88
                || bytes[i] == -90
                || bytes[i] == -92
            ) {
                bytes[i] = sobs;
            }
        }
        return new String(bytes, "ISO-8859-1");
    }
}
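For comparison, here is a sketch of a similar check built on the standard charset API instead of hand-listed byte values. It is not the code above rewritten, and it is slightly stricter: besides 0..31 it also rejects DEL and the 0x80..0x9F controls. The class `WritableTextCheck` and the method `isCleanLatin9` are hypothetical names. As far as I know, Java's ISO-8859-15 decoder maps every byte value to some character, 0x00 included, which would explain why a CharsetDecoder alone never flags the NUL; the sketch therefore tests for control characters explicitly.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class WritableTextCheck {
    private static final Charset LATIN9 = Charset.forName("ISO-8859-15");

    // True if every character is representable in ISO-8859-15
    // and none of them is a control character such as NUL.
    static boolean isCleanLatin9(String text) {
        CharsetEncoder encoder = LATIN9.newEncoder();
        if (!encoder.canEncode(text)) {
            return false; // would be written as '?'
        }
        for (int i = 0; i < text.length(); i++) {
            if (Character.isISOControl(text.charAt(i))) {
                return false; // U+0000..U+001F or U+007F..U+009F
            }
        }
        return true;
    }

    // Same check for bytes that are assumed to be ISO-8859-15 encoded.
    static boolean isCleanLatin9(byte[] raw) {
        return isCleanLatin9(new String(raw, LATIN9));
    }

    public static void main(String[] args) {
        System.out.println(isCleanLatin9("IL RITROVO AL 1\u00B0 PIANO")); // degree sign
        System.out.println(isCleanLatin9("IL RITROVO AL 1\u0000 PIANO")); // NUL instead
    }
}
```

Calling `isCleanLatin9(wrongBytes)` before `write(...)` would play the same role as the `CharacterChecker` call in `main` above.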