Java: replace Unicode chars in a string

I have a program that reads in a file. The file contains some odd characters that I have never seen before. The purpose of the program is to parse certain information from the file into SQL statements.

When I get to this line in the file, "read “Details for …(the name of the title”" (notice the horizontal ellipsis and the left/right quotes), the output is:

Details for (the name of the title

I just want to replace those characters with characters that I define. I have tried:

st = st.replaceAll("…","...");
st = st.replaceAll("\u2026","...");

This is how I read the file:

 FileInputStream file = new FileInputStream(filePath);
 DataInputStream in = new DataInputStream(file); 
 BufferedReader br = new BufferedReader(new InputStreamReader(in));

I have also tried other things that I can't even remember. How can I do this seemingly simple task?

You need to specify the encoding when you read the file, before replacing the special characters:

FileInputStream inputStream = new FileInputStream("input.txt");
// Specify the encoding
InputStreamReader streamReader = new InputStreamReader(inputStream, "UTF-8");
BufferedReader in = new BufferedReader(streamReader);
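Putting the two steps together, reading with an explicit charset and then replacing, might look like the sketch below. It assumes the file really is UTF-8; the class name `TypographicCleaner`, the method `normalize`, and the default path `input.txt` are made up for illustration. It uses `String.replace`, which matches literal text, rather than the regex-based `replaceAll`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class TypographicCleaner {

    // Replace a few typographic characters with ASCII stand-ins.
    // String.replace works on literal text, so no regex escaping is involved.
    static String normalize(String line) {
        return line
                .replace("\u2026", "...")  // horizontal ellipsis
                .replace('\u201C', '"')    // left double quotation mark
                .replace('\u201D', '"');   // right double quotation mark
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass a real one as the first argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "input.txt");

        // Assumption: the file is UTF-8 encoded.
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(normalize(line));
            }
        }
    }
}
```

If the file turns out not to be UTF-8, `Files.newBufferedReader` will typically fail with an exception on the malformed bytes instead of silently producing garbage, which is a useful hint that a different charset is needed.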

Unless it's absolutely necessary, you don't really have to drop those weird (yet still meaningful) characters.

Have a look at the documentation for InputStreamReader and specify the right encoding when reading your file.
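To see why the encoding matters for the replacement itself, here is a small sketch (the class name `CharsetMismatchDemo` and the helper `decode` are hypothetical). It decodes the UTF-8 bytes of the ellipsis once with UTF-8 and once with ISO-8859-1, standing in for whatever wrong default a platform might use. With the wrong charset the string never contains U+2026, so a replacement that looks for it has nothing to match.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {

    // Decode the same bytes with a caller-chosen charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // U+2026 (horizontal ellipsis) is three bytes in UTF-8: E2 80 A6
        byte[] utf8Bytes = "\u2026".getBytes(StandardCharsets.UTF_8);

        String right = decode(utf8Bytes, StandardCharsets.UTF_8);
        String wrong = decode(utf8Bytes, StandardCharsets.ISO_8859_1);

        System.out.println(right.contains("\u2026")); // true
        System.out.println(wrong.contains("\u2026")); // false
        System.out.println(wrong.length());           // 3, one char per byte
    }
}
```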
