Java convert character stream into human “readable” String
I have a bunch of characters that look something like this:
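&#1050;&#1086;&#1084;&#1091;&#1085;&#1080;&#1082;&#1072;&#1094;&#1080;&#1086;&#1085;&#1085;&#1072; &#1082;&#1072;&#1073;&#1077;&#1083;&#1085;&#1072; &#1089;&#1080;&#1089;&#1090;&#1077;&#1084;&#1072;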
and sometimes I have a mix like this:
G&#233;n&#233;ralit&#233;s
The first translates into:
Комуникационна кабелна система
and the second to:
Généralités
I can see this by placing them into the body of an HTML page and opening it in a browser.
But how can I make Java output the "real" characters? What is the above encoding called?
I have tried a couple of things, most recently this (which did not work):
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class DecodeAttempt {
    public static void main(String[] args) {
        List<String> lst = new ArrayList<String>();
        lst.add("&#1050;");
        lst.add("&#1086;");
        Charset utf8charset = Charset.forName("UTF-8");
        Charset iso88591charset = Charset.forName("ISO-8859-1");
        for (String s : lst) {
            ByteBuffer inputBuffer = ByteBuffer.wrap(s.getBytes());
            // decode UTF-8
            CharBuffer data = utf8charset.decode(inputBuffer);
            // encode ISO-8859-1
            ByteBuffer outputBuffer = iso88591charset.encode(data);
            byte[] outputData = outputBuffer.array();
            // charset conversion does not decode HTML entities, so this is not the readable text
            System.out.println(new String(outputData));
        }
    }
}
You can use commons-lang to unescape this sort of thing. In Groovy:
@Grab( 'commons-lang:commons-lang:2.6' )
import org.apache.commons.lang.StringEscapeUtils as SEU
def str = 'G&#233;n&#233;ralit&#233;s'
println SEU.unescapeHtml( str )
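For plain Java without the commons-lang dependency, here is a minimal sketch of the same idea. It assumes the input only contains numeric character references (decimal like `&#1050;` or hexadecimal like `&#x41A;`); named entities such as `&eacute;` are not handled. The class name `NumericEntityDecoder` and the method `unescapeNumericEntities` are made up for illustration and are not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumericEntityDecoder {
    // Matches decimal (&#1050;) and hexadecimal (&#x41A;) numeric character references.
    private static final Pattern NUMERIC_ENTITY =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    // Hypothetical helper: replaces each numeric reference with the character it names.
    public static String unescapeNumericEntities(String input) {
        Matcher m = NUMERIC_ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the literal text between the previous match and this one.
            out.append(input, last, m.start());
            int codePoint = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                // Leave references outside the Unicode range untouched.
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescapeNumericEntities("G&#233;n&#233;ralit&#233;s"));
        System.out.println(unescapeNumericEntities("&#1050;&#1086;&#1084;"));
    }
}
```

With commons-lang 2.6 on the classpath, the equivalent call from Java is `StringEscapeUtils.unescapeHtml(str)`, the same method the Groovy snippet uses, and it also covers named entities. If the console prints `?` instead of the decoded characters, the terminal or default output encoding is the likely cause rather than the decoding itself.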