Convert Latin-1 content of InputStream into UTF-8 String
I need to convert the content of an InputStream into a String. The difficulty here is the input encoding, namely Latin-1.
I tried several approaches and code snippets with String, getBytes, char[], etc. in order to get the encoding straight, but nothing seemed to work.
Finally, I came up with the working solution below. However, this code seems a little verbose to me, even for Java. So the question here is:
Is there a simpler and more elegant approach to achieve what is done here?
private String convertStreamToStringLatin1(java.io.InputStream is)
        throws IOException {
    String text = "";
    // setup readers with Latin-1 (ISO 8859-1) encoding
    BufferedReader i = new BufferedReader(new InputStreamReader(is, "8859_1"));
    int numBytes;
    CharBuffer buf = CharBuffer.allocate(512);
    while ((numBytes = i.read(buf)) != -1) {
        text += String.copyValueOf(buf.array(), 0, numBytes);
        buf.clear();
    }
    return text;
}
Firstly, a few criticisms of the approach you've taken already. You shouldn't unnecessarily use an NIO CharBuffer when you merely want a char[512]. You don't need to clear the buffer each iteration, either.
int numBytes;
final char[] buf = new char[512];
while ((numBytes = i.read(buf)) != -1) {
    text += String.copyValueOf(buf, 0, numBytes);
}
You should also know that just constructing a String with those arguments will have the same effect, as the constructor also copies the data. The Javadoc for that constructor says:
"The contents of the subarray are copied; subsequent modification of the character array does not affect the newly created string."
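As a rough sketch of the char[] loop described above, the repeated String concatenation can also be swapped for a StringBuilder. The class name Latin1Reader and the method name readLatin1 are made up for this illustration, and StandardCharsets assumes Java 7 or later; the original answer does not use either.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original answer.
final class Latin1Reader {
    static String readLatin1(InputStream in) throws IOException {
        // The reader decodes the bytes as ISO-8859-1 (Latin-1).
        Reader reader = new InputStreamReader(in, StandardCharsets.ISO_8859_1);
        StringBuilder text = new StringBuilder();
        char[] buf = new char[512];
        int numChars;
        // read() returns the number of chars stored in buf, or -1 at end of stream.
        while ((numChars = reader.read(buf)) != -1) {
            // append(char[], int, int) copies the chars, so buf can be reused as-is.
            text.append(buf, 0, numChars);
        }
        return text.toString();
    }
}
```

Calling Latin1Reader.readLatin1(is) would then take the place of convertStreamToStringLatin1(is); the caller stays responsible for closing the stream.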
You can use a dynamic ByteArrayOutputStream which grows an internal buffer to accommodate all the data. You can then use the entire byte[] from toByteArray to decode into a String.
The advantage is that deferring decoding until the end avoids decoding fragments individually; while that may work for simple charsets like ASCII or ISO-8859-1, it will not work on multi-byte schemes like UTF-8 and UTF-16. This means it is easier to change the character encoding in the future, since the code requires no modification.
// Requires java.io.ByteArrayOutputStream, java.io.IOException and java.io.InputStream.
private static final String DEFAULT_ENCODING = "ISO-8859-1";

public static final String convert(final InputStream in) throws IOException {
    return convert(in, DEFAULT_ENCODING);
}

public static final String convert(final InputStream in, final String encoding) throws IOException {
    // Collect the raw bytes first; the output stream grows its buffer as needed.
    final ByteArrayOutputStream out = new ByteArrayOutputStream();
    final byte[] buf = new byte[2048];
    int rd;
    while ((rd = in.read(buf, 0, buf.length)) >= 0) {
        out.write(buf, 0, rd);
    }
    // Decode once, after all bytes have been read.
    return new String(out.toByteArray(), encoding);
}
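To illustrate the point about decoding fragments, here is a small sketch that decodes the same UTF-8 bytes once as a whole and once in fixed-size byte chunks. The class FragmentDecoding and its methods are invented for this demonstration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demonstration class, not part of the original answer.
final class FragmentDecoding {
    // Decodes each chunk of bytes on its own, which can split a multi-byte character.
    static String decodeInChunks(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int from = 0; from < data.length; from += chunkSize) {
            int to = Math.min(from + chunkSize, data.length);
            byte[] chunk = Arrays.copyOfRange(data, from, to);
            sb.append(new String(chunk, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Decodes all bytes in one go, so no character is ever split.
    static String decodeAtOnce(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }
}
```

With the two UTF-8 bytes of "é" and a chunk size of 1, decodeInChunks sees each byte in isolation and cannot reassemble the character, while decodeAtOnce returns it intact. For a single-byte charset such as ISO-8859-1 the chunk boundaries would not matter.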
I don't see how it could be much simpler. I did this a little differently once: if you already have a String, you can do this:
new String(originalString.getBytes(), "ISO-8859-1");
So something like this could also work:
BufferedReader reader = new BufferedReader(new InputStreamReader(is));
StringBuilder sb = new StringBuilder();
String line = null;
while ((line = reader.readLine()) != null) {
    sb.append(line + "\n");
}
is.close();
return new String(sb.toString().getBytes(), "ISO-8859-1");
EDIT: I should add, this is really just an alternative to your already working solution. When it comes to converting Streams in Java it won't be much simpler, so go for it. :)
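One caveat that may be worth checking with the getBytes() round trip: it passes through the platform default charset, so the result depends on what that default is. The sketch below imitates a platform whose default is UTF-8 by naming UTF-8 explicitly; that choice, and the class RoundTripDemo itself, are assumptions made for the demonstration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demonstration class; UTF-8 stands in for the platform default charset.
final class RoundTripDemo {
    // Imitates: read with the default charset, then new String(s.getBytes(), "ISO-8859-1").
    static String viaDefaultCharset(byte[] latin1Bytes) {
        // Latin-1 bytes above 0x7F are not valid UTF-8 on their own and get replaced.
        String misread = new String(latin1Bytes, StandardCharsets.UTF_8);
        byte[] reencoded = misread.getBytes(StandardCharsets.UTF_8);
        return new String(reencoded, StandardCharsets.ISO_8859_1);
    }

    // Decodes the original bytes as Latin-1 directly.
    static String direct(byte[] latin1Bytes) {
        return new String(latin1Bytes, StandardCharsets.ISO_8859_1);
    }
}
```

Under that assumption, the Latin-1 bytes of "café" come back unchanged from direct but not from viaDefaultCharset, because the original byte for "é" is lost in the first decoding step. Passing the charset to the InputStreamReader avoids the round trip altogether.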
If you don't want to plumb it yourself, you could have a look at the Apache Commons IO project: IOUtils.toString(InputStream input, String encoding) seems to do what you want. I haven't tried that method myself, but the Javadoc states: "Get the contents of an InputStream as a String using the specified character encoding."
I just found out that this answer to the question "Read/convert an InputStream to a String" can be applied to my problem; please see the code below. Anyway, I very much appreciate the answers you've given so far.
private String convertStreamToString(InputStream is, String charsetName) {
    try {
        return new java.util.Scanner(is, charsetName).useDelimiter("\\A").next();
    } catch (java.util.NoSuchElementException e) {
        return "";
    }
}
So in order to decode Latin-1 input, call it like this:
String message = convertStreamToString(is, "8859_1");
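A possible variation on the Scanner approach, sketched here under the assumption that closing the underlying stream is acceptable: hasNext() replaces the exception handling for empty input, and try-with-resources closes the Scanner. The class name ScannerStreams is made up for this example.

```java
import java.io.InputStream;
import java.util.Scanner;

// Hypothetical variant of the method above, not the original poster's code.
final class ScannerStreams {
    static String convertStreamToString(InputStream is, String charsetName) {
        // Closing the Scanner also closes the InputStream it wraps.
        try (Scanner scanner = new Scanner(is, charsetName)) {
            // "\\A" matches only at the start of input, so the first token is the whole stream.
            scanner.useDelimiter("\\A");
            // An empty stream has no token; return "" instead of catching NoSuchElementException.
            return scanner.hasNext() ? scanner.next() : "";
        }
    }
}
```

It would be called the same way, for example ScannerStreams.convertStreamToString(is, "ISO-8859-1").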