Converting an ASCII byte[] to a String

Question

I am trying to pass a byte[] containing ASCII characters to log4j, to be logged to a file using the obvious representation. When I simply pass in the byte[], it is of course treated as an object and the logs are pretty useless. When I try to convert the arrays to strings using new String(byte[] data), the performance of my application is halved.

How can I pass them in efficiently, without incurring the approximately 30us time penalty of converting them to strings?

Also, why does it take so long to convert them?

Thanks.

Edit

I should add that I am optimising for latency here, and yes, 30us does make a difference! Also, these arrays vary from ~100 all the way up to a few thousand bytes.

Answer 1

ASCII is one of the few encodings that can be converted to and from UTF-16 with no arithmetic or table lookups, so it's possible to convert manually:

String convert(byte[] data) {
    StringBuilder sb = new StringBuilder(data.length);
    for (int i = 0; i < data.length; ++ i) {
        if (data[i] < 0) throw new IllegalArgumentException();
        sb.append((char) data[i]);
    }
    return sb.toString();
}

But make sure it really is ASCII, or you'll end up with garbage.
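As a rough sketch of the same idea, the loop can also fill a char[] directly instead of going through a StringBuilder. The class name AsciiWiden is made up for illustration, and whether this is any faster than the version above would have to be measured on the target JVM.

```java
/** Hypothetical variant of convert() that fills a char[] directly. */
class AsciiWiden {
    static String convert(byte[] data) {
        char[] chars = new char[data.length];
        for (int i = 0; i < data.length; i++) {
            byte b = data[i];
            // Java bytes are signed, so any value above 0x7F shows up as negative.
            if (b < 0) {
                throw new IllegalArgumentException("non-ASCII byte at index " + i);
            }
            chars[i] = (char) b; // widening an ASCII byte yields the same code point
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(convert(new byte[] {72, 105}));
        try {
            convert(new byte[] {72, (byte) 0xE9}); // 0xE9 is not ASCII
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Rejecting negative bytes up front is what keeps non-ASCII input from silently turning into garbage characters.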

Answer 2

What you want to do is delay processing of the byte[] array until log4j decides that it actually wants to log the message. This way you could log it at DEBUG level while testing, for example, and then disable it in production. For example, you could do this:

final byte[] myArray = ...;
Logger.getLogger(MyClass.class).debug(new Object() {
    @Override public String toString() {
        return new String(myArray);
    }
});

Now you don't pay the speed penalty unless you actually log the data, because the toString method isn't called until log4j decides it will actually log the message!
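If this pattern is needed in many places, the anonymous object could be turned into a small reusable wrapper. The sketch below is only an illustration: LazyAscii and its decodeCount counter are hypothetical names, the counter exists purely to show when decoding happens, and US-ASCII is assumed rather than the platform default used above.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: defers decoding until toString() is called. */
final class LazyAscii {
    private final byte[] data;   // not copied, so the caller must not modify it later
    private int decodeCount = 0; // demonstration only

    LazyAscii(byte[] data) {
        this.data = data;
    }

    @Override
    public String toString() {
        decodeCount++;
        return new String(data, StandardCharsets.US_ASCII);
    }

    int decodeCount() {
        return decodeCount;
    }

    public static void main(String[] args) {
        LazyAscii msg = new LazyAscii("hello".getBytes(StandardCharsets.US_ASCII));
        System.out.println(msg.decodeCount()); // nothing has been decoded yet
        System.out.println(msg);               // println calls toString(), which decodes
        System.out.println(msg.decodeCount());
    }
}
```

With log4j 1.x, whose debug method accepts an Object, the wrapper would be passed as logger.debug(new LazyAscii(myArray)). Because the array is not copied, changing it before the message is formatted would change what gets logged.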

I'm not sure what you mean by "the obvious representation", so I've assumed that you mean converting to a String by interpreting the bytes using the default character encoding. If you are dealing with binary data, this is obviously worthless. In that case I'd suggest using Arrays.toString(byte[]) to create a formatted string along the lines of:

[54, 23, 65, ...]
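A minimal sketch of that call, with ByteArrayDump as a made-up class name. Note that Java bytes are signed, so values above 0x7F are printed as negative numbers.

```java
import java.util.Arrays;

/** Hypothetical demo of Arrays.toString(byte[]) for binary data. */
class ByteArrayDump {
    static String dump(byte[] data) {
        return Arrays.toString(data);
    }

    public static void main(String[] args) {
        System.out.println(dump(new byte[] {54, 23, 65}));   // prints [54, 23, 65]
        System.out.println(dump(new byte[] {(byte) 0xFF}));  // prints [-1]
    }
}
```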

Answer 3

If your data is in fact ASCII (i.e. 7-bit data), then you should be using new String(data, "US-ASCII") instead of depending on the platform default encoding. This may be faster than trying to interpret it in your platform default encoding (which could be UTF-8, which requires more inspection of each byte).

You could also speed this up by avoiding the Charset lookup on every call: cache the Charset instance and call new String(data, charset) instead.
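A minimal sketch of the cached-Charset approach, assuming the data really is 7-bit ASCII. AsciiDecoder is a made-up class name. On Java 7 and later, StandardCharsets.US_ASCII provides the same instance without a lookup by name.

```java
import java.nio.charset.Charset;

/** Hypothetical helper that looks up the US-ASCII Charset only once. */
class AsciiDecoder {
    // Resolved once at class initialisation instead of on every conversion.
    private static final Charset ASCII = Charset.forName("US-ASCII");

    static String decode(byte[] data) {
        // Bytes outside the 7-bit range are replaced rather than rejected.
        return new String(data, ASCII);
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] {79, 75})); // prints OK
    }
}
```

Unlike the by-name constructor, new String(byte[], Charset) does not declare UnsupportedEncodingException, so callers need no try/catch.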

Having said that: it's been a very, very long time since I've seen real ASCII data in a production environment.

Answer 4

Take a look here: Faster new String(bytes, cs/csn) and String.getBytes(cs/csn)

Answer 5

Halved performance? How large is this byte array? If it's 1MB, for example, then there are certainly more factors to take into account than just "converting" from bytes to chars (which is supposed to be fast enough anyway). Writing 1MB of data to a log file instead of "just" 100 bytes (which byte[].toString() may generate) is obviously going to take some time. The disk file system is not as fast as RAM.

You'll need to change the string representation of the byte array, perhaps to something more meaningful, e.g. the name associated with it (filename?), its length and so on. After all, what does that byte array actually represent?

Edit: I don't remember seeing the "approximately 30us" phrase in your question; maybe you edited it in within 5 minutes of asking. In any case this is a micro-optimization, and it should certainly not cause "halved performance" in general, unless you write them a million times per second (and even then, why would you want to do that? Aren't you overusing logging?).

5 answers

Answer 1: 17, 2010-02-04 18:00:21
Answer 2: 14, accepted, 2010-02-04 17:59:38
Answer 3: 8, 2010-02-04 18:01:24
Answer 4: 1, 2010-02-04 17:58:42
Answer 5: 1, 2010-02-04 18:01:52
