Java: reading a file, performance difference between byte and character streams

Original post: 2011-10-29 13:08:26 5 3 java

Pretty simple question: what's the performance difference between a Byte Stream and a Character Stream?

I ask because I'm implementing level loading from a file. Initially I decided I would just use a Byte Stream for the purpose, because it's the simplest type and thus should perform the best. But then I figured that it might be nice to be able to read and write the level files via a text editor instead of writing a more complex level editor (to start off with). In order for it to be legible by a text editor, I would need to use Character streams instead of Byte streams, so I'm wondering if there's really any performance difference worth mentioning between the two methods? At the moment it doesn't really matter much since level loading is infrequent, but I'd be interested to know for future reference, for instances where I might need to load levels from hard drive on the fly (large levels).

3 answers

Pretty simple question: what's the performance difference between a Byte Stream and a Character Stream?

I assume you are comparing Input/OutputStream with Reader/Writer streams. If that is the case, the performance is almost the same. Unless you have a very fast drive, the bottleneck will almost certainly be the disk, in which case it doesn't matter too much what you do in Java.

The reason I ask is because I'm implementing level loading from a file, and initially I decided I would just use a Byte Stream for the purpose, because it's the simplest type, and thus it should perform the best. But then I figured that it might be nice to be able to read and write the level files via a text editor instead of writing a more complex level editor (to start off with).

All files are actually a stream of bytes. So when you use a Reader/Writer, it uses a decoder to convert bytes to chars and an encoder to convert them back again. Nothing stops you from reading and writing the bytes directly, which does exactly the same thing.
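As a rough illustration of that point, the sketch below reads one file twice: once as raw bytes through an InputStream and once as chars through a Reader. The class name `LevelTextDemo`, the temp file, and its contents are made up for the example, and `readAllBytes` assumes Java 9 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class: the same file seen as bytes and as chars.
class LevelTextDemo {
    // Reads the whole file as raw bytes through an InputStream (Java 9+).
    static byte[] readBytes(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return in.readAllBytes();
        }
    }

    // Reads the same file through a Reader, which decodes the bytes as UTF-8.
    static String readChars(Path file) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Made-up level data, written to a temporary file for the demo.
        Path file = Files.createTempFile("level", ".txt");
        Files.write(file, "width=3\n#.#\n".getBytes(StandardCharsets.UTF_8));

        byte[] raw = readBytes(file);
        String text = readChars(file);

        // Decoding the raw bytes by hand gives the same text the Reader produced.
        System.out.println(new String(raw, StandardCharsets.UTF_8).equals(text));
        Files.delete(file);
    }
}
```

Either route yields the same level data; the Reader just performs the decoding step for you.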

In order for it to be legible by a text editor, I would need to use Character streams instead of Byte streams,

You wouldn't, but it might make it easier. If you only want ASCII encoding, there is no difference. If you want UTF-8 encoding with non-ASCII characters, using chars is likely to be simpler.
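A small sketch of why non-ASCII text is easier to handle as chars: in UTF-8 an ASCII character occupies one byte, while other characters occupy several, so byte positions and character positions stop lining up. The class name and sample strings are invented for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: char count versus UTF-8 byte count.
class EncodingLengthDemo {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "wall";
        String accented = "caf\u00e9"; // "café", written with an escape so the source stays ASCII

        // Pure ASCII: one byte per char.
        System.out.println(ascii.length() + " chars, " + utf8ByteCount(ascii) + " bytes");       // 4 chars, 4 bytes
        // The accented letter takes two bytes in UTF-8.
        System.out.println(accented.length() + " chars, " + utf8ByteCount(accented) + " bytes"); // 4 chars, 5 bytes
    }
}
```

With a byte stream you would have to reassemble such multi-byte sequences yourself; a Reader does it as part of decoding.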

so I'm wondering if there's really any performance difference worth mentioning between the two methods?

I would worry about correctness first and performance second.

I might need to load levels from hard drive on the fly (large levels).

Java can read/write text at about 90 MB/s; most hard drives and networks are not that fast. However, if you need to write gigabytes in seconds and you have a fast SSD, then it might make a difference. SSDs can perform at 500 MB/s or more, and in that case I would suggest you use NIO to maximise performance.
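The answer does not show what the NIO route looks like, so here is one possible sketch: read the whole file into a ByteBuffer through a FileChannel, then decode it in one step. The class name `NioLevelLoader` is hypothetical, UTF-8 is an assumed encoding, and the code assumes the file fits in a single buffer (under 2 GB). Whether it is actually faster on a given machine is something only measurement can tell.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical loader: whole-file read through NIO, then a single decode pass.
class NioLevelLoader {
    static String load(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // Assumes the file is small enough for one int-sized buffer.
            ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
            // A single read call may return fewer bytes than requested, so loop.
            while (buffer.hasRemaining() && channel.read(buffer) != -1) {
                // keep reading until the buffer is full or the file ends
            }
            buffer.flip(); // switch the buffer from filling to draining
            return StandardCharsets.UTF_8.decode(buffer).toString();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("level", ".txt");
        Files.write(file, "width=3\n#.#\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(load(file));
        Files.delete(file);
    }
}
```

For files too large for one buffer, reading in fixed-size chunks with a CharsetDecoder would be the usual variation.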

Java has only one kind of stream: a byte stream. The classes java.io.InputStream and java.io.OutputStream are defined in terms of bytes.

To convert bytes to characters, and eventually Strings, you will always be using the functionality in java.nio.charset. However, for your convenience, Java provides Reader and Writer classes that adapt byte streams into stream-like objects that operate on characters and Strings.
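To make that relationship concrete, the sketch below decodes the same bytes two ways: directly with a CharsetDecoder from java.nio.charset, and through an InputStreamReader wrapped around a byte stream. The class name, the sample text, and the choice of UTF-8 are assumptions for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: direct charset decoding versus the Reader adapter.
class DecoderDemo {
    // Uses java.nio.charset directly.
    static String decodeDirectly(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Uses InputStreamReader, which adapts a byte stream into a character stream.
    static String decodeViaReader(byte[] bytes) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            char[] chunk = new char[64];
            int n;
            while ((n = reader.read(chunk)) != -1) {
                text.append(chunk, 0, n);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "spawn=2,5".getBytes(StandardCharsets.UTF_8);
        // Both routes produce the same String.
        System.out.println(decodeDirectly(bytes).equals(decodeViaReader(bytes)));
    }
}
```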

There is a CPU time cost, of course, in conversion. However, the cost is very low. If you manage to write a program whose performance is dominated by this cost, you've written a very lean program indeed.

I don't know Java, so take this with a pinch of salt.

A character stream typically means each thing you read is decoded into an individual character based on the current locale, which makes it important for internationalised text data that can't be represented with just 128 or 256 different choices. The set of all possible characters is defined in the Unicode system, and how you get from individual bytes to characters is defined by the encoding. More information here: http://www.joelonsoftware.com/articles/Unicode.html

A byte stream, on the other hand, just reads in values from 0 to 255 and doesn't try to interpret them as characters from any particular language. As such, a byte stream should always be somewhat faster. But if you have international characters in there, they'll not display properly unless you know exactly how they were encoded.

For most purposes, human-readable data can be stored in ASCII, which only uses 7 bits of data per character and gives you 128 different characters. This will be readable by any typical text editor, and since ASCII characters are a subset of Unicode and of the UTF-8 encoding, you can read an ASCII file either as bytes or as UTF-8 characters, and the content will be unchanged.
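Since the question is about Java, here is a minimal Java sketch of the subset claim: bytes in the ASCII range decode to the same text whether they are treated as US-ASCII or as UTF-8, while bytes outside that range do not. The class name and the sample data are invented for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: ASCII bytes read the same under US-ASCII and UTF-8.
class AsciiSubsetDemo {
    // True when the bytes decode to identical text under both charsets.
    static boolean sameUnderBothCharsets(byte[] bytes) {
        String asAscii = new String(bytes, StandardCharsets.US_ASCII);
        String asUtf8 = new String(bytes, StandardCharsets.UTF_8);
        return asAscii.equals(asUtf8);
    }

    public static void main(String[] args) {
        byte[] plain = {'#', '.', '#', '\n'};               // all values below 128
        byte[] accented = {'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9}; // UTF-8 bytes for "café"

        System.out.println(sameUnderBothCharsets(plain));    // true
        System.out.println(sameUnderBothCharsets(accented)); // false
    }
}
```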

If you ever need to store binary values for more efficient serialisation (e.g. to store the number 123456789 as a 4-byte integer instead of as a 9-byte string), then you'll need to switch to a byte stream, but you also give up human-readability at this point, so the issue becomes somewhat irrelevant.
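The size difference in that example can be checked with a short Java sketch. It uses DataOutputStream for the binary form, which is one common choice rather than the only one; the class and method names are made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: the same number stored as binary and as text.
class BinaryVsTextDemo {
    // Binary form: DataOutputStream.writeInt always writes exactly 4 bytes.
    static byte[] asBinary(int value) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataOutputStream data = new DataOutputStream(out)) {
            data.writeInt(value);
        }
        return out.toByteArray();
    }

    // Text form: one byte per decimal digit when encoded as ASCII.
    static byte[] asText(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        int value = 123456789;
        System.out.println(asBinary(value).length); // 4
        System.out.println(asText(value).length);   // 9
    }
}
```

The text form is the one a text editor can show as a readable number.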

It's unlikely that the size of the level will ever have much effect on your loading times: a typical hard drive can read well over a hundred megabytes per second. Code whichever way is easiest for you, and only optimise this later if your profiling shows there is a problem.