Why is storing data directly with the print() method faster than storing it in a string and then writing it to a file?

Let's consider this scenario: I am reading a file, tweaking each line a bit, and then storing the data in a new file. I tried two ways to do it:

  1. Storing the data in a String and then writing it to the target file at the end, like this:

      InputStream ips = new FileInputStream(file);
      InputStreamReader ipsr = new InputStreamReader(ips);
      BufferedReader br = new BufferedReader(ipsr);
      PrintWriter desFile = new PrintWriter(targetFilePath);
      String data = "";
      String line;
      while ((line = br.readLine()) != null) {
          if (line.contains("_Stop_"))
              continue;
          String[] s = line.split(";");
          String newLine = s[2];
          for (int i = 3; i < s.length; i++) {
              newLine += "," + s[i];
          }
          // note: in Java source "\\n" is a backslash followed by 'n', not a newline
          data += newLine + "\\n";
      }
      desFile.write(data);
      desFile.close();
      br.close();

  2. Calling the PrintWriter's println() method directly inside the while loop, as below:

      while ((line = br.readLine()) != null) {
          if (line.contains("_Stop_"))
              continue;
          String[] s = line.split(";");
          String newLine = s[2];
          for (int i = 3; i < s.length; i++) {
              newLine += "," + s[i];
          }
          desFile.println(newLine);
      }
      desFile.close();
      br.close();

The second approach is much faster than the first. My question is: what is so different between these two approaches that their execution times differ this much?

Appending to your string will:

  1. Allocate memory for a new string.
  2. Copy all the data accumulated so far.
  3. Copy the data from your new line.

You repeat this process for every single line, which means that for N lines of output you copy O(N^2) bytes around.
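A quick back-of-the-envelope count makes the quadratic growth concrete. It assumes, for simplicity, that every output line has the same length of ℓ characters; real lines vary, but the shape of the result is the same. After the k-th append the string holds kℓ characters, and all of them are copied into the newly allocated string:

```latex
\text{chars copied} \;=\; \sum_{k=1}^{N} k\ell \;=\; \ell\,\frac{N(N+1)}{2} \;=\; O(N^2)
\qquad\text{vs.}\qquad
N\ell \;=\; O(N) \ \text{when each line is copied once into a buffer}
```

For example, N = 100,000 lines of ℓ = 50 characters gives about 2.5 × 10^11 characters copied by the += loop, versus 5 × 10^6 characters copied once each into a writer's buffer.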

Meanwhile, writing to your PrintWriter will:

  1. Copy the data into its buffer.
  2. Occasionally flush the buffer to the file.

This means that for N lines of output you copy only O(N) bytes around.
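To make the "copy into the buffer, occasionally flush" behaviour visible, here is a small sketch. It wraps a PrintWriter around a BufferedWriter with an explicit 8192-char buffer, writing into a hypothetical CountingWriter sink that only records what reaches it. In the OpenJDK implementation, PrintWriter(String fileName), as used in the question, also wraps the file in a BufferedWriter. The explicit wrapper here is only there so that the buffer size is known.

```java
import java.io.BufferedWriter;
import java.io.PrintWriter;
import java.io.Writer;

public class BufferDemo {
    // Hypothetical stand-in for the file: it only records what reaches it.
    static class CountingWriter extends Writer {
        int writeCalls = 0;   // how many times a buffered chunk was handed down
        long chars = 0;       // total characters received

        @Override
        public void write(char[] cbuf, int off, int len) {
            writeCalls++;
            chars += len;
        }

        @Override
        public void flush() {}

        @Override
        public void close() {}
    }

    // Writes `lines` copies of `text` with println() through an 8192-char buffer.
    static CountingWriter run(int lines, String text) {
        CountingWriter sink = new CountingWriter();
        try (PrintWriter out = new PrintWriter(new BufferedWriter(sink, 8192))) {
            for (int i = 0; i < lines; i++) {
                out.println(text);   // copies into the buffer; only a full buffer reaches the sink
            }
        }                            // close() flushes whatever is left in the buffer
        return sink;
    }

    public static void main(String[] args) {
        CountingWriter sink = run(10_000, "a,b,c,d");
        System.out.println(sink.writeCalls + " underlying writes for " + sink.chars + " chars");
    }
}
```

With 10,000 lines, the sink should see only around ten large writes instead of 10,000 small ones. The exact count depends on the platform line separator.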

For one, you're creating an awful lot of new String objects by appending with +=. I think that will definitely slow things down.

Try appending to a StringBuilder sb declared outside the loop, then calling desFile.write(sb.toString()); and see how that performs.
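A sketch of that suggestion follows. It keeps the question's per-line transformation and assumes every kept line has at least three ';'-separated fields, as the original s[2] access already does. It appends System.lineSeparator() so that the output matches what println() would write. The class name and command-line arguments are illustrative only.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;

public class StringBuilderVersion {
    // Skip lines containing "_Stop_", then join fields 2..end with ','.
    static String convert(BufferedReader br) throws IOException {
        StringBuilder sb = new StringBuilder();   // declared once, outside the loop
        String line;
        while ((line = br.readLine()) != null) {
            if (line.contains("_Stop_"))
                continue;
            String[] s = line.split(";");
            sb.append(s[2]);
            for (int i = 3; i < s.length; i++) {
                // appends into the existing buffer, which only reallocates (growing
                // geometrically) when it is full, so there is no new String per step
                sb.append(',').append(s[i]);
            }
            sb.append(System.lineSeparator());    // same separator println() would write
        }
        return sb.toString();                     // one final copy into a String
    }

    public static void main(String[] args) throws IOException {
        // args[0] = input path, args[1] = output path (hypothetical usage)
        try (BufferedReader br = new BufferedReader(new FileReader(args[0]));
             PrintWriter desFile = new PrintWriter(args[1])) {
            desFile.write(convert(br));
        }
    }
}
```

This removes the quadratic copying, but the whole output is still held in memory until the single write() at the end.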

First of all, the two processes aren't producing the same data. The one that calls println will have line separator characters between the lines, whereas the one that builds all the data up in a buffer and writes it all at once will not.

But the reason for the performance difference is probably the combination of three things. First, there is the enormous number of String and StringBuilder objects you are generating and throwing away (on Java 8 and earlier, javac compiles each += on a String into a temporary StringBuilder plus a toString() call). Second, memory has to be allocated to hold the complete file contents. Third, the garbage collector spends time cleaning up after all of it.

If you're going to be doing a significant amount of string concatenation, especially in a loop, it is better to create a StringBuilder before the loop and use it to accumulate the results in the loop.

However, if you're going to be processing large files, it is probably better to write the output as you go. Your application's memory requirements will be lower. If you build up the entire result in memory, the memory required will be at least on the order of the output file's size.
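Here is a minimal sketch of the write-as-you-go approach using try-with-resources, so that both files are closed even if an exception is thrown. It assumes UTF-8 input and output and at least three ';'-separated fields per kept line. It swaps the question's += loop for String.join, which gives the same result for such lines. The class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class StreamingConvert {
    // Reads `input` line by line and writes each transformed line immediately,
    // so only the current line (plus the readers' and writers' buffers) is in memory.
    static void convert(Path input, Path output) throws IOException {
        try (BufferedReader br = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             PrintWriter desFile = new PrintWriter(
                     Files.newBufferedWriter(output, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.contains("_Stop_"))
                    continue;
                String[] s = line.split(";");
                // fields 2..end joined with ','
                desFile.println(String.join(",", Arrays.copyOfRange(s, 2, s.length)));
            }
            // PrintWriter never throws IOException; checkError() flushes and reports failures
            if (desFile.checkError()) {
                throw new IOException("error while writing " + output);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] = input path, args[1] = output path (hypothetical usage)
        convert(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```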

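Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish this content, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.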

© 2020-2024 STACKOOM.COM