How to handle strings efficiently in Java?
I have a compressed file. First I need to decompress it, then read it line by line and process each line: split it into two fields, use one of them as the key, and encrypt the other. Some of the code is as follows:
try (GZIPInputStream stream = new GZIPInputStream(new ByteArrayInputStream(event.getBody()));
     BufferedReader br = new BufferedReader(new InputStreamReader(stream))) {
    String line;
    StringBuilder builder = new StringBuilder();
    while ((line = br.readLine()) != null) {
        builder.append(line);
        this.handleLine(builder);
        builder.setLength(0);
        builder.trimToSize();
    }
} catch (Exception e) {
    // ignore
}
Is it correct to use StringBuilder like this? Each line looks like aaa|bbb|ccc|ddd|eee|fff|ggg|hhh. What I want to know is how to use String and StringBuilder correctly in a loop over such an extremely large amount of data.
When handling many individual items in a loop, there are basically two possible sources of trouble related to memory management:
1. Keeping references to data that is no longer needed, so that it can never be garbage-collected.
2. Creating an excessive amount of short-lived garbage in each iteration.
Running into #1 would mean that your total memory usage increases throughout the loop, which puts an upper limit on how many items you can handle.
Running into #2 would "only" cause more garbage collection pauses and would not cause your application to fail (i.e. it would slow down, but still work).
If you actually need the StringBuilder (as indicated by your comment), then you should get rid of the trimToSize() call (as Stephen C correctly commented), because it basically forces the StringBuilder to re-allocate space for the content of line in each iteration, which gains you very, very little over simply re-creating the StringBuilder in each iteration.
The only drawback of removing that call is that the memory used by the StringBuilder will not shrink until the loop has finished. As long as there are no extreme outliers in line length in that file, that is probably not a problem.
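The advice above (reuse one builder, clear it with setLength(0), and skip trimToSize()) could look roughly like the sketch below. The class name BuilderReuseSketch, the method processAll and the trivial handleLine are stand-ins invented for illustration; the real handleLine from the question is not shown, so this one merely counts characters.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class BuilderReuseSketch {

    // Hypothetical stand-in for the question's handleLine(); it only measures the content.
    static int handleLine(StringBuilder line) {
        return line.length();
    }

    // Processes every line with a single reused StringBuilder and
    // returns the total number of characters seen.
    static long processAll(BufferedReader reader) {
        StringBuilder builder = new StringBuilder();
        long total = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                builder.append(line);
                total += handleLine(builder);
                // Clears the content but keeps the already allocated buffer,
                // so the next append() usually needs no new allocation.
                builder.setLength(0);
                // Deliberately no trimToSize() here.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        String data = "k1|v1\nk2|v2\n";
        System.out.println(processAll(new BufferedReader(new StringReader(data))));
    }
}
```

If handleLine only reads the text, passing the String returned by readLine() directly would avoid the extra copy into the builder altogether; the builder only pays off when the line is modified in place.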
As an additional side note: you mention that String.split is too inefficient for you. A major source of that inefficiency is that it may need to re-compile the regular expression on every call (OpenJDK's String.split has a fast path for simple single-character delimiters, so it is worth measuring). If you pre-compile the pattern outside the loop using Pattern.compile and then call Pattern.split() inside the loop, that might already be much quicker.
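A minimal sketch of the pre-compiled pattern approach, assuming the fields are separated by a literal | as in the sample line from the question. The class name SplitSketch and the helper splitFields are made up for this example; Pattern.compile and Pattern.split(CharSequence) are the standard java.util.regex calls.

```java
import java.util.regex.Pattern;

public class SplitSketch {

    // Compiled once and reused for every line. The pipe is escaped
    // because a bare | means alternation in a regular expression.
    private static final Pattern PIPE = Pattern.compile("\\|");

    // Accepts any CharSequence, so a StringBuilder can be passed
    // without converting it to a String first.
    static String[] splitFields(CharSequence line) {
        return PIPE.split(line);
    }

    public static void main(String[] args) {
        String[] fields = splitFields("aaa|bbb|ccc|ddd|eee|fff|ggg|hhh");
        System.out.println(fields.length + " fields, first = " + fields[0]);
    }
}
```

Whether this is actually faster than String.split for your data is something to measure rather than assume.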