How to handle strings efficiently in java?

I have a compressed file that I first need to decompress. Then I read its contents line by line and process each line: I split out two fields, use one of them as the key, and encrypt the other. Part of the code is as follows:

```java
try (GZIPInputStream stream = new GZIPInputStream(new ByteArrayInputStream(event.getBody()));
     BufferedReader br = new BufferedReader(new InputStreamReader(stream))) {
    String line;
    StringBuilder builder = new StringBuilder();
    while ((line = br.readLine()) != null) {
        builder.append(line);
        this.handleLine(builder);
        builder.setLength(0);
        builder.trimToSize();
    }
} catch (Exception e) {
    // ignore
}
```
  1. Each compressed file has about three million lines, so how the strings are handled inside the loop is key to the performance of the whole program.
  2. Is it correct to use StringBuilder like this?
  3. Each line of data has the format aaa|bbb|ccc|ddd|eee|fff|ggg|hhh.
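With a fixed layout like this, pulling out the two fields is a pure string operation. For illustration, here is a minimal sketch that extracts a single field with String.indexOf, so no regex runs and no array holding all eight fields is allocated. It assumes, hypothetically, that the key is field 0 and the value to encrypt is field 1; `LineFields.field` is an illustrative helper name, not part of any library.

```java
/** Hypothetical helper: extracts one field from a '|'-delimited line without regex. */
final class LineFields {
    private LineFields() {}

    /**
     * Returns the field at the given zero-based index, or null if the line
     * has fewer fields. Only the requested field is copied into a new String.
     */
    static String field(String line, int index) {
        int start = 0;
        // Skip 'index' delimiters to find where the requested field begins.
        for (int i = 0; i < index; i++) {
            int bar = line.indexOf('|', start);
            if (bar < 0) {
                return null; // not enough fields on this line
            }
            start = bar + 1;
        }
        int end = line.indexOf('|', start);
        return end < 0 ? line.substring(start) : line.substring(start, end);
    }
}
```

Usage would be `String key = LineFields.field(line, 0);` and `String value = LineFields.field(line, 1);`. Each call rescans from the start of the line, which is cheap as long as the needed fields are near the front.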

What I want to know is how to use String and StringBuilder correctly in a loop over such a large amount of data.

When handling many individual items in a loop, there are basically two possible sources of memory-management trouble:

  1. keeping unnecessary per-item data in memory, thus creating a memory leak
  2. creating a lot of memory churn by allocating too much memory and/or too many objects for each item you handle.

Violating #1 would mean that your total memory usage would increase throughout the loop and thus create an upper limit to how many items you can handle.

Violating #2 would "only" cause more garbage collection pauses without making your application fail (i.e., it would slow down but still work).
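To make the distinction concrete, here is a sketch contrasting the two cases. The class and method names are hypothetical. A loop that adds every line to a collection that outlives the loop falls under #1. A loop that hands each line to a handler and then drops it only produces short-lived garbage, which is #2, provided the handler keeps no reference to the line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical examples contrasting problem #1 with a streaming loop. */
final class LoopPatterns {
    private LoopPatterns() {}

    // Problem #1: every line stays reachable in 'all', so memory use grows
    // with the size of the input and caps how much data can be processed.
    static List<String> retainAll(BufferedReader br) throws IOException {
        List<String> all = new ArrayList<>();
        for (String line = br.readLine(); line != null; line = br.readLine()) {
            all.add(line);
        }
        return all;
    }

    // Streaming: if the handler keeps no reference, each line becomes garbage
    // after one iteration; the cost is GC churn (problem #2), not growth.
    static long forEachLine(BufferedReader br, Consumer<String> handler) throws IOException {
        long count = 0;
        for (String line = br.readLine(); line != null; line = br.readLine()) {
            handler.accept(line);
            count++;
        }
        return count;
    }
}
```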

If you actually need the StringBuilder (as indicated by your comment), then you should get rid of the trimToSize() call, as Stephen C correctly commented. Right after setLength(0), trimToSize() may shrink the internal buffer down to the current length, which is zero, so the next append(line) has to allocate a fresh buffer for every line. That gains you very, very little over simply creating a new StringBuilder in each iteration.
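With trimToSize() removed, setLength(0) resets the length but keeps the capacity, so once the builder has grown to fit the longest line seen so far, later appends do not reallocate. A sketch of the adjusted loop, assuming UTF-8 input (the question does not name a charset) and a hypothetical `handleLine` callback that reads the builder but does not keep a reference to it:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import java.util.zip.GZIPInputStream;

final class GzipLineLoop {
    private GzipLineLoop() {}

    /**
     * Decompresses gzip data and passes each line to handleLine through one
     * reused StringBuilder. Returns the number of lines processed.
     * handleLine must copy the content (e.g. toString()) if it needs to keep it.
     */
    static long process(byte[] gzipped, Consumer<StringBuilder> handleLine) throws IOException {
        long lines = 0;
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(gzipped)), StandardCharsets.UTF_8))) {
            StringBuilder builder = new StringBuilder(256); // initial capacity is a guess
            String line;
            while ((line = br.readLine()) != null) {
                builder.append(line);
                handleLine.accept(builder);
                builder.setLength(0); // reset length, keep capacity (no trimToSize)
                lines++;
            }
        }
        return lines;
    }
}
```

If handleLine can work on the String directly, the builder can be dropped entirely and `line` passed as is.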

The only drawback of removing that call is that the memory used by StringBuilder will never be reduced until the loop has finished.

As long as the file has no extreme outliers in line length, that is probably not a problem.
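If a few very long lines are possible, one way to bound the retained memory is to replace the builder once its capacity has grown past some limit. A minimal sketch; the class name, the 256-char initial capacity and the 64 Ki-char cap are arbitrary assumptions, not recommendations:

```java
/** Hypothetical wrapper that reuses a StringBuilder but caps its retained capacity. */
final class BoundedBuilder {
    private static final int INITIAL_CAPACITY = 256;   // assumed typical line length
    private static final int MAX_RETAINED = 64 * 1024; // assumed cap, in chars

    private StringBuilder builder = new StringBuilder(INITIAL_CAPACITY);

    /** Returns an empty builder, replacing it if an outlier line made it too large. */
    StringBuilder reset() {
        if (builder.capacity() > MAX_RETAINED) {
            // Drop the oversized buffer so it can be garbage collected.
            builder = new StringBuilder(INITIAL_CAPACITY);
        } else {
            builder.setLength(0); // common case: keep the existing buffer
        }
        return builder;
    }
}
```

In the loop, `StringBuilder sb = bounded.reset(); sb.append(line);` would replace the separate `append`/`setLength(0)` pair.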

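As an additional side note: you mention that String.split is too inefficient for you. A major source of that inefficiency is that it may have to compile the regular expression again on every call. If you pre-compile the pattern outside of the loop with Pattern.compile and then call its split() method inside the loop, that might already be much quicker. Keep in mind that | is a regex metacharacter, so the pattern has to be "\\|". (OpenJDK's String.split already skips the regex engine for a single-character delimiter, including an escaped one such as "\\|", so how much this helps depends on the delimiter you pass.)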
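A sketch of the pre-compiled approach for the aaa|bbb|...|hhh format; `FieldSplitter` is a hypothetical name. It passes a limit of -1 to Pattern.split so that trailing empty fields are kept rather than silently dropped, which matters when every line is expected to have exactly eight fields.

```java
import java.util.regex.Pattern;

/** Hypothetical helper that splits lines with one shared, precompiled pattern. */
final class FieldSplitter {
    // Compiled once; '|' is a regex metacharacter, so it must be escaped.
    private static final Pattern PIPE = Pattern.compile("\\|");

    private FieldSplitter() {}

    /** Splits one line into its fields, keeping trailing empty fields (limit -1). */
    static String[] split(String line) {
        return PIPE.split(line, -1);
    }
}
```

Pattern instances are immutable and safe to share between threads, so a single static constant is enough even if lines are processed in parallel.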
