
What is the performance impact of interning all strings in Java?

I am working on a trade processing application where I have to deal with a lot of strings. Some of those strings are non-repeating, such as a Trade ID, whereas others repeat frequently, such as a Product ID.

I am considering interning all trade attributes as a generic step while parsing the Trade message (JSON) to reduce the memory usage and speed up equality checks.

My question is whether I might unintentionally degrade performance with this move?

Deduplicating common strings is usually a good idea to save memory.
But never use String.intern for deduplication!

  • String.intern is a native method; each call incurs additional JNI overhead.
  • It blows up the JVM's internal hashtable, which is shared among all parts of the JVM (e.g. class loading).
  • The default capacity of the string table is not large enough, and the number of buckets is constant.
  • It may increase GC pauses, since the JVM scans this internal hashtable and possibly rehashes it during a stop-the-world phase.
  • More details in this presentation.

A regular HashMap or ConcurrentHashMap can be an order of magnitude better for this task.
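To illustrate the map-based approach, here is a minimal sketch of a deduplicating pool built on ConcurrentHashMap.putIfAbsent. The class name StringDeduplicator, its dedup and size methods, and the sample value "PRODUCT-42" are hypothetical names chosen for this example, not part of any library.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper (name and API are assumptions made for illustration).
final class StringDeduplicator {
    // Maps each distinct string to its canonical instance.
    private final ConcurrentHashMap<String, String> pool = new ConcurrentHashMap<>();

    // Returns the canonical instance equal to s, registering s if it is new.
    String dedup(String s) {
        if (s == null) {
            return null; // ConcurrentHashMap does not accept null keys
        }
        String prev = pool.putIfAbsent(s, s);
        return prev != null ? prev : s;
    }

    // Number of distinct strings currently held by the pool.
    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        StringDeduplicator d = new StringDeduplicator();
        // new String(...) forces two distinct objects with equal contents.
        String a = d.dedup(new String("PRODUCT-42"));
        String b = d.dedup(new String("PRODUCT-42"));
        System.out.println(a == b);   // prints "true": both refer to the pooled instance
        System.out.println(d.size()); // prints "1"
    }
}
```

Note that this sketch holds strong references and never evicts anything, so it probably only pays off for attributes that actually repeat (such as a Product ID); feeding unique values like Trade IDs through it would just grow the map.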

The following JMH benchmark compares the performance of String.intern to [Concurrent]HashMap.putIfAbsent on a set of 1M strings:

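import java.util.HashMap;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)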
public class Dedup {
    private static final HashMap<String, String> HM = new HashMap<>();
    private static final ConcurrentHashMap<String, String> CHM = new ConcurrentHashMap<>();

    private static final int SIZE = 1024 * 1024;
    private static final String[] STRINGS = new Random(0).ints(SIZE)
            .mapToObj(Integer::toString)
            .toArray(String[]::new);

    int idx;

    @Benchmark
    public String intern() {
        String s = nextString();
        return s.intern();
    }

    @Benchmark
    public String hashMap() {
        String s = nextString();
        String prev = HM.putIfAbsent(s, s);
        return prev != null ? prev : s;
    }

    @Benchmark
    public String concurrentHashMap() {
        String s = nextString();
        String prev = CHM.putIfAbsent(s, s);
        return prev != null ? prev : s;
    }

    private String nextString() {
        return STRINGS[++idx & (SIZE - 1)];
    }
}

The results on JDK 9 (average time per operation, smaller is better; the commas in the scores are decimal separators):

Benchmark                Mode  Cnt    Score    Error  Units
Dedup.concurrentHashMap  avgt   10   91,208 ±  0,569  ns/op
Dedup.hashMap            avgt   10   73,917 ±  0,602  ns/op
Dedup.intern             avgt   10  832,700 ± 73,402  ns/op
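
As a rough check of the "order of magnitude" claim, the ratios of the reported average times, reading the commas in the scores as decimal separators, work out to approximately:

```latex
\frac{t_{\text{intern}}}{t_{\text{hashMap}}} = \frac{832.700}{73.917} \approx 11.3,
\qquad
\frac{t_{\text{intern}}}{t_{\text{concurrentHashMap}}} = \frac{832.700}{91.208} \approx 9.1
```

These figures come from one benchmark run on JDK 9, so the exact ratios should not be assumed to carry over to other JVM versions or workloads.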
