Java parallel stream is slower than serial stream

Question

I have a database table of about 1,000,000 paragraphs with roughly 500 characters each. By reading all the records, I need to get the list of letters of the alphabet ordered from most used to least used.

I mock the database read by creating a stream of 1,000,000 elements and then processing the stream in parallel:

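// Imports needed by this snippet (RandomStringUtils is assumed to come from Apache Commons Lang 3)
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Stream;
import org.apache.commons.lang3.RandomStringUtils;

// Pre-populate the map so that compute() never sees a null value
final Map<Character, Long> charCountMap = new ConcurrentHashMap<>();
for (char c = 'a'; c <= 'z'; c++) {
    charCountMap.put(c, 0L);
}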

System.out.println("Parallel Stream");
long start = System.currentTimeMillis();
Stream.iterate(0, i -> i).limit(1000000).parallel() //mock database stream
    .forEach(i-> RandomStringUtils.randomAlphanumeric(500)
    .toLowerCase().chars().mapToObj(c -> Character.valueOf((char) c)).filter(c -> c >= 97 && c <= 122)
    .forEach(c -> charCountMap.compute(c, (k, v) -> v + 1))); //update ConcurrentHashMap

long end = System.currentTimeMillis();
System.out.println("Parallel Stream time spent :" + (end - start));

System.out.println("Serial Stream");
start = System.currentTimeMillis();
Stream.iterate(0, i -> i).limit(1000000) //mock database stream
    .forEach(i-> RandomStringUtils.randomAlphanumeric(500)
    .toLowerCase().chars().mapToObj(c -> Character.valueOf((char) c)).filter(c -> c >= 97 && c <= 122)
    .forEach(c -> charCountMap.compute(c, (k, v) -> v + 1)));
end = System.currentTimeMillis();
System.out.println("Serial Stream time spent :" + (end - start));

I initially thought that the parallel stream would be faster, even allowing for the expected overhead, for streams larger than 100,000 elements. However, the test shows that the serial stream is about 5x faster than the parallel one, even for 1,000,000 records.

I suspected the cause was updating the ConcurrentHashMap. But when I removed the update and replaced it with an empty function, there was still a significant performance gap.

Is there something wrong with my database mock-up call or with the way I use the parallel stream?

Answer 1

RandomStringUtils.randomAlphanumeric(500) isn't suitable for use with parallel() because, according to its source code, it uses a static variable for the random string generation. Therefore all the calls from all the threads to generate a random string contend for the same underlying instance of Random:

private static final Random RANDOM = new Random();

Write your own random string generator that either uses a single instance of Random per thread or uses java.util.concurrent.ThreadLocalRandom. This avoids contention on the random sequence. The same issue caused poor performance in a linked question before it was edited to use ThreadLocalRandom.
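As a rough illustration of that suggestion, here is a minimal sketch of such a generator built on ThreadLocalRandom, plugged into a parallel counting pipeline similar to the one in the question. The class name ThreadLocalRandomStrings and the methods randomAlphanumeric and countLetters are made up for this example and are not part of any library. The sketch also swaps Stream.iterate for IntStream.range and compute for merge, which are my choices rather than anything the question requires.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

public class ThreadLocalRandomStrings {
    // Letters and digits, mirroring what an alphanumeric generator would produce.
    private static final char[] ALPHANUMERIC =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789".toCharArray();

    // Hypothetical stand-in for RandomStringUtils.randomAlphanumeric(length).
    public static String randomAlphanumeric(int length) {
        // current() returns the calling thread's own generator,
        // so parallel workers do not compete for one shared seed.
        ThreadLocalRandom random = ThreadLocalRandom.current();
        char[] buffer = new char[length];
        for (int i = 0; i < length; i++) {
            buffer[i] = ALPHANUMERIC[random.nextInt(ALPHANUMERIC.length)];
        }
        return new String(buffer);
    }

    // Counts the letters a-z over `records` mock paragraphs of `length` characters each.
    public static Map<Character, Long> countLetters(int records, int length) {
        Map<Character, Long> counts = new ConcurrentHashMap<>();
        IntStream.range(0, records).parallel() // mock database stream
            .forEach(i -> randomAlphanumeric(length).toLowerCase(Locale.ROOT).chars()
                .filter(c -> c >= 'a' && c <= 'z')
                // merge() atomically inserts 1 or adds 1 to the existing count
                .forEach(c -> counts.merge((char) c, 1L, Long::sum)));
        return counts;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        Map<Character, Long> counts = countLetters(100_000, 500);
        long end = System.currentTimeMillis();
        System.out.println("Parallel stream time spent: " + (end - start) + " ms");
        System.out.println("Distinct letters counted: " + counts.size());
    }
}
```

Timings depend on the machine and JVM warm-up, so treat the printed number as a rough indication only. A harness such as JMH would give a more reliable comparison against the serial version.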

The javadoc for java.util.Random says:

Instances of java.util.Random are threadsafe.
However, the concurrent use of the same java.util.Random
instance across threads may encounter contention and consequent
poor performance. Consider instead using
java.util.concurrent.ThreadLocalRandom in multithreaded
designs.
