Java Parallel Stream Slower than Serial

I have a database table of about 1,000,000 paragraphs, each around 500 characters long. After reading all the records, I need to produce the list of letters of the alphabet ordered from most to least used.

I mock the database read by creating a stream of 1,000,000 elements and then process the stream in parallel:

final Map<Character, Long> charCountMap = new ConcurrentHashMap<>();
for (char c = 'a'; c <= 'z'; c++) {
    charCountMap.put(c, 0L); // start every letter at zero
}

System.out.println("Parallel Stream");
long start = System.currentTimeMillis();
Stream.iterate(0, i -> i).limit(1000000).parallel() //mock database stream
    .forEach(i-> RandomStringUtils.randomAlphanumeric(500)
    .toLowerCase().chars().mapToObj(c -> Character.valueOf((char) c)).filter(c -> c >= 97 && c <= 122)
    .forEach(c -> charCountMap.compute(c, (k, v) -> v + 1))); //update ConcurrentHashMap

long end = System.currentTimeMillis();
System.out.println("Parallel Stream time spent :" + (end - start));

System.out.println("Serial Stream");
start = System.currentTimeMillis();
Stream.iterate(0, i -> i).limit(1000000) //mock database stream
    .forEach(i-> RandomStringUtils.randomAlphanumeric(500)
    .toLowerCase().chars().mapToObj(c -> Character.valueOf((char) c)).filter(c -> c >= 97 && c <= 122)
    .forEach(c -> charCountMap.compute(c, (k, v) -> v + 1)));
end = System.currentTimeMillis();
System.out.println("Serial Stream time spent :" + (end - start));

I initially thought the parallel stream would be faster, even allowing for the expected overhead, for a stream of more than 100,000 elements. However, the test shows that the serial stream is about 5x faster than the parallel one, even for 1,000,000 records.

I suspected the cause was the updates to the ConcurrentHashMap. But when I removed them and substituted an empty function, there was still a significant performance gap.

Is there something wrong with my database mock-up call or with the way I use the parallel stream?

RandomStringUtils.randomAlphanumeric(500) isn't suitable for use with parallel() because, according to its source code, it uses a static variable for random string generation. All the calls from all the threads to generate a random string therefore contend on the same underlying instance of Random:

private static final Random RANDOM = new Random();

Write your own random string generator that uses a single instance of Random per thread, or use java.util.concurrent.ThreadLocalRandom; either approach avoids contention on the random sequence. The same issue caused poor performance in another question before it was edited to use ThreadLocalRandom.
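A minimal sketch of such a generator, assuming an upper-case, lower-case and digit alphabet is an acceptable stand-in for what RandomStringUtils.randomAlphanumeric produces. The class name ThreadLocalRandomStrings and its method are hypothetical and are not part of Commons Lang:

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper (not part of Commons Lang): random strings without a shared Random. */
final class ThreadLocalRandomStrings {
    // Assumed alphabet: A-Z, a-z, 0-9.
    private static final char[] ALPHANUMERIC =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789".toCharArray();

    private ThreadLocalRandomStrings() {}

    static String randomAlphanumeric(int length) {
        // current() returns the generator owned by the calling thread,
        // so worker threads never compete for the same seed.
        ThreadLocalRandom random = ThreadLocalRandom.current();
        char[] buffer = new char[length];
        for (int i = 0; i < length; i++) {
            buffer[i] = ALPHANUMERIC[random.nextInt(ALPHANUMERIC.length)];
        }
        return new String(buffer);
    }

    public static void main(String[] args) {
        System.out.println(randomAlphanumeric(20));
    }
}
```

ThreadLocalRandom.current() is called inside the method rather than cached in a static field, because a cached instance would end up shared between threads again.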

The javadoc for java.util.Random says:

Instances of java.util.Random are threadsafe.
However, the concurrent use of the same java.util.Random
instance across threads may encounter contention and consequent
poor performance. Consider instead using
java.util.concurrent.ThreadLocalRandom in multithreaded designs.
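
Applied to the pipeline from the question, that advice might look roughly like the sketch below. Several things here are assumptions rather than the asker's code: the class ParallelLetterCount and its methods are hypothetical, IntStream.range is swapped in for the Stream.iterate mock, and merge replaces the pre-filled map plus compute. No timings are claimed; whether the parallel version comes out ahead depends on the machine and should be measured.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

/** Hypothetical sketch: the question's letter count with a per-thread random source. */
final class ParallelLetterCount {
    private static final String ALPHANUMERIC =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    private ParallelLetterCount() {}

    // Stand-in for RandomStringUtils.randomAlphanumeric, backed by ThreadLocalRandom.
    static String randomAlphanumeric(int length) {
        ThreadLocalRandom random = ThreadLocalRandom.current();
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHANUMERIC.charAt(random.nextInt(ALPHANUMERIC.length())));
        }
        return sb.toString();
    }

    // Counts the letters a-z over `records` mock paragraphs of `length` characters each.
    static Map<Character, Long> countLetters(int records, int length) {
        Map<Character, Long> counts = new ConcurrentHashMap<>();
        IntStream.range(0, records).parallel() // mock database stream
            .forEach(i -> randomAlphanumeric(length)
                .toLowerCase(Locale.ROOT).chars()
                .filter(c -> c >= 'a' && c <= 'z')
                .forEach(c -> counts.merge((char) c, 1L, Long::sum)));
        return counts;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        Map<Character, Long> counts = countLetters(100_000, 500);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Parallel time spent (ms): " + elapsedMs);
        System.out.println(counts);
    }
}
```

The random source is no longer shared, but every character still updates the one shared ConcurrentHashMap, so some contention may remain there.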

