Java 8 parallel stream confusion/issue

I am new to parallel streams and am trying to write a sample program that computes value * 100 for each value from 1 to 100 and stores the results in a map. When I run the code, I get a different count on each run. I may be doing something wrong, so please point me to the proper way to do this.

Code:

import java.util.*;
import java.lang.*;
import java.io.*;
import java.util.stream.Collectors;

public class Main{    
    static int l = 0;       
    public static void main (String[] args) throws java.lang.Exception {
        letsGoParallel();
    }       
    public static int makeSomeMagic(int data) {
        l++;
        return data * 100;
    }        
    public static void letsGoParallel() {
        List<Integer> dataList = new ArrayList<>();
        for(int i = 1; i <= 100 ; i++) {
            dataList.add(i);
        }
        Map<Integer, Integer> resultMap = new HashMap<>();
        dataList.parallelStream().map(f -> {
            Integer xx = 0;
            {
                xx = makeSomeMagic(f);
            }
            resultMap.put(f, xx);
            return 0;
        }).collect(Collectors.toList());
        System.out.println("Input Size: " + dataList.size());
        System.out.println("Size: " + resultMap.size());
        System.out.println("Function Called: " + l);
    }
}

Runnable Code

Last output:

Input Size: 100
Size: 100
Function Called: 98

The output differs on every run. I want to use a parallel stream in my own application, but this confusion/issue is holding me back. In my application I have 100-200 unique numbers, and the same operation needs to be performed on each of them. In short, there is a function that processes each one.

Your accesses to the HashMap and to the l variable are both not thread safe, which is why the output is different in each run.

The correct way to do what you are trying to do is to collect the Stream elements into a Map:

Map<Integer, Integer> resultMap =
    dataList.parallelStream()
            .collect(Collectors.toMap(Function.identity(), Main::makeSomeMagic));

EDIT: With this code the l variable is still updated in a way that is not thread safe, so you'll have to add your own thread safety if the final value of the variable is important to you.
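One possible way to add that thread safety is to replace the plain int counter with a java.util.concurrent.atomic.AtomicInteger. The following is only a sketch under that assumption; the class name CountingDemo, the field name calls, and the helper compute are made up for illustration and do not appear in the question.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class CountingDemo {
    // Hypothetical replacement for the question's "static int l":
    // incrementAndGet() is atomic, so concurrent updates are not lost.
    static final AtomicInteger calls = new AtomicInteger();

    static int makeSomeMagic(int data) {
        calls.incrementAndGet();
        return data * 100;
    }

    // Collects into a Map without touching any shared map from the lambda.
    static Map<Integer, Integer> compute(List<Integer> dataList) {
        return dataList.parallelStream()
                .collect(Collectors.toMap(Function.identity(), CountingDemo::makeSomeMagic));
    }

    public static void main(String[] args) {
        List<Integer> dataList =
                IntStream.rangeClosed(1, 100).boxed().collect(Collectors.toList());
        Map<Integer, Integer> resultMap = compute(dataList);
        System.out.println("Input Size: " + dataList.size());
        System.out.println("Size: " + resultMap.size());
        System.out.println("Function Called: " + calls.get());
    }
}
```

The counter is still a side effect of the mapping function, so this only makes sense when the count itself is needed, for example for diagnostics.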

By putting values into resultMap you are relying on a side effect:

dataList.parallelStream().map(f -> {
    Integer xx = 0;
    {
        xx = makeSomeMagic(f);
    }
    resultMap.put(f, xx);
    return 0;
})

The API states:

Stateless operations, such as filter and map, retain no state from previously seen element when processing a new element -- each element can be processed independently of operations on other elements.

It goes on:

Stream pipeline results may be nondeterministic or incorrect if the behavioral parameters to the stream operations are stateful. A stateful lambda (or other object implementing the appropriate functional interface) is one whose result depends on any state which might change during the execution of the stream pipeline.

This is followed by an example similar to yours, which shows:

... if the mapping operation is performed in parallel, the results for the same input could vary from run to run, due to thread scheduling differences, whereas, with a stateless lambda expression the results would always be the same.

That explains your observation: "On each time run output differs."

The right approach is shown by @Eran.
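To illustrate the stateless case described in the quoted documentation, here is a small sketch. The class name StatelessDemo and the method scale are invented for this example. The lambda depends only on its argument, so the parallel and sequential runs should produce the same list.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class StatelessDemo {
    // Multiplies 1..100 by 100, optionally in parallel.
    // The lambda x -> x * 100 reads and writes no shared state.
    static List<Integer> scale(boolean parallel) {
        IntStream numbers = IntStream.rangeClosed(1, 100);
        if (parallel) {
            numbers = numbers.parallel();
        }
        return numbers.map(x -> x * 100)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // collect(toList()) keeps encounter order for an ordered stream,
        // so both lists contain the same elements in the same order.
        System.out.println(scale(true).equals(scale(false)));
    }
}
```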

Hopefully this works fine: make the makeSomeMagic function synchronized, use the thread-safe data structure ConcurrentHashMap, and write one simple statement:

dataList.parallelStream().forEach(f -> resultMap.put(f, makeSomeMagic(f)));

The whole code is here:

import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

public class Main {
    static int l = 0;

    public static void main(String[] args) throws java.lang.Exception {
        letsGoParallel();
    }

    // synchronized, so only one thread at a time can increment l
    public static synchronized int makeSomeMagic(int data) {
        l++;
        return data * 100;
    }

    public static void letsGoParallel() {
        List<Integer> dataList = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            dataList.add(i);
        }
        Map<Integer, Integer> resultMap = new ConcurrentHashMap<>(); // use ConcurrentHashMap
        dataList.parallelStream().forEach(f -> resultMap.put(f, makeSomeMagic(f)));
        System.out.println("Input Size: " + dataList.size());
        System.out.println("Size: " + resultMap.size());
        System.out.println("Function Called: " + l);
    }
}
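As a possible variation on this answer, the stream can build the concurrent map itself through Collectors.toConcurrentMap, so the lambda does not have to write into a map declared outside the pipeline. This is only a sketch; the class name ConcurrentCollectDemo and the helper compute are hypothetical, and the call counter is left out so that makeSomeMagic does not need to be synchronized.

```java
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ConcurrentCollectDemo {
    // No shared state here, so no synchronization is needed.
    static int makeSomeMagic(int data) {
        return data * 100;
    }

    // toConcurrentMap accumulates all elements into one ConcurrentMap.
    static ConcurrentMap<Integer, Integer> compute(List<Integer> dataList) {
        return dataList.parallelStream()
                .collect(Collectors.toConcurrentMap(
                        Function.identity(), ConcurrentCollectDemo::makeSomeMagic));
    }

    public static void main(String[] args) {
        List<Integer> dataList =
                IntStream.rangeClosed(1, 100).boxed().collect(Collectors.toList());
        ConcurrentMap<Integer, Integer> resultMap = compute(dataList);
        System.out.println("Input Size: " + dataList.size());
        System.out.println("Size: " + resultMap.size());
    }
}
```

Note that a synchronized makeSomeMagic lets only one thread run it at a time, so with the synchronized version the mapping work itself is effectively serialized.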
  • There is no need to count how many times the method is invoked.
  • The Stream takes care of the looping for you.
  • Pass your logic (a function) to the Stream, and do not use non-thread-safe variables in multi-threaded code (which includes parallelStream).

Like this:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ParallelStreamClient {
    // static int l = 0; ---> no need to count invocations

    public static void main(String[] args) throws java.lang.Exception {
        letsGoParallel();
    }

    public static int makeSomeMagic(int data) {
        // l++; ---> this is not thread safe
        return data * 100;
    }

    public static void letsGoParallel() {
        List<Integer> dataList = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            dataList.add(i);
        }
        Map<Integer, Integer> resultMap =
            dataList.parallelStream()
                    .collect(Collectors.toMap(i -> i, ParallelStreamClient::makeSomeMagic));
        System.out.println("Input Size: " + dataList.size());
        System.out.println("Size: " + resultMap.size());
        // System.out.println("Function Called: " + l);
    }
}
