Java: concurrent writes to a set, then reading gives inconsistent results

Question

I read here that there are several thread-safe Set options. In my application, 10 threads concurrently add items to one collection (it does not have to be a Set, but a Set is preferred). After all threads finish, I need to iterate through the collection.

I read that ConcurrentSkipListSet and Collections.newSetFromMap(new ConcurrentHashMap()) both have inconsistent batch operations (addAll, removeAll, etc.) and Iterators. My experiment also confirms this. When I use ConcurrentSkipListSet, after all the threads have added their items, the reading is a bit random: the size of the set differs from run to run.

I then tried Collections.synchronizedSet(new HashSet<>()), which I suppose should be thread-safe because it blocks concurrent write access. However, it seems to have the same inconsistent reading issue: I still randomly get different sizes in the resulting set.

What should I do to make sure the reading is consistent? As I said, I do not have to use a Set. I can use a List or something else, as long as there is a way to avoid adding duplicates.

It's a bit difficult to show the code, as it is part of a very large package. But in general it looks like this:

public class MyRecursiveTask extends RecursiveTask<Integer> {
    private List<String> tasks; 
    protected ConcurrentSkipListSet<String> dictionary;
    public MyRecursiveTask(ConcurrentSkipListSet<String> dictionary,
                           List<String> tasks){
        this.dictionary=dictionary;
        this.tasks=tasks;
    }

    protected Integer compute() {
        if (this.tasks.size() > 100) {
            List<MyRecursiveTask> subtasks =
                new ArrayList<>();
            subtasks.addAll(createSubtasks());
            int count=0;
            for (MyRecursiveTask subtask : subtasks)
                subtask.fork();
            for (MyRecursiveTask subtask : subtasks)
                count+=subtask.join();
            return count;
        } else {
            int count=0;
            for (String task : tasks) {
                    // code to process task
                 String outcome = [method to do some task]
                 dictionary.add(outcome);
                 count++;
            }
            return count;
        }
    }

    private List<MyRecursiveTask> createSubtasks() {
        List<MyRecursiveTask> subtasks =
            new ArrayList<>();

        int total = tasks.size() / 2;
        List<String> tasks1 = new ArrayList<>();
        for (int i = 0; i < total; i++)
            tasks1.add(tasks.get(i));
        MyRecursiveTask subtask1 = new MyRecursiveTask(
            dictionary, tasks1);

        List<String> tasks2 = new ArrayList<>();
        for (int i = total; i < tasks.size(); i++)
            tasks2.add(tasks.get(i));
        MyRecursiveTask subtask2 = new MyRecursiveTask(
            dictionary, tasks2);

        subtasks.add(subtask1);
        subtasks.add(subtask2);

        return subtasks;
    }
}

Then the code that creates the threaded workers:

....
List<String> allTasks = new ArrayList<String>(100000);
....
//code to fill in "allTasks"
....

ConcurrentSkipListSet<String> dictionary = new ConcurrentSkipListSet<>();
//I also tried "dictionary = Collections.synchronizedSet(new
//HashSet<>())" and changed other bits of code accordingly.
ForkJoinPool forkJoinPool = new ForkJoinPool(10);
MyRecursiveTask mrt = new MyRecursiveTask(dictionary,
            allTasks);
int total= forkJoinPool.invoke(mrt);
System.out.println(dictionary.size()); //this value is a bit random. If the real
//size should be 999, the first run may give 989, the second 999,
//the third 990, etc.

Thanks.

Answer 1

Without seeing the code, it is hard to tell what is wrong. I would guess that the thread that reads the result runs too early, while some threads are still writing. Use Thread.join to wait for the writers. Collections.synchronizedSet is certainly thread-safe.
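To illustrate waiting for the writers, here is a minimal sketch, not the asker's code. The class name JoinBeforeRead, the method fillAndCount, and the generated string values are made up for the example. It assumes plain threads rather than a ForkJoinPool: each writer adds distinct strings to a synchronized set, and the size is read only after every writer has been joined.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class JoinBeforeRead {

    // Hypothetical helper: starts `writers` threads that each add
    // `perWriter` distinct strings, waits for all of them to finish,
    // and only then reads the size of the shared set.
    static int fillAndCount(int writers, int perWriter) {
        Set<String> dictionary = Collections.synchronizedSet(new HashSet<>());
        List<Thread> threads = new ArrayList<>();

        for (int w = 0; w < writers; w++) {
            final int id = w;
            Thread t = new Thread(() -> {
                for (int i = 0; i < perWriter; i++) {
                    dictionary.add("w" + id + "-" + i);
                }
            });
            threads.add(t);
            t.start();
        }

        for (Thread t : threads) {
            try {
                t.join(); // blocks until this writer has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }

        // All writers are done, so the size no longer changes.
        return dictionary.size();
    }

    public static void main(String[] args) {
        System.out.println(fillAndCount(10, 100)); // prints 1000
    }
}
```

If the size were read before the join loop, it could be anything from 0 to 1000 depending on timing, which would match the symptom described in the question.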

Consider this from the Javadoc:

It is imperative that the user manually synchronize on the returned set when iterating over it:

   Set s = Collections.synchronizedSet(new HashSet());
       ...
   synchronized (s) {
       Iterator i = s.iterator(); // Must be in the synchronized block
       while (i.hasNext())
           foo(i.next());
   }

Failure to follow this advice may result in non-deterministic behavior. The returned set will be serializable if the specified set is serializable.
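A typed variant of the Javadoc idiom might look like the sketch below. The class name SynchronizedIteration and the helper snapshot are made up for illustration. It copies the elements while holding the lock on the wrapper returned by Collections.synchronizedSet, so the caller can then work on the copy without holding the lock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SynchronizedIteration {

    // Hypothetical helper: copies the contents of a synchronized set
    // while holding its lock. The argument must be the wrapper returned
    // by Collections.synchronizedSet, because that wrapper is the
    // object the set's own methods lock on.
    static List<String> snapshot(Set<String> syncSet) {
        List<String> copy = new ArrayList<>();
        synchronized (syncSet) {
            for (String s : syncSet) { // iteration happens inside the lock
                copy.add(s);
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Collections.synchronizedSet(new HashSet<>());
        dictionary.add("a");
        dictionary.add("b");
        dictionary.add("a"); // duplicate, ignored by the set

        System.out.println(snapshot(dictionary).size()); // prints 2
    }
}
```

The lock only protects the iteration itself. If writers are still running, the snapshot is consistent but may be incomplete, so waiting for the writers is still needed to get the final size.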
