Data inconsistency using ConcurrentHashMap

The count changes on every run for the same set of files. The following code is still not producing consistent data. How can I make it thread safe? It is a simple word count program.

package ConcurrentHashMapDemo;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class FileReaderTask implements Runnable {
    private String filePath;
    private String fileName;
    private ConcurrentMap<String, Integer> wordCountMap;

    public FileReaderTask(String filePath, String fileName,
            ConcurrentMap<String, Integer> wordCountMap) {
        this.filePath = filePath;
        this.fileName = fileName;
        this.wordCountMap = wordCountMap;
    }

    public void run() {
        File jobFile = new File(filePath + fileName);
        try {
            BufferedReader bReader = new BufferedReader(new FileReader(jobFile));
            String line = "";
            while ((line = bReader.readLine()) != null) {
                String[] strArray = line.split(" ");
                for (String str : strArray) {
                    if (wordCountMap.containsKey(str)) {
                        wordCountMap.replace (str.trim(),
                                wordCountMap.get(str.trim()) + 1);
                    } else {
                        wordCountMap.putIfAbsent(str.trim(), 1);
                    }
                }
            }
            //Thread.sleep(10000);
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}

public class Main {
    public static void main(String[] args) {
        ConcurrentMap<String, Integer> wordCountMap = new ConcurrentHashMap<String, Integer>();
        File fileDir = new File("c://job_files");
        Thread[] threads = new Thread[fileDir.listFiles().length];
        for(int i=0;i<threads.length;i++){
            FileReaderTask frt = new FileReaderTask("c:/job_files/", fileDir.listFiles()[i].getName(), wordCountMap);
            threads[i]= new Thread(frt);
            threads[i].start();
        }
        // Wait for every worker thread to finish before printing the counts
        for(int i=0;i<threads.length;i++){
            try {
                threads[i].join();
            } catch (InterruptedException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }

        for(Map.Entry<String, Integer> entry: wordCountMap.entrySet()){
            String key = entry.getKey();
            System.out.println(key +" - - "+wordCountMap.get(key));
        }
        System.out.println("Main");
    }
}

The concurrent containers ensure internal consistency (for example, never adding the same key twice), but they do nothing to protect the stored values. Your code as it stands has a race condition: another thread can increment the counter between your call to get and your call to replace. The replace then puts a stale value in the map, losing the increment performed by the other thread.
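To make the lost update concrete, the interleaving described above can be replayed by hand on a single thread. This is only an illustrative sketch: the class name LostUpdateDemo, the key "word", and the labels "thread A" and "thread B" are made up for the example and do not appear in the question's code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical demo: replays, step by step, what two threads A and B
// can do to the same counter when get and replace are separate calls.
class LostUpdateDemo {
    static int replayInterleaving() {
        ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();
        counts.put("word", 1);

        // Thread A reads the current count.
        Integer seenByA = counts.get("word");   // 1
        // Thread B reads the same count before A has written anything.
        Integer seenByB = counts.get("word");   // 1
        // Thread B writes its increment.
        counts.replace("word", seenByB + 1);    // map now holds 2
        // Thread A writes an increment based on its stale read,
        // overwriting B's update instead of adding to it.
        counts.replace("word", seenByA + 1);    // map still holds 2

        return counts.get("word");
    }

    public static void main(String[] args) {
        // Two increments were applied to a count of 1, yet the result is 2, not 3.
        System.out.println(replayInterleaving()); // prints 2
    }
}
```

With real threads the same ordering happens only occasionally, which is why the totals differ from run to run.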

You need to make the increment atomic. Something like this, which uses the three-argument version of replace that checks the value in the map is still the same before performing the replacement:

str = str.trim();
while(true) {
    Integer oldValue = wordCountMap.putIfAbsent(str, 1);
    if(oldValue != null) {
        if(wordCountMap.replace(str, oldValue, oldValue + 1))
            break; // Successfully incremented the existing count
    } else {
        break; // Added new count of 1
    }
}
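If Java 8 or later is available (an assumption; the question does not state a version), the retry loop above can likely be replaced by a single call to merge, which ConcurrentHashMap documents as an atomic update. The sketch below is a small self-contained check rather than a drop-in change to the question's code: the class name MergeCountDemo, the method countWithMerge, and the thread and iteration counts are all hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical demo: several threads increment the same key through merge.
class MergeCountDemo {
    // Starts threadCount threads that each count the word "word" perThread times,
    // then returns the final count stored in the shared map.
    static int countWithMerge(int threadCount, int perThread) {
        ConcurrentMap<String, Integer> wordCountMap = new ConcurrentHashMap<>();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    // Inserts 1 if the key is absent, otherwise stores old + 1,
                    // as one atomic operation on ConcurrentHashMap.
                    wordCountMap.merge("word", 1, Integer::sum);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return wordCountMap.getOrDefault("word", 0);
    }

    public static void main(String[] args) {
        System.out.println(countWithMerge(8, 10000)); // prints 80000
    }
}
```

In the question's inner loop this would amount to replacing the containsKey / replace / putIfAbsent branches with one merge call on the trimmed word.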
