简体   繁体   中英

Data inconsistency using ConcurrentHashMap

The count changes for every run for the same set of files. The following code is still not data consistent. How to make thread safe? Simple word count code.

package ConcurrentHashMapDemo;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class FileReaderTask implements Runnable {
    private String filePath;
    private String fileName;
    private ConcurrentMap<String, Integer> wordCountMap;

    public FileReaderTask(String filePath, String fileName,
            ConcurrentMap<String, Integer> wordCountMap) {
        this.filePath = filePath;
        this.fileName = fileName;
        this.wordCountMap = wordCountMap;
    }

    public void run() {
        File jobFile = new File(filePath + fileName);
        try {
            BufferedReader bReader = new BufferedReader(new FileReader(jobFile));
            String line = "";
            while ((line = bReader.readLine()) != null) {
                String[] strArray = line.split(" ");
                for (String str : strArray) {
                    if (wordCountMap.containsKey(str)) {
                        wordCountMap.replace (str.trim(),
                                wordCountMap.get(str.trim()) + 1);
                    } else {
                        wordCountMap.putIfAbsent(str.trim(), 1);
                    }
                }
            }
            //Thread.sleep(10000);
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}

public class Main {
    public static void main(String[] args) {
        ConcurrentMap<String, Integer> wordCountMap = new ConcurrentHashMap<String, Integer>();
        File fileDir = new File("c://job_files");
        Thread[] threads = new Thread[fileDir.listFiles().length];
        for(int i=0;i<threads.length;i++){
            FileReaderTask frt = new FileReaderTask("c:/job_files/", fileDir.listFiles()[i].getName(), wordCountMap);
            threads[i]= new Thread(frt);
            threads[i].start();
        }
        //
        for(int i=0;i<threads.length;i++){
        try {
        threads[i].join();
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        }

        for(Map.Entry<String, Integer> entry: wordCountMap.entrySet()){
            String key = entry.getKey();
            System.out.println(key +" - - "+wordCountMap.get(key));
        }
        System.out.println("Main");
    }
}

The concurrent containers ensure internal consistency (for example not adding the same key twice), but they do nothing to protect the stored values. Your code as it stands now has a race condition. Another thread can increment the counter between your call to get and your call to replace . The replace then puts the wrong value in the map, losing the increment performed by the other thread.

You need to make your increment atomic. Something like this, which uses the version of replace which ensures the value in the map is still the same before peforming the replacement:

str = str.trim();
while(true) {
    Integer oldValue = wordCountMap.putIfAbsent(str, 1);
    if(oldValue != null) {
        if(wordCountMap.replace(str, oldValue, oldValue + 1))
          break; // Successfully incremented the existing count
    } else {
        break; // Added new count of 1
    }
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM