
ConcurrentHashMap Values not updating

I am trying to create a simple multithreaded dictionary/index from a group of documents that contain words. The dictionary is stored in a ConcurrentHashMap with String keys and Vector values. For each word in the dictionary there is an appearance list, which is a Vector of Tuple objects (a custom class; in my case a Tuple is a pair of two numbers).
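
A rough sketch of the structure described above, assuming Java 16+ (for records) and a Tuple that holds a document id and a word frequency. The class name IndexShape, the method sampleIndex and the field names docId and frequency are hypothetical; the question only says a Tuple combines two numbers:

```java
import java.util.Vector;
import java.util.concurrent.ConcurrentHashMap;

public class IndexShape {
    // Hypothetical Tuple: (document id, frequency of the word in that document)
    record Tuple(int docId, int frequency) {}

    // Builds a tiny index by hand: each word maps to its appearance list
    static ConcurrentHashMap<String, Vector<Tuple>> sampleIndex() {
        ConcurrentHashMap<String, Vector<Tuple>> dictionary = new ConcurrentHashMap<>();
        dictionary.computeIfAbsent("hello", k -> new Vector<>()).add(new Tuple(1, 3));
        dictionary.computeIfAbsent("hello", k -> new Vector<>()).add(new Tuple(2, 1));
        dictionary.computeIfAbsent("world", k -> new Vector<>()).add(new Tuple(1, 2));
        return dictionary;
    }

    public static void main(String[] args) {
        // prints [Tuple[docId=1, frequency=3], Tuple[docId=2, frequency=1]]
        System.out.println(sampleIndex().get("hello"));
    }
}
```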

Each thread takes one document as input, finds all the words in it and tries to update the ConcurrentHashMap. I should also point out that two threads may try to update the same key of the map, each appending a new Tuple to its value. I only perform write operations on the Vector.
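
Appending to the same key from several threads is safe with the computeIfAbsent(...).add(...) pattern: ConcurrentHashMap applies the mapping function atomically, so only one Vector is ever created per key, and Vector.add is synchronized. A small check of that, assuming Java 16+; the names ConcurrentAppend and appendConcurrently and the Tuple of (document id, position) are hypothetical:

```java
import java.util.Vector;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentAppend {
    // Hypothetical Tuple for this demo: (document id, sequence number)
    record Tuple(int docId, int position) {}

    // Starts `threads` tasks that each append `perThread` Tuples under one shared key,
    // then waits for all of them before returning the map
    static ConcurrentHashMap<String, Vector<Tuple>> appendConcurrently(int threads, int perThread) {
        ConcurrentHashMap<String, Vector<Tuple>> dictionary = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int docId = t;
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    // computeIfAbsent creates the Vector at most once per key, atomically;
                    // Vector.add is synchronized, so concurrent appends are not lost
                    dictionary.computeIfAbsent("shared", k -> new Vector<>())
                              .add(new Tuple(docId, i));
                }
            });
        }
        pool.shutdown(); // stop accepting new tasks; submitted ones keep running
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return dictionary;
    }

    public static void main(String[] args) {
        // 8 tasks x 1000 appends each: prints 8000, so no append was lost
        System.out.println(appendConcurrently(8, 1000).get("shared").size());
    }
}
```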

Below is the code that submits the new tasks. As you can see, I pass in the dictionary, which is a ConcurrentHashMap with String keys and Vector values:

public void run(Crawler crawler) throws InterruptedException {
    while (!crawler.getFinishedPages().isEmpty()) {
        this.INDEXING_SERVICE.submit(new IndexingTask(this.dictionary, sources,
                crawler.getFinishedPages().take()));
    }
    this.INDEXING_SERVICE.shutdown();
}

Below is the code of an indexing task:

public class IndexingTask implements Runnable {

    private ConcurrentHashMap<String, Vector<Tuple>> dictionary;
    private HtmlDocument document;

    public IndexingTask(ConcurrentHashMap<String, Vector<Tuple>> dictionary,
                        ConcurrentHashMap<Integer, String> sources, HtmlDocument document) {
        this.dictionary = dictionary;
        this.document = document;
        sources.putIfAbsent(document.getDocId(), document.getURL());
    }

    @Override
    public void run() {

        for (String word : document.getTerms()) {

            this.dictionary.computeIfAbsent(word, k -> new Vector<Tuple>())
                    .add(new Tuple(document.getDocId(), document.getWordFrequency(word)));

        }
    }
}

The code seems to be correct, but the dictionary is not updated properly: some words (keys) are missing from the original dictionary, and some other keys have fewer items in their Vector.

I have done some debugging and found that, before a thread instance terminates, it has calculated the correct keys and values. However, the original dictionary that is passed to the thread as input (see the first piece of code) is not updated correctly. Do you have any idea or suggestion?

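When you call this.INDEXING_SERVICE.shutdown(), some IndexingTask instances may not have run yet: shutdown() stops the executor from accepting new tasks, but it does not wait for the tasks already submitted to finish. I updated your code so that run() waits for them: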

import java.util.Arrays;
import java.util.List;
import java.util.Vector;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

class Tuple {
    private Integer key;
    private String value;

    public Tuple(Integer key, String value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public String toString() {
        return "(" + key + ", " + value + ")";
    }
}

class HtmlDocument {

    private int docId;
    private String URL;
    private List<String> terms;

    public int getDocId() {
        return docId;
    }

    public void setDocId(int docId) {
        this.docId = docId;
    }

    public String getURL() {
        return URL;
    }

    public void setURL(String URL) {
        this.URL = URL;
    }

    public List<String> getTerms() {
        return terms;
    }

    public void setTerms(List<String> terms) {
        this.terms = terms;
    }

    // Stub for this example: always returns "query" instead of a real frequency
    public String getWordFrequency(String word) {
        return "query";
    }
}

class IndexingTask implements Runnable {

    private ConcurrentHashMap<String, Vector<Tuple>> dictionary;
    private HtmlDocument document;

    public IndexingTask(ConcurrentHashMap<String, Vector<Tuple>> dictionary,
                        ConcurrentHashMap<Integer, String> sources, HtmlDocument document) {
        this.dictionary = dictionary;
        this.document = document;
        sources.putIfAbsent(document.getDocId(), document.getURL());
    }

    @Override
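    public void run() {
        try {
            for (String word : document.getTerms()) {
                // computeIfAbsent is atomic on ConcurrentHashMap, and Vector.add is synchronized
                this.dictionary.computeIfAbsent(word, k -> new Vector<Tuple>())
                        .add(new Tuple(document.getDocId(), document.getWordFrequency(word)));
            }
        } finally {
            // decrement even if indexing throws, so the waiting loop in run(Crawler) cannot spin forever
            Crawler.RUNNING_TASKS.decrementAndGet();
        }
    }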
}

class Crawler {

    protected BlockingQueue<HtmlDocument> finishedPages = new LinkedBlockingQueue<>();

    public static final AtomicInteger RUNNING_TASKS = new AtomicInteger();

    public BlockingQueue<HtmlDocument> getFinishedPages() {
        return finishedPages;
    }
}

public class ConcurrentHashMapExample {

    private ConcurrentHashMap<Integer, String> sources = new ConcurrentHashMap<>();
    private ConcurrentHashMap<String, Vector<Tuple>> dictionary = new ConcurrentHashMap<>();

    private static final ExecutorService INDEXING_SERVICE = Executors.newSingleThreadExecutor();

    public void run(Crawler crawler) throws InterruptedException {
        while (!crawler.getFinishedPages().isEmpty()) {
            Crawler.RUNNING_TASKS.incrementAndGet();
            INDEXING_SERVICE.submit(new IndexingTask(this.dictionary, sources,
                    crawler.getFinishedPages().take()));
        }
        // shutdown() does not wait for submitted tasks to finish, so wait until every IndexingTask is done
        while (Crawler.RUNNING_TASKS.get() > 0) {
            Thread.sleep(3); // poll until the counter drops to zero
        }
        INDEXING_SERVICE.shutdown();
    }

    public ConcurrentHashMap<Integer, String> getSources() {
        return sources;
    }

    public ConcurrentHashMap<String, Vector<Tuple>> getDictionary() {
        return dictionary;
    }

    public static void main(String[] args) throws Exception {
        ConcurrentHashMapExample example = new ConcurrentHashMapExample();
        Crawler crawler = new Crawler();
        HtmlDocument document = new HtmlDocument();
        document.setDocId(1);
        document.setURL("http://127.0.0.1/abc");
        document.setTerms(Arrays.asList("hello", "world"));
        crawler.getFinishedPages().add(document);
        example.run(crawler);
        System.out.println("source: " + example.getSources());
        System.out.println("dictionary: " + example.getDictionary());
    }

}

output:

source: {1=http://127.0.0.1/abc}
dictionary: {world=[(1, query)], hello=[(1, query)]}
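
As an alternative to polling RUNNING_TASKS, ExecutorService can do the waiting itself: shutdown() stops accepting new tasks, and awaitTermination() blocks until the submitted ones have finished or the timeout expires. A minimal sketch, with a hypothetical AwaitIndexer class, plain term lists standing in for HtmlDocument (a document's position in the list is its id), and an arbitrary pool size and timeout:

```java
import java.util.List;
import java.util.Vector;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AwaitIndexer {
    // Indexes each "document" (a plain list of terms; its list position is its id)
    // on a thread pool and returns only after every task has finished
    static ConcurrentHashMap<String, Vector<Integer>> index(List<List<String>> documents) {
        ConcurrentHashMap<String, Vector<Integer>> dictionary = new ConcurrentHashMap<>();
        ExecutorService service = Executors.newFixedThreadPool(4);
        for (int docId = 0; docId < documents.size(); docId++) {
            final int id = docId;
            final List<String> terms = documents.get(docId);
            service.submit(() -> {
                for (String word : terms) {
                    dictionary.computeIfAbsent(word, k -> new Vector<>()).add(id);
                }
            });
        }
        service.shutdown(); // no new tasks are accepted; already-submitted tasks still run
        try {
            // block until all submitted tasks complete or the (arbitrary) timeout expires
            if (!service.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("indexing did not finish within the timeout");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return dictionary;
    }

    public static void main(String[] args) {
        System.out.println(index(List.of(List.of("hello", "world"), List.of("hello"))));
    }
}
```

If awaitTermination() returns false, this sketch throws rather than handing back a partially built index.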

I think that, for your use case, you should use the producer-consumer design pattern.
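
A minimal sketch of the producer-consumer pattern, assuming Java 16+ and a single producer (standing in for the crawler) that puts documents on a BlockingQueue and finishes with one sentinel ("poison pill") per consumer. Unlike a loop that stops as soon as isEmpty() is true, the consumers keep calling take() and simply block while the queue is temporarily empty. The class ProducerConsumerIndex, the Doc record and the POISON sentinel are all hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class ProducerConsumerIndex {
    // Hypothetical document type: an id plus its terms
    record Doc(int id, List<String> terms) {}

    // Sentinel telling a consumer to stop; compared by reference
    private static final Doc POISON = new Doc(-1, List.of());

    static ConcurrentHashMap<String, Vector<Integer>> index(List<List<String>> documents, int consumers) {
        BlockingQueue<Doc> queue = new LinkedBlockingQueue<>();
        ConcurrentHashMap<String, Vector<Integer>> dictionary = new ConcurrentHashMap<>();

        // Consumers: take documents until the sentinel arrives
        List<Thread> workers = new ArrayList<>();
        for (int c = 0; c < consumers; c++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        Doc doc = queue.take();    // blocks while the queue is empty
                        if (doc == POISON) {
                            return;                // producer has no more documents
                        }
                        for (String word : doc.terms()) {
                            dictionary.computeIfAbsent(word, k -> new Vector<>()).add(doc.id());
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
            workers.add(worker);
        }

        try {
            // Producer: in the real program, the crawler would put pages here as it finishes them
            for (int id = 0; id < documents.size(); id++) {
                queue.put(new Doc(id, documents.get(id)));
            }
            for (int c = 0; c < consumers; c++) {
                queue.put(POISON);                 // one sentinel per consumer
            }
            for (Thread worker : workers) {
                worker.join();                     // wait until every consumer has stopped
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return dictionary;
    }

    public static void main(String[] args) {
        System.out.println(index(List.of(List.of("hello", "world"), List.of("hello")), 2));
    }
}
```

Because the producer decides when input is finished, the consumers never mistake a momentarily empty queue for the end of the crawl.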

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this post, please cite the site URL or the original address. For any questions, contact yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM