
ConcurrentHashMap values not updating

I am trying to create a simple multithreaded dictionary/index from a group of documents that contain words. The dictionary is stored in a ConcurrentHashMap with String keys and Vector values. For each word in the dictionary there is an appearance list, which is a Vector holding a series of Tuple objects (a custom class; in my case a Tuple is a pair of numbers).

Each thread takes one document as input, finds all the words in it and tries to update the ConcurrentHashMap. Note that two threads may try to update the same key of the map at the same time, each adding a new Tuple to its value. I only perform write operations on the Vector.

Below is the code that submits new tasks. As you can see, each task receives the dictionary as input, which is a ConcurrentHashMap with String keys and Vector values:

public void run(Crawler crawler) throws InterruptedException {
        while (!crawler.getFinishedPages().isEmpty()) {
            this.INDEXING_SERVICE.submit(new IndexingTask(this.dictionary, sources, 
                                                          crawler.getFinishedPages().take()));
        }
        this.INDEXING_SERVICE.shutdown();
}

Below is the code of an indexing thread:

public class IndexingTask implements Runnable {

    private ConcurrentHashMap<String, Vector<Tuple>> dictionary;
    private HtmlDocument document;

    public IndexingTask(ConcurrentHashMap<String, Vector<Tuple>> dictionary,
                        ConcurrentHashMap<Integer, String> sources, HtmlDocument document) {
        this.dictionary = dictionary;
        this.document = document;
        sources.putIfAbsent(document.getDocId(), document.getURL());
    }

    @Override
    public void run() {

        for (String word : document.getTerms()) {

            this.dictionary.computeIfAbsent(word, k -> new Vector<Tuple>())
                    .add(new Tuple(document.getDocId(), document.getWordFrequency(word)));

        }
    }
}

The code seems to be correct, but the dictionary is not updated properly: some words (keys) are missing from the original dictionary, and some other keys have fewer items in their Vector than expected.

I have done some debugging and found that, before a thread terminates, it has computed the correct keys and values. However, the original dictionary that is passed to the thread as input (see the first piece of code) is not updated correctly. Do you have any idea or suggestion?

When you call this.INDEXING_SERVICE.shutdown(), the submitted IndexingTask instances may not have run yet: shutdown() does not wait for them to complete, so the dictionary can be read before it is fully populated. I updated your code:

import java.util.Arrays;
import java.util.List;
import java.util.Vector;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

class Tuple {
    private Integer key;
    private String value;

    public Tuple(Integer key, String value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public String toString() {
        return "(" + key + ", " + value + ")";
    }
}

class HtmlDocument {

    private int docId;
    private String URL;
    private List<String> terms;

    public int getDocId() {
        return docId;
    }

    public void setDocId(int docId) {
        this.docId = docId;
    }

    public String getURL() {
        return URL;
    }

    public void setURL(String URL) {
        this.URL = URL;
    }

    public List<String> getTerms() {
        return terms;
    }

    public void setTerms(List<String> terms) {
        this.terms = terms;
    }

    public String getWordFrequency(String word) {
        return "query";
    }
}

class IndexingTask implements Runnable {

    private ConcurrentHashMap<String, Vector<Tuple>> dictionary;
    private HtmlDocument document;

    public IndexingTask(ConcurrentHashMap<String, Vector<Tuple>> dictionary,
                        ConcurrentHashMap<Integer, String> sources, HtmlDocument document) {
        this.dictionary = dictionary;
        this.document = document;
        sources.putIfAbsent(document.getDocId(), document.getURL());
    }

    @Override
    public void run() {

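        try {
            for (String word : document.getTerms()) {
                // computeIfAbsent is atomic per key and Vector.add is synchronized
                this.dictionary.computeIfAbsent(word, k -> new Vector<Tuple>())
                        .add(new Tuple(document.getDocId(), document.getWordFrequency(word)));
            }
        } finally {
            // Always signal completion, even if indexing throws
            Crawler.RUNNING_TASKS.decrementAndGet();
        }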
    }
}

class Crawler {

    protected BlockingQueue<HtmlDocument> finishedPages = new LinkedBlockingQueue<>();

    public static final AtomicInteger RUNNING_TASKS = new AtomicInteger();

    public BlockingQueue<HtmlDocument> getFinishedPages() {
        return finishedPages;
    }
}

public class ConcurrentHashMapExample {

    private ConcurrentHashMap<Integer, String> sources = new ConcurrentHashMap<>();
    private ConcurrentHashMap<String, Vector<Tuple>> dictionary = new ConcurrentHashMap<>();

    private static final ExecutorService INDEXING_SERVICE = Executors.newSingleThreadExecutor();

    public void run(Crawler crawler) throws InterruptedException {
        while (!crawler.getFinishedPages().isEmpty()) {
            Crawler.RUNNING_TASKS.incrementAndGet();
            INDEXING_SERVICE.submit(new IndexingTask(this.dictionary, sources,
                    crawler.getFinishedPages().take()));
        }
        // shutdown() returns immediately, so submitted IndexingTasks may not have run yet;
        // wait until every task has finished before shutting down and returning
        while (Crawler.RUNNING_TASKS.get() > 0) {
            Thread.sleep(3);
        }
        INDEXING_SERVICE.shutdown();
    }

    public ConcurrentHashMap<Integer, String> getSources() {
        return sources;
    }

    public ConcurrentHashMap<String, Vector<Tuple>> getDictionary() {
        return dictionary;
    }

    public static void main(String[] args) throws Exception {
        ConcurrentHashMapExample example = new ConcurrentHashMapExample();
        Crawler crawler = new Crawler();
        HtmlDocument document = new HtmlDocument();
        document.setDocId(1);
        document.setURL("http://127.0.0.1/abc");
        document.setTerms(Arrays.asList("hello", "world"));
        crawler.getFinishedPages().add(document);
        example.run(crawler);
        System.out.println("source: " + example.getSources());
        System.out.println("dictionary: " + example.getDictionary());
    }

}

Output:

source: {1=http://127.0.0.1/abc}
dictionary: {world=[(1, query)], hello=[(1, query)]}
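
As an alternative to polling a counter, ExecutorService can wait for the submitted tasks itself: call shutdown() and then awaitTermination(...). Below is a minimal sketch of that idea, not the original poster's code. It assumes simplified documents (plain lists of words) and integer document IDs in place of HtmlDocument and Tuple; the class name, the index helper, the pool size and the timeout are all illustrative choices.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Vector;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AwaitTerminationSketch {

    // Hypothetical helper: indexes each document on a thread pool and
    // returns only after every task has finished.
    static Map<String, Vector<Integer>> index(List<List<String>> documents) {
        ConcurrentHashMap<String, Vector<Integer>> dictionary = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(4); // pool size is an assumption

        for (int i = 0; i < documents.size(); i++) {
            final int docId = i;
            final List<String> terms = documents.get(i);
            pool.submit(() -> {
                for (String word : terms) {
                    dictionary.computeIfAbsent(word, k -> new Vector<>()).add(docId);
                }
            });
        }

        // shutdown() only stops new submissions; awaitTermination() blocks
        // until the already submitted tasks complete or the timeout expires.
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("indexing did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for indexing", e);
        }
        return dictionary;
    }

    public static void main(String[] args) {
        Map<String, Vector<Integer>> dictionary = index(Arrays.asList(
                Arrays.asList("hello", "world"),
                Arrays.asList("hello", "java")));
        System.out.println(dictionary.keySet().size());     // prints 3
        System.out.println(dictionary.get("hello").size()); // prints 2
    }
}
```

Because the dictionary is read only after awaitTermination returns true, all writes made by the tasks are complete and visible to the caller.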

For your use case, I think you should use the producer-consumer design pattern.
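
One way the producer-consumer suggestion could look is sketched below. This is an assumption about what the pattern might mean here, not code from the question: a producer thread stands in for the crawler and puts documents on a BlockingQueue, and a consumer thread takes them until it sees a sentinel ("poison pill") object, instead of stopping as soon as the queue happens to be empty. Documents are simplified to lists of words, and the class, method and sentinel names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Vector;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class ProducerConsumerSketch {

    // Sentinel compared by reference: tells the consumer no more documents will arrive.
    private static final List<String> END = new ArrayList<>();

    static Map<String, Vector<Integer>> run(List<List<String>> documents) {
        BlockingQueue<List<String>> queue = new LinkedBlockingQueue<>();
        ConcurrentHashMap<String, Vector<Integer>> dictionary = new ConcurrentHashMap<>();

        // Producer: stands in for the crawler publishing finished pages.
        Thread producer = new Thread(() -> {
            try {
                for (List<String> doc : documents) {
                    queue.put(doc);
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Consumer: take() blocks while the queue is empty, so a temporarily
        // empty queue does not end indexing early; only END does.
        Thread consumer = new Thread(() -> {
            try {
                int docId = 0;
                for (List<String> doc = queue.take(); doc != END; doc = queue.take()) {
                    for (String word : doc) {
                        dictionary.computeIfAbsent(word, k -> new Vector<>()).add(docId);
                    }
                    docId++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            // join() guarantees the dictionary is complete before it is returned
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for indexing", e);
        }
        return dictionary;
    }

    public static void main(String[] args) {
        Map<String, Vector<Integer>> dictionary = run(Arrays.asList(
                Arrays.asList("hello", "world"),
                Arrays.asList("hello", "java")));
        System.out.println(dictionary.get("hello")); // prints [0, 1]
    }
}
```

With several consumers, one sentinel per consumer would be needed; the single-consumer version is kept here for brevity.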

