Multithreading callback in Java

Question

I'm trying to multithread an import job, but it is producing duplicate data. I need to keep my map outside the loop so that all my threads can read from and update it, but I can't do that unless the map is final, and if it is final I can't update it. Currently I have to put the Map object inside the run method, but the problem arises when the values are not yet in the database and each thread creates a new one. This results in duplicate data in the database. Does anybody know how to do some sort of callback to update the map outside the threads?

ExecutorService executorService = Executors.newFixedThreadPool(10);

final Map<Integer, Object> map = new HashMap<>();
map.putAll(/* populate from database */);

for (int i = 0; i < 10; i++) {
    executorService.execute(new Runnable() {
        public void run() {
            while ((line = br.readLine()) != null) {
                if (map.containsKey(123)) {
                    // read map object
                    session.update(object);
                } else {
                    map.put(123, someObject);
                    session.save(object);
                }

                if (rowCount % 250 == 0) {
                    tx.commit();
                }
            }
        }
    });
}

executorService.shutdown();

Answer 1

You need to use some synchronization technique.

The problematic part is when different threads try to put data into the map.

Example:

Thread 1 checks whether the map contains an object with key 123. Before thread 1 adds the new object to the map, thread 2 runs. Thread 2 also checks whether there is an object with key 123 and finds none. Then both threads add object 123 to the map. This causes duplicates.

You can read more about synchronization here:

http://docs.oracle.com/javase/tutorial/essential/concurrency/sync.html
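The race described above can be closed by making the check and the insert one critical section. The following is a minimal sketch, not the asker's actual code: the class name, the `lock` object and the `inserts` counter are assumptions made for illustration, and the counter stands in for the `session.save` call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SynchronizedCheckThenAct {
    private final Map<Integer, String> map = new HashMap<>();
    private final Object lock = new Object();
    // Hypothetical stand-in for session.save(): counts how often the insert branch runs.
    final AtomicInteger inserts = new AtomicInteger();

    /** Returns true if this call inserted the key, false if it already existed. */
    boolean saveOrUpdate(int key, String value) {
        // containsKey and put run under the same lock, so no other thread
        // can slip in between the check and the insert.
        synchronized (lock) {
            if (map.containsKey(key)) {
                return false; // the real code would call session.update() here
            }
            map.put(key, value);
            inserts.incrementAndGet(); // the real code would call session.save() here
            return true;
        }
    }

    /** Lets several threads race for key 123 and returns the number of inserts. */
    static int run(int threads) throws InterruptedException {
        SynchronizedCheckThenAct importer = new SynchronizedCheckThenAct();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> importer.saveOrUpdate(123, "row"));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return importer.inserts.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("inserts for key 123: " + run(10)); // prints "inserts for key 123: 1"
    }
}
```

Holding the lock while talking to the database would serialize the threads, so in a real import it is probably better to keep only the map check and insert inside the synchronized block.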

Answer 2

Based on your problem description, it appears that you want a map whose data is consistent and always up to date, without missing any updates.

In this case, wrap your map with Collections.synchronizedMap(). This ensures that each individual read and write on the map is synchronized, so a lookup always sees the latest data in the map and each write has exclusive access. Note that a check followed by a put is still two separate calls.

Refer to this SO discussion for the differences between the concurrency techniques used with maps.

One more thing: defining a Map as final does not mean you cannot modify the map. You can still add and remove elements. What you cannot do is change the variable to point to another map. The simple code snippet below illustrates this:

    private final Map<Integer, String> testMap = Collections.synchronizedMap(new HashMap<Integer,String>());
    testMap.put(1,"Tom"); //OK
    testMap.remove(1);   //OK
    testMap = new HashMap<Integer,String>(); //ERROR!! Cannot assign a value to a final variable
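One caveat worth illustrating: a synchronized map makes each single call thread-safe, but a `containsKey` followed by a `put` is still two calls. A minimal sketch of one way to handle that, assuming the compound action is guarded by synchronizing on the wrapper map itself (the class and method names are made up for illustration):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SynchronizedMapCompoundAction {
    // final fixes the reference only; the map's contents can still change.
    private static final Map<Integer, String> MAP =
            Collections.synchronizedMap(new HashMap<Integer, String>());

    /** Inserts value if key is absent; returns true when this call did the insert. */
    static boolean insertIfMissing(int key, String value) {
        // Each single call (containsKey, put) is already thread-safe, but the
        // pair is only atomic when both run under the map's own lock.
        synchronized (MAP) {
            if (MAP.containsKey(key)) {
                return false;
            }
            MAP.put(key, value);
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(insertIfMissing(1, "Tom"));   // prints "true"
        System.out.println(insertIfMissing(1, "Jerry")); // prints "false"
        System.out.println(MAP.get(1));                  // prints "Tom"
    }
}
```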

Answer 3

I would suggest the following solution:

Use ConcurrentHashMap.
Don't use update and commit inside your crawling threads.
Trigger save and commit in a separate thread when your map reaches a critical size.

Pseudocode sample:

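final Object lock = new Object();

...

executorService.execute(new Runnable() {
    public void run() {
        ...
        synchronized (lock) {
            if (concurrentMap.size() > 250) {
                // Copy the pending values and empty the map while holding the lock,
                // then hand the batch to the thread that saves and commits.
                List<Object> batch = new ArrayList<>(concurrentMap.values());
                concurrentMap.clear();
                saveInASeparateThread(batch);
            }
        }
    }
});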
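One possible way to flesh out the idea of moving save and commit to a separate thread is sketched below. It is a variation and not the answer's exact design: instead of checking the map size under a lock, the workers hand new rows to a `BlockingQueue` and a single writer thread groups them into batches of 250. All names here (`BatchingWriterSketch`, `POISON`, `seen`, `pending`) are assumptions, and recording the batch size stands in for the real save and commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class BatchingWriterSketch {
    static final int BATCH_SIZE = 250;
    // Unique sentinel object that tells the writer to stop (compared by identity).
    private static final String POISON = new String("<end>");

    /** Returns the sizes of the batches the writer thread "committed". */
    static List<Integer> run(int workers, int keysPerWorker) throws InterruptedException {
        ConcurrentMap<Integer, String> seen = new ConcurrentHashMap<>();
        BlockingQueue<String> pending = new LinkedBlockingQueue<>();
        // Written only by the writer thread and read only after join().
        List<Integer> committed = new ArrayList<>();

        Thread writer = new Thread(() -> {
            List<String> batch = new ArrayList<>();
            try {
                while (true) {
                    String row = pending.take();
                    if (row == POISON) {
                        break;
                    }
                    batch.add(row);
                    if (batch.size() == BATCH_SIZE) {
                        committed.add(batch.size()); // real code: save the batch, then commit
                        batch.clear();
                    }
                }
                if (!batch.isEmpty()) {
                    committed.add(batch.size()); // final partial batch
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.execute(() -> {
                for (int key = 0; key < keysPerWorker; key++) {
                    // Only the thread that wins putIfAbsent queues the row for saving.
                    if (seen.putIfAbsent(key, "row-" + key) == null) {
                        pending.add("row-" + key);
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);

        pending.add(POISON); // all workers are done, let the writer finish
        writer.join();
        return committed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(10, 600)); // prints "[250, 250, 100]"
    }
}
```

With a single writer, the database session is used by one thread only, which sidesteps the question of sharing a session across the crawling threads.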

Answer 4

The following logic resolves my issue. The code below isn't tested.

ExecutorService executorService = Executors.newFixedThreadPool(10);

// ConcurrentHashMap makes putIfAbsent an atomic check-and-insert.
final Map<Integer, Object> map = new ConcurrentHashMap<>();
map.putAll(myObjectList);

List<Future<String>> futures = new ArrayList<>();

for (int i = 0; i < 10; i++) {
    final int thread = i;

    Future<String> future = executorService.submit(new Callable<String>() {
        public String call() throws Exception {

            List<MyObject> list;

            CSVReader reader = new CSVReader(new InputStreamReader(csvFile.getStream()));

            list = bean.parse(strategy, reader);

            int listSize = list.size();
            int rowCount = 0;

            for (MyObject myObject : list) {

                rowCount++;

                Integer key = myObject.getId();

                // putIfAbsent returns null only for the thread that inserted the key.
                if (map.putIfAbsent(key, myObject) == null) {
                    session.save(myObject);
                } else {
                    myObject = (MyObject) map.get(key);
                    // Do something
                    session.update(myObject);
                }

                // Flush and clear the session every 250 rows and after the last row.
                if (rowCount % 250 == 0 || rowCount == listSize) {
                    session.flush();
                    session.clear();
                }
            }

            tx.commit();

            return "Thread " + thread + " completed.";
        }
    });

    futures.add(future);
}

// get() blocks until the corresponding task has finished.
for (Future<String> future : futures) {
    System.out.println(future.get());
}

executorService.shutdown();
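The key step in the code above is `putIfAbsent`, which checks and inserts atomically on a `ConcurrentHashMap`. A small self-contained sketch of that behavior follows, with the database calls replaced by counters; the class name and the `saves` and `updates` counters are assumptions made for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PutIfAbsentDemo {
    /** Returns {saves, updates} after all threads have processed every key. */
    static int[] run(int threads, int keys) throws InterruptedException {
        ConcurrentMap<Integer, String> map = new ConcurrentHashMap<>();
        AtomicInteger saves = new AtomicInteger();   // stands in for session.save()
        AtomicInteger updates = new AtomicInteger(); // stands in for session.update()

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int key = 0; key < keys; key++) {
                    // Exactly one thread per key sees null and takes the save branch.
                    if (map.putIfAbsent(key, "row-" + key) == null) {
                        saves.incrementAndGet();
                    } else {
                        updates.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return new int[] {saves.get(), updates.get()};
    }

    public static void main(String[] args) throws InterruptedException {
        int[] result = run(10, 1000);
        System.out.println("saves=" + result[0] + " updates=" + result[1]); // prints "saves=1000 updates=9000"
    }
}
```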

4 answers

Answer 1
1 2013-10-28 17:57:01

Answer 2
1 2013-10-28 18:20:17

Answer 3
1 2013-10-28 18:42:07

Answer 4
1 accepted 2013-11-13 19:46:02
