How can I execute multiple queries in parallel rather than sequentially?

Question

I am querying all 10 of my tables to get the user IDs from them, and loading those IDs into a HashSet so that I end up with only the unique user IDs.

As of now this is done sequentially. We go to one table, extract all the user_id values from it and load them into the hash set, then move on to the second table, the third, and so on.

    private Set<String> getRandomUsers() {
        Set<String> userList = new HashSet<String>();

        // is there any way to make this parallel?
        for (int table = 0; table < 10; table++) {
            String sql = "select * from testkeyspace.test_table_" + table + ";";

            try {
                SimpleStatement query = new SimpleStatement(sql);
                query.setConsistencyLevel(ConsistencyLevel.QUORUM);
                ResultSet res = session.execute(query);

                Iterator<Row> rows = res.iterator();
                while (rows.hasNext()) {
                    Row r = rows.next();

                    String user_id = r.getString("user_id");
                    userList.add(user_id);
                }
            } catch (Exception e) {
                System.out.println("error= " + ExceptionUtils.getStackTrace(e));
            }
        }

        return userList;
    }

Is there any way to make this multithreaded, so that the data is fetched from each table in parallel? At the end, I need the userList hash set to contain all the unique user IDs from all 10 tables.

I am working with a Cassandra database, and the connection is made only once, so I don't need to create multiple connections.

Answer 1

If you can use Java 8, you could probably do this with parallelStream over a list of the tables: use a lambda to map each table name to that table's set of unique IDs, then merge the results into a single HashSet.

Without Java 8, I'd use Google Guava's listenable futures and an executor service, something like this:
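A minimal sketch of the Java 8 idea, not a definitive implementation. `fetchFromTable` here is a hypothetical stand-in that returns canned IDs so the example runs without a database; in real code it would run the per-table query and collect the `user_id` values. The class name is also just for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelStreamFetch {

    // Hypothetical stand-in for the real per-table query: returns canned
    // IDs so the sketch runs without a database.
    static Set<String> fetchFromTable(int table) {
        return new HashSet<>(Arrays.asList("user_" + table, "user_shared"));
    }

    // Fetch tables 0..tableCount-1 in parallel and merge the per-table
    // sets into a single set of unique IDs.
    static Set<String> fetchFromAllTables(int tableCount) {
        return IntStream.range(0, tableCount)
                .parallel()
                .mapToObj(ParallelStreamFetch::fetchFromTable)
                .flatMap(Set::stream)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<String> users = fetchFromAllTables(10);
        System.out.println(users.size() + " unique user IDs");
    }
}
```

Parallel streams run on the common fork/join pool, whose size is tied to the number of CPU cores, so for blocking database calls the achievable parallelism may be lower than the number of tables. A dedicated executor gives more control over that.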

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.Executors;

    import com.google.common.util.concurrent.Futures;
    import com.google.common.util.concurrent.ListenableFuture;
    import com.google.common.util.concurrent.ListeningExecutorService;
    import com.google.common.util.concurrent.MoreExecutors;

    public static Set<String> fetchFromTable(int table) {
        String sql = "select * from testkeyspace.test_table_" + table + ";";
        Set<String> result = new HashSet<String>();
        // populate result with your SQL statements
        // ...
        return result;
    }

    public static Set<String> fetchFromAllTables() throws InterruptedException, ExecutionException {
        // Create a ListeningExecutorService (Guava) by wrapping a
        // normal ExecutorService (Java)
        ListeningExecutorService executor =
                MoreExecutors.listeningDecorator(Executors.newCachedThreadPool());

        try {
            List<ListenableFuture<Set<String>>> list =
                    new ArrayList<ListenableFuture<Set<String>>>();
            // For each table, submit an independent task that will
            // query just that table and return a set of user IDs from it
            for (int i = 0; i < 10; i++) {
                final int table = i;
                ListenableFuture<Set<String>> future = executor.submit(new Callable<Set<String>>() {
                    public Set<String> call() throws Exception {
                        return fetchFromTable(table);
                    }
                });
                // Add the future to the list
                list.add(future);
            }
            // We want to know when ALL the tasks have completed,
            // so we use a Guava function to turn a list of ListenableFutures
            // into a single ListenableFuture
            ListenableFuture<List<Set<String>>> combinedFutures = Futures.allAsList(list);

            // The get on the combined ListenableFuture will now block until
            // ALL the individual tasks have completed work.
            List<Set<String>> tableSets = combinedFutures.get();

            // Now all we have to do is combine the individual sets into a
            // single result
            Set<String> userList = new HashSet<String>();
            for (Set<String> tableSet : tableSets) {
                userList.addAll(tableSet);
            }

            return userList;
        } finally {
            // Release the pool's threads whether or not the queries succeeded
            executor.shutdown();
        }
    }

The use of Executors and Futures is all core Java. The only thing Guava does is let me turn Futures into ListenableFutures. See here for a discussion of why the latter is better.

There are probably still ways to improve the parallelism of this approach, but if the bulk of your time is spent waiting for the DB to respond or processing network traffic, then this approach may help.
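For comparison, here is a sketch of the same fan-out and merge using only core Java 8 (`CompletableFuture` with a bounded thread pool) instead of Guava. This is one possible variation under stated assumptions, not the approach the answer prescribes: `fetchFromTable` is again a hypothetical stub returning canned IDs, and the pool size of one thread per table is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CompletableFutureFetch {

    // Hypothetical stand-in for the real per-table query.
    static Set<String> fetchFromTable(int table) {
        return new HashSet<>(Arrays.asList("user_" + table, "user_shared"));
    }

    // Assumes tableCount > 0 (a fixed pool cannot have zero threads).
    static Set<String> fetchFromAllTables(int tableCount) {
        // Bounded pool: one thread per table, shut down when done.
        ExecutorService executor = Executors.newFixedThreadPool(tableCount);
        try {
            // Submit every table's task first so they all run concurrently.
            List<CompletableFuture<Set<String>>> futures = new ArrayList<>();
            for (int i = 0; i < tableCount; i++) {
                final int table = i;
                futures.add(CompletableFuture.supplyAsync(() -> fetchFromTable(table), executor));
            }
            // join() blocks until each future completes; a failed task
            // surfaces here as a CompletionException.
            Set<String> userList = new HashSet<>();
            for (CompletableFuture<Set<String>> future : futures) {
                userList.addAll(future.join());
            }
            return userList;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchFromAllTables(10).size() + " unique user IDs");
    }
}
```

Merging happens on the calling thread after each task finishes, so the shared `HashSet` is never touched by more than one thread.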

Answer 2

You may be able to make it multithreaded, but with the overhead of thread creation and multiple connections you probably won't see a significant benefit. Instead, use a UNION statement in MySQL and get them all at once. Let the database engine figure out how to fetch them all efficiently:

    String sql = "select user_id from testkeyspace.test_table_1 UNION select user_id from testkeyspace.test_table_2 UNION select user_id from testkeyspace.test_table_3 ....";

Of course, you'll have to programmatically create the SQL query string. Don't actually put "...." in your query.
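A small sketch of building that query string in a loop. The helper name `buildUnionQuery` and the zero-based table numbering (taken from the question's `test_table_0` through `test_table_9`) are assumptions for illustration. Note that this only applies to a SQL database such as the MySQL this answer has in mind; Cassandra's CQL has no UNION.

```java
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class UnionQueryBuilder {

    // Joins one "select user_id from <prefix><i>" clause per table
    // with " UNION " between them.
    static String buildUnionQuery(String tablePrefix, int tableCount) {
        return IntStream.range(0, tableCount)
                .mapToObj(i -> "select user_id from " + tablePrefix + i)
                .collect(Collectors.joining(" UNION "));
    }

    public static void main(String[] args) {
        System.out.println(buildUnionQuery("testkeyspace.test_table_", 10));
    }
}
```

Because UNION (as opposed to UNION ALL) removes duplicate rows, the database would return each user ID once, so no client-side de-duplication would be needed in that setting.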

Answer 1
2 accepted 2015-02-28 00:52:43

Answer 2
0 2015-02-28 00:39:49
