
Speed up Mongo queries by running them in parallel with a ThreadPool?

Our MongoDB architecture stores data weekly. Every week has its own database with the same set of collections. Sometimes I have to check data going back 12 weeks or more, which means I run the same query against 12 different databases (all on one Mongo server):

...
for (MongoOperationDto week : allWeeks) {
  results.addAll(repo.find(gid, week.db(), week.collection()));
}
...

In this case I run find() 12 times sequentially. I guess the internal connection pool handles them, or does it? If not, would there be a benefit if I created 12 Java threads and each thread ran one find? Maybe like:

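import java.util.List;
import java.util.concurrent.CompletableFuture;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Async;

public class FindTask {

    @Autowired
    MyMongoRepo repo;

    // An @Async method has to return void or a Future type for the caller to receive the result.
    @Async
    public CompletableFuture<List<Result>> doFindTask(long gid, MongoOperationDto week) {
        return CompletableFuture.completedFuture(repo.find(gid, week.db(), week.collection()));
    }
}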

Which approach is actually faster, or is there no speed difference in retrieving the data?

The connection pool handles the connections, nothing more:

In software engineering, a connection pool is a cache of database connections maintained so that the connections can be reused when future requests to the database are required.

For your first code, this means that after the first find has finished, the next one does not need to establish a new connection to MongoDB; it can reuse an already opened, idle connection from the pool.

So in the first case, you have 12 serial queries, and each query uses 1 connection.

In the second case, you have 12 parallel queries using 12 different connections at the same time.
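The difference between the two cases can be illustrated with a toy pool. This is only a sketch: `ToyConnectionPool` and its methods are hypothetical names, integers stand in for open connections, and handing out the most recently returned connection first is an assumption of this model, not a statement about how the MongoDB driver's pool behaves.

```java
import java.util.Set;
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

// Toy model: integer ids stand in for open connections (not the real driver pool).
class ToyConnectionPool {
    private final BlockingDeque<Integer> idle = new LinkedBlockingDeque<>();

    ToyConnectionPool(int size) {
        for (int id = 0; id < size; id++) {
            idle.addLast(id);
        }
    }

    int borrow() {
        try {
            return idle.takeFirst(); // blocks until some connection is idle
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    void giveBack(int id) {
        idle.addFirst(id); // most recently used connection is handed out next
    }

    // Runs the queries one after another; returns the number of distinct connections used.
    static int distinctConnectionsSerial(int queries) {
        ToyConnectionPool pool = new ToyConnectionPool(queries);
        Set<Integer> used = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < queries; i++) {
            int conn = pool.borrow();
            used.add(conn);
            pool.giveBack(conn);
        }
        return used.size();
    }

    // Runs all queries at once; each holds its connection until every query has borrowed one.
    static int distinctConnectionsParallel(int queries) {
        ToyConnectionPool pool = new ToyConnectionPool(queries);
        Set<Integer> used = ConcurrentHashMap.newKeySet();
        CountDownLatch allBorrowed = new CountDownLatch(queries);
        ExecutorService executor = Executors.newFixedThreadPool(queries);
        for (int i = 0; i < queries; i++) {
            executor.execute(() -> {
                int conn = pool.borrow();
                used.add(conn);
                allBorrowed.countDown();
                try {
                    allBorrowed.await(); // simulate a query that is still in flight
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    pool.giveBack(conn);
                }
            });
        }
        executor.shutdown();
        try {
            if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("queries did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return used.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctConnectionsSerial(12));   // prints 1
        System.out.println(distinctConnectionsParallel(12)); // prints 12
    }
}
```

In this model the serial loop keeps reusing one connection, while the parallel run needs as many connections as there are queries in flight, so the pool's maximum size bounds how much parallelism is actually possible.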

In terms of performance, if the queries take a long time, the second solution should complete sooner, but it uses more resources (RAM, CPU time). Note that the time is also influenced by your MongoDB architecture. If your queries involve long disk operations on the same disk, parallelizing them probably won't improve the total time much.
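As a rough sketch of the parallel variant using only the JDK, the weekly finds can be fanned out on a bounded thread pool and merged in week order. The names `ParallelWeeklyFind`, `findAll` and `slowFind` are hypothetical, and `slowFind` merely sleeps to stand in for `repo.find(...)`; no real MongoDB call is made here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

class ParallelWeeklyFind {

    // Runs one find per weekly database on a bounded pool and merges the results in week order.
    static <R> List<R> findAll(List<String> weeklyDbs,
                               Function<String, List<R>> find,
                               int poolSize) {
        ExecutorService executor = Executors.newFixedThreadPool(poolSize);
        try {
            List<CompletableFuture<List<R>>> futures = new ArrayList<>();
            for (String db : weeklyDbs) {
                futures.add(CompletableFuture.supplyAsync(() -> find.apply(db), executor));
            }
            List<R> merged = new ArrayList<>();
            for (CompletableFuture<List<R>> future : futures) {
                merged.addAll(future.join()); // a failed query surfaces as CompletionException
            }
            return merged;
        } finally {
            executor.shutdown();
        }
    }

    // Hypothetical stand-in for repo.find(...): sleeps 100 ms, then returns one row.
    static List<String> slowFind(String db) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return Arrays.asList(db + ":row");
    }

    public static void main(String[] args) {
        List<String> weeks = new ArrayList<>();
        for (int week = 1; week <= 12; week++) {
            weeks.add("week" + week);
        }
        long start = System.nanoTime();
        List<String> rows = findAll(weeks, ParallelWeeklyFind::slowFind, 4);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(rows.size() + " rows in about " + elapsedMs + " ms");
    }
}
```

A pool size smaller than 12 (4 in this sketch) is a deliberate choice: it caps the number of simultaneous queries hitting the single server, which matters if they compete for the same disk. Whether 4, 12, or some other size works best would have to be measured against the real data.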

