
How to use a mongodb query operation on a very large database (3 shards of around 260-300 million documents each)

I have to find data between different date ranges on a column in a sharded database holding a total of around 800 million documents. I am using this query:

cursordata = event.aggregate([
    {"$match": {...}},    # match conditions elided in the question
    {"$unwind": ...},     # array field path elided
    {"$project": {...}},  # projection elided
])
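
For reference, a runnable version of this pattern could look like the sketch below. The connection details, the field names "eventDate" and "events", and the date bounds are all assumptions for illustration; they are not from the original question.

from datetime import datetime
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
event = client["mydb"]["event"]

pipeline = [
    # Filter first, so later stages handle as few documents as possible
    {"$match": {"eventDate": {"$gte": datetime(2020, 1, 1),
                              "$lt": datetime(2020, 2, 1)}}},
    # Unwind a hypothetical array field into one document per element
    {"$unwind": "$events"},
    # Keep only the fields needed downstream
    {"$project": {"_id": 0, "eventDate": 1, "events": 1}},
]

# allowDiskUse lets large stages spill to disk instead of hitting memory limits
cursordata = event.aggregate(pipeline, allowDiskUse=True)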

However, when I convert it to a pandas DataFrame

df = pd.DataFrame(cursordata)

it takes forever and does not complete at all; it just gets stuck.
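
If the cursor does have to become a DataFrame, one way to avoid stalling on a single giant in-memory list is to build the frame in batches. This is only a sketch (the batch size is arbitrary), and it still needs enough memory for the final frame, so filtering in MongoDB first is usually preferable:

import pandas as pd

chunks = []
batch = []
for doc in cursordata:
    batch.append(doc)
    if len(batch) == 100_000:  # arbitrary batch size
        chunks.append(pd.DataFrame(batch))
        batch = []
if batch:
    chunks.append(pd.DataFrame(batch))

df = pd.concat(chunks, ignore_index=True)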

I have 2 choices:

  1. Keep querying the different conditions directly in mongodb, or
  2. Convert the data to a DataFrame first, then perform the operations for the different conditions.

Please suggest how to proceed.

Could we have a sample of documents? I think you should look for an index matching the fields you're querying.

As a reminder, try to keep in mind the Equality, Sort, Range rule in MongoDB indexing.
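
For illustration, an ESR-ordered compound index for a hypothetical query that matches on "type" (equality), sorts by "userId", and ranges over "eventDate" might be built like this (all field names are assumptions, not from the question):

from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")
event = client["mydb"]["event"]

# ESR order: equality field first, then sort field, then range field
event.create_index([
    ("type", ASCENDING),       # Equality: {"type": "click"}
    ("userId", ASCENDING),     # Sort: .sort("userId", 1)
    ("eventDate", ASCENDING),  # Range: {"eventDate": {"$gte": ..., "$lt": ...}}
])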
Besides, since you're in a sharded cluster you might want to have your sharding key in your query, otherwise the mongos will query all the shards (more info here).
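
As an illustration (the shard key "userId" is an assumption), including the shard key in the $match stage lets mongos route the aggregation to a single shard rather than broadcasting it:

from datetime import datetime

# Targeted: includes the shard key, so mongos routes to one shard
targeted = event.aggregate([
    {"$match": {"userId": 12345,
                "eventDate": {"$gte": datetime(2020, 1, 1),
                              "$lt": datetime(2020, 2, 1)}}},
])

# Scatter-gather: no shard key, so mongos must ask every shard
scattered = event.aggregate([
    {"$match": {"eventDate": {"$gte": datetime(2020, 1, 1),
                              "$lt": datetime(2020, 2, 1)}}},
])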
