
How to use a mongodb query operation on a very large database (3 shards of around 260-300 million documents each)

I have to find data between different date ranges on a column in a sharded database holding a total of around 800 million documents. I am using this query:

cursordata = event.aggregate([
    {"$match": {...}},    # match conditions elided in the question
    {"$unwind": ...},     # array field path elided
    {"$project": {...}},  # projection elided
])
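
For reference, a runnable version of this pattern could look like the sketch below. The connection details, the field names "eventDate" and "events", and the date bounds are all assumptions for illustration; they are not from the original question.

from datetime import datetime
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
event = client["mydb"]["event"]

pipeline = [
    # Filter first, so later stages handle as few documents as possible
    {"$match": {"eventDate": {"$gte": datetime(2020, 1, 1),
                              "$lt": datetime(2020, 2, 1)}}},
    # Unwind a hypothetical array field into one document per element
    {"$unwind": "$events"},
    # Keep only the fields needed downstream
    {"$project": {"_id": 0, "eventDate": 1, "events": 1}},
]

# allowDiskUse lets large stages spill to disk instead of hitting memory limits
cursordata = event.aggregate(pipeline, allowDiskUse=True)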

However, when I convert it to a pandas DataFrame

df = pd.DataFrame(cursordata)

it takes forever and does not complete at all; it just gets stuck.
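
If the cursor does have to become a DataFrame, one way to avoid stalling on a single giant in-memory list is to build the frame in batches. This is only a sketch (the batch size is arbitrary), and it still needs enough memory for the final frame, so filtering in MongoDB first is usually preferable:

import pandas as pd

chunks = []
batch = []
for doc in cursordata:
    batch.append(doc)
    if len(batch) == 100_000:  # arbitrary batch size
        chunks.append(pd.DataFrame(batch))
        batch = []
if batch:
    chunks.append(pd.DataFrame(batch))

df = pd.concat(chunks, ignore_index=True)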

I have 2 choices:

  1. Keep querying the different conditions directly in mongodb, or
  2. Convert the data to a DataFrame first, then perform the operations for the different conditions.

Please suggest how to proceed.

Could we have a sample of documents? I think you should look for an index matching the fields you're querying.

As a reminder, try to keep in mind the Equality, Sort, Range rule in MongoDB indexing.
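
For illustration, an ESR-ordered compound index for a hypothetical query that matches on "type" (equality), sorts by "userId", and ranges over "eventDate" might be built like this (all field names are assumptions, not from the question):

from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")
event = client["mydb"]["event"]

# ESR order: equality field first, then sort field, then range field
event.create_index([
    ("type", ASCENDING),       # Equality: {"type": "click"}
    ("userId", ASCENDING),     # Sort: .sort("userId", 1)
    ("eventDate", ASCENDING),  # Range: {"eventDate": {"$gte": ..., "$lt": ...}}
])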
Besides, since you're in a sharded cluster you might want to have your sharding key in your query, otherwise the mongos will query all the shards (more info here).
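
As an illustration (the shard key "userId" is an assumption), including the shard key in the $match stage lets mongos route the aggregation to a single shard rather than broadcasting it:

from datetime import datetime

# Targeted: includes the shard key, so mongos routes to one shard
targeted = event.aggregate([
    {"$match": {"userId": 12345,
                "eventDate": {"$gte": datetime(2020, 1, 1),
                              "$lt": datetime(2020, 2, 1)}}},
])

# Scatter-gather: no shard key, so mongos must ask every shard
scattered = event.aggregate([
    {"$match": {"eventDate": {"$gte": datetime(2020, 1, 1),
                              "$lt": datetime(2020, 2, 1)}}},
])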
