How to query a very large MongoDB database (3 shards of around 260-300 million documents each)
I have to find data whose date column falls within different date ranges, in a sharded database holding around 800 million documents in total. I am using this query:
cursordata=event.aggregate([{"$match":{}},{"$unwind":},{"$project":{}}])
However, when I convert the result to a pandas DataFrame
df=pd.DataFrame(cursordata)
it takes forever and never finishes; it just hangs.
I have 2 choices:
Please suggest how to proceed.
Could we have a sample of the documents? I think you should look for an index matching the fields you're querying.
As a reminder, keep the Equality, Sort, Range (ESR) rule in mind when designing MongoDB indexes.
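To illustrate the ESR ordering, here is a minimal sketch. The field names (`event_type`, `user_id`, `created_at`) are made up, since no sample document has been posted, and `esr_index_keys` and `ensure_index` are hypothetical helpers, not part of pymongo. It only assumes that `create_index` on a pymongo collection accepts a list of `(field, direction)` pairs, with `1` meaning ascending.

```python
def esr_index_keys(equality_fields, sort_fields, range_fields):
    """Order compound-index keys as Equality, then Sort, then Range."""
    keys = [(name, 1) for name in equality_fields]   # exact-match fields first
    keys += list(sort_fields)                        # (name, direction) pairs next
    keys += [(name, 1) for name in range_fields]     # range-filtered fields last
    return keys


# Hypothetical fields: adjust to whatever your $match and sort actually use.
keys = esr_index_keys(
    equality_fields=["event_type"],
    sort_fields=[("user_id", 1)],
    range_fields=["created_at"],
)


def ensure_index(collection):
    """Create the compound index on a pymongo collection (e.g. `event`)."""
    return collection.create_index(keys)
```

If your query has no sort, pass an empty list for `sort_fields`; the date field you filter by range would then simply come after the equality fields.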
Besides, since you're on a sharded cluster, you might want to include your shard key in your query; otherwise the mongos will query all the shards.
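As a rough sketch of what that could look like, the `$match` stage below filters on both a shard key and the date range. I don't know your actual shard key or date field, so `customer_id` and `created_at` are placeholders, and `build_match_stage` is just a hypothetical helper.

```python
from datetime import datetime


def build_match_stage(shard_key_value, start, end):
    """Build a $match stage that filters on the shard key and a date range."""
    return {
        "$match": {
            "customer_id": shard_key_value,             # placeholder shard key
            "created_at": {"$gte": start, "$lt": end},  # placeholder date field
        }
    }


match_stage = build_match_stage(
    "customer-42", datetime(2021, 1, 1), datetime(2021, 2, 1)
)

# Put this stage first in your existing pipeline, followed by your own
# $unwind and $project stages, then pass the list to event.aggregate(...).
```

With the shard key in the first `$match`, the mongos should be able to route the query to the shard(s) owning that key instead of broadcasting it to all three.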