Best way to compare changes in millions of MongoDB records
I am working on a project where I store the DNS records of millions of websites, and I need to monitor and update changes in this data periodically. The data is stored in MongoDB as follows:
{
domain: "www.google.com",
"IP": [
{
"value":"216.58.198.78",
"first_seen":"2020-02-01 00:00:00",
"last_seen":"2020-02-10 00:00:00"
},
{
"value":"216.58.198.75",
"first_seen":"2020-02-11 00:00:00",
"last_seen":"2020-02-25 00:00:00"
},
...
]
}
I run periodic scans to collect new domains and fresh DNS records, and I would like to know the best way to compare them with the data stored in the DB and update it.
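For the comparison itself, here is a minimal sketch of merging one scan result into a single domain's stored `IP` history. It assumes (not stated in the question) that entries are stored oldest-first, that an IP seen again should have `last_seen` of its most recent entry extended, and that an IP never seen before gets a new entry with `first_seen` = `last_seen` = scan time. The function name `merge_ips` is hypothetical.

```python
from datetime import datetime

# Timestamp format used by the stored documents, e.g. "2020-02-01 00:00:00".
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def merge_ips(stored_ips, scanned_values, scan_time):
    """Merge one scan result into a domain's stored "IP" history.

    stored_ips: list of {"value", "first_seen", "last_seen"} dicts from the DB.
    scanned_values: iterable of IP strings returned by the latest scan.
    Returns (merged_list, added_values); the input list is not modified.
    """
    ts = scan_time.strftime(TS_FORMAT)
    merged = [dict(entry) for entry in stored_ips]  # shallow copies

    # Assumption: entries are stored oldest-first, so the last entry for a
    # value is its most recent sighting; that one gets last_seen extended.
    latest = {}
    for entry in merged:
        latest[entry["value"]] = entry

    added = []
    for value in sorted(set(scanned_values)):
        if value in latest:
            latest[value]["last_seen"] = ts  # still resolving: refresh it
        else:
            # New IP for this domain: start a new history entry.
            merged.append({"value": value, "first_seen": ts, "last_seen": ts})
            added.append(value)
    return merged, added
```

This only needs one domain's document in memory at a time, not the whole collection. Note that because `last_seen` is refreshed for every IP that is seen again, almost every scanned domain will still need a write.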
What I am thinking is to do the following:
This sounds terrible in terms of performance and memory consumption (we would be holding millions of records in memory), but I am not sure whether the alternative (query, then update) would do any better, since we would need to perform millions of transactions.
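One way to avoid both loading everything into memory and running a separate query per domain is to let MongoDB do the comparison inside the update filters and send the writes in batches. The sketch below only builds `(filter, update, upsert)` triples; each one is assumed to be wrapped as `pymongo.UpdateOne(filter, update, upsert=upsert)` and a batch sent with `collection.bulk_write(..., ordered=True)`. Ordering matters because the `$setOnInsert` upsert must create a new domain's document before the `$push` can match it. The function names and `batch_size` are hypothetical, and the performance gain (fewer round trips) should be measured rather than assumed.

```python
def build_ops(domain, scanned_ips, ts):
    """Return (filter, update, upsert) triples for one domain's scan result."""
    # 1. Create the document if this is a new domain (no-op if it exists).
    ops = [({"domain": domain}, {"$setOnInsert": {"IP": []}}, True)]
    for ip in scanned_ips:
        # 2. IP already recorded: refresh last_seen on the first matching
        #    array element via the positional "$" operator.
        ops.append((
            {"domain": domain, "IP.value": ip},
            {"$set": {"IP.$.last_seen": ts}},
            False,
        ))
        # 3. IP not recorded yet: append a new history entry.
        ops.append((
            {"domain": domain, "IP.value": {"$ne": ip}},
            {"$push": {"IP": {"value": ip, "first_seen": ts, "last_seen": ts}}},
            False,
        ))
    return ops


def batched_ops(scan_results, ts, batch_size=1000):
    """Yield lists of ops from a lazy stream of (domain, ips) pairs.

    A domain's ops are never split across batches, so a batch may exceed
    batch_size by at most one domain's worth of ops.
    """
    batch = []
    for domain, ips in scan_results:
        batch.extend(build_ops(domain, ips, ts))
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Because `scan_results` can be a generator that reads scan output line by line, only one batch is held in memory at a time.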
I would appreciate any insights on the best way to achieve this, or pointers to areas of research that might help.
Thanks
A common practice is to add a flag field (e.g. "NeedUpdate") to each record in the database.
When a new record is created, its "NeedUpdate" is set to "ON".
When an existing record is updated, its "NeedUpdate" is set to "ON" as well.
After that, you can run a cron job (or any periodic scan) to process the records with "NeedUpdate"="ON"; after processing each one, clear the flag by setting "NeedUpdate" to "".
That way, the system only needs to process the records that require an update.
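The flag workflow above can be sketched in memory as follows. A plain dict keyed by domain stands in for the collection, and records are simplified to a sorted list of IP strings under a hypothetical `ips` key. One assumption beyond the answer: "updating" a record only sets the flag when the stored data actually changes, so an unchanged rescan does not flag every domain.

```python
def save_record(records, domain, ips):
    """Insert or update a record, flagging it when something changed."""
    ips = sorted(ips)
    rec = records.get(domain)
    if rec is None:
        # New record: always needs processing.
        records[domain] = {"domain": domain, "ips": ips, "NeedUpdate": "ON"}
    elif rec["ips"] != ips:
        # Existing record whose data changed (assumption: unchanged data
        # leaves the flag alone).
        rec["ips"] = ips
        rec["NeedUpdate"] = "ON"


def process_pending(records, handle):
    """Periodic job: handle flagged records, clear their flag, return count."""
    processed = 0
    for rec in records.values():
        if rec["NeedUpdate"] == "ON":
            handle(rec)
            rec["NeedUpdate"] = ""
            processed += 1
    return processed


records = {}
save_record(records, "www.google.com", ["216.58.198.78"])
save_record(records, "example.com", ["192.0.2.1"])
print(process_pending(records, lambda r: print("processing", r["domain"])))
```

Against MongoDB, the job would presumably query `{"NeedUpdate": "ON"}` and clear the flag with `{"$set": {"NeedUpdate": ""}}`; an index on `NeedUpdate` keeps that query from scanning the whole collection.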