Algorithm to add new, update existing and remove missing data

I have done lots of searching, but have been unable to find a satisfactory answer on the most efficient way to achieve the following.

Say my App contains a list of Products. At the end of every day an external service is called that returns another list of Products from a master data source.

  • If the list of Products from master data contains any Products not in my App, add the Product to the App.
  • If the Product in the master data is already in my App, and no changes have been made, do nothing.
  • If the Product in the master data is already in my App, but some data has changed (the Product's name for instance), update the Product.
  • If a Product is available in my App, but no longer in the master data source, flag it as "Unavailable" in the App.

At the moment, I loop over each list, and for each Product I loop through the other list:

  • For each Product in the master data list, loop through the Products in the App, and update as needed. If no Product was found, then add the Product to the App.
  • Then, for each Product in the App, loop through the Products in the master data list, and if not found, flag as "Unavailable" in the App.

Is there a more efficient method to achieve this? Or are there any algorithms or patterns that are relevant here?

In each case the Products are represented by objects in a Python list.

First of all, I'd suggest using dicts with the Product code (or name, or whatever uniquely identifies a Product) as the key and the Product object as the value. Each lookup then takes constant time on average instead of a scan of the whole list, which should make your loops faster by a factor of at least 100 on a thousand entries.
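A minimal sketch of the dict-based first pass (add and update). The `Product` dataclass, its `code`, `name` and `available` fields, and the function name `upsert_from_master` are all assumptions made for illustration; the question does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Product:
    # Hypothetical fields; the question does not give the Product schema.
    code: str
    name: str
    available: bool = True


def upsert_from_master(app_products, master_products):
    """Add new Products and update changed ones via dict lookups."""
    # Build the index once, so each later lookup avoids scanning the list.
    app_by_code = {p.code: p for p in app_products}
    for m in master_products:
        existing = app_by_code.get(m.code)
        if existing is None:
            # Not in the App yet: add it.
            app_products.append(m)
            app_by_code[m.code] = m
        elif existing.name != m.name:
            # Already in the App but changed: update in place.
            existing.name = m.name
        # Otherwise the Product is unchanged and nothing is done.
    return app_by_code


app = [Product("A1", "Widget"), Product("B2", "Gadget")]
master = [Product("A1", "Widget v2"), Product("C3", "Gizmo")]
index = upsert_from_master(app, master)
```

Only `name` is compared here; with more fields, comparing whole objects (dataclasses generate `__eq__`) may be simpler.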

Then, especially for the second search, it may be worth converting the keys of the first dict to a set and looping over the difference, as in:

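for code in set(appDict.keys()).difference(masterDict.keys()):
    # In the App but no longer in the master data: flag as "Unavailable".
    appDict[code].available = False  # "available" is an assumed attribute name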
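In Python 3, dict key views support set operations directly, so the explicit `set(...)` conversion is optional. The sketch below uses plain code-to-name dicts as stand-in data (the values are made up) and splits the keys into the groups the question describes.

```python
# Stand-in data: Product code -> Product name (hypothetical values).
appDict = {"A1": "Widget", "B2": "Gadget"}
masterDict = {"A1": "Widget v2", "C3": "Gizmo"}

# Key views behave like sets, so these operations each return a set.
to_add = masterDict.keys() - appDict.keys()    # only in master: add
to_flag = appDict.keys() - masterDict.keys()   # only in App: flag "Unavailable"
in_both = appDict.keys() & masterDict.keys()   # in both: compare the data

# Of the shared codes, keep only those whose data actually differs.
to_update = {code for code in in_both if appDict[code] != masterDict[code]}
```

Each group can then be processed with its own short loop, and the whole sync stays linear in the size of the two dicts.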


 