
Best practice for checking data existence in database in a loop?

I need to check, inside a for loop, whether a specific piece of data already exists in table1 in the database. If it exists, nothing needs to happen and the loop continues; otherwise I should add the data to table1.

So on every iteration I hit the database, which I believe is time-consuming.

Is there a best practice for tasks like this?

How do you verify the existence of a record in your database table? Most likely you match it against a local Id or something similar.

If this is true, then I'd query the table once, select all the Ids, and store them in a Hashtable (a Dictionary in .NET). (This might not be practical if your database contains millions of records.) Determining whether a record exists in the table then becomes a simple check of whether the key exists in the Dictionary, which is an O(1) operation on average and far cheaper than paying for an expensive database roundtrip on every iteration.
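To make that concrete, here is a minimal sketch of the caching idea in C#, using a HashSet<int> to hold the keys (a Dictionary works the same way if you also need values). The table name dbo.Table1, the Id column and the connection string are assumptions, so adapt them to your schema:

    using System;
    using System.Collections.Generic;
    using System.Data.SqlClient;

    class IdCacheExample
    {
        // Load every existing Id once, so the loop never has to query the database.
        static HashSet<int> LoadExistingIds(string connectionString)
        {
            var existingIds = new HashSet<int>();
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand("SELECT Id FROM dbo.Table1", connection))
            {
                connection.Open();
                using (var reader = command.ExecuteReader())
                {
                    while (reader.Read())
                        existingIds.Add(reader.GetInt32(0));
                }
            }
            return existingIds;
        }
    }

Inside the loop the existence check is then just existingIds.Contains(id), an in-memory lookup with no roundtrip; anything not found can be collected in a list and inserted afterwards.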

The next thing to think about is how to remember the records you need to add to the table. This depends on whether your local data may itself contain duplicates that you have to check before adding, or whether it is guaranteed to be free of (local) duplicates.

In the simple case where there are no possible duplicates, just adding them to the Dictionary under the appropriate key and later reading them back via Dictionary.Values (an O(1) property access) is probably as fast as it gets. If the inserts need to be really fast because there are a lot of them, consider using SQL bulk inserts.
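If bulk inserting is the route you take, a sketch along these lines might do; SqlBulkCopy pushes all the new rows in one operation, and the column names (Id, Name) plus the itemsToInsert collection are made up for illustration:

    using System.Data;
    using System.Data.SqlClient;

    // Collect the rows the cache flagged as missing into a DataTable...
    var newRows = new DataTable();
    newRows.Columns.Add("Id", typeof(int));
    newRows.Columns.Add("Name", typeof(string));

    foreach (var item in itemsToInsert)
        newRows.Rows.Add(item.Id, item.Name);

    // ...and send them to the server in a single bulk operation.
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        using (var bulkCopy = new SqlBulkCopy(connection))
        {
            bulkCopy.DestinationTableName = "dbo.Table1";
            bulkCopy.WriteToServer(newRows);
        }
    }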

If your table is too large to cache the Ids locally, I'd consider implementing a stored procedure that performs the insert and contains the logic deciding whether to actually insert or to do nothing. This gets rid of the second roundtrip, which is usually pretty expensive.
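A sketch of that single-roundtrip idea, assuming a stored procedure named dbo.InsertIfMissing whose body does an IF NOT EXISTS check followed by the INSERT (the procedure name and parameters are placeholders):

    using System.Data;
    using System.Data.SqlClient;

    // Server-side logic, roughly:
    //   IF NOT EXISTS (SELECT 1 FROM dbo.Table1 WHERE Id = @Id)
    //       INSERT INTO dbo.Table1 (Id, Name) VALUES (@Id, @Name);
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("dbo.InsertIfMissing", connection))
    {
        command.CommandType = CommandType.StoredProcedure;
        command.Parameters.AddWithValue("@Id", item.Id);
        command.Parameters.AddWithValue("@Name", item.Name);
        connection.Open();
        command.ExecuteNonQuery();   // check and insert happen in one roundtrip
    }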

If your RDBMS implements the SQL MERGE command (assuming you're using MS SQL Server, it does), I'd insert all the data into a temporary table and then MERGE it into the target table. This is probably the fastest solution.
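A hedged sketch of the temp-table plus MERGE variant, again with placeholder table and column names; the #Staging table has to be created and bulk-loaded (e.g. with SqlBulkCopy) on the same open connection before the MERGE runs:

    using System.Data.SqlClient;

    const string mergeSql = @"
        MERGE dbo.Table1 AS target
        USING #Staging AS source
            ON target.Id = source.Id
        WHEN NOT MATCHED BY TARGET THEN
            INSERT (Id, Name) VALUES (source.Id, source.Name);";

    using (var command = new SqlCommand(mergeSql, connection))
    {
        command.ExecuteNonQuery();   // inserts only the rows missing from the target
    }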

How much data you have, and which SQL implementation you are using, can make a big difference here...

For example, with 10 million rows of data, performing 10 million (potentially logged) operations, one for each row, will take orders of magnitude longer than, for example:

  • uploading the same data to a temporary table in a bulk operation, e.g. through the bulk-copy API if you're using SQL Server,
  • performing a left outer join to diff the data, and
  • inserting the difference in a single batch operation (see the sketch after this list).
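As a sketch of those three steps (assuming SQL Server, with the same placeholder #Staging table already bulk-loaded on the open connection), the diff and the insert collapse into a single statement:

    using System.Data.SqlClient;

    const string insertDiffSql = @"
        INSERT INTO dbo.Table1 (Id, Name)
        SELECT s.Id, s.Name
        FROM #Staging AS s
        LEFT OUTER JOIN dbo.Table1 AS t ON t.Id = s.Id
        WHERE t.Id IS NULL;";   // keep only the rows with no match in the target

    using (var command = new SqlCommand(insertDiffSql, connection))
    {
        command.ExecuteNonQuery();
    }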
