Speed up looping through a dictionary in VB.Net 2008

I have a process that imports a daily file of product registrations and adds them to our database. Originally, this process queried the database multiple times for each record to determine how to process the data.

To speed up this process and prevent timeouts for people using a reporting site that shares the same database, I've changed the code to pull a few of the tables into dictionaries and then iterate over them to see whether the customer/address/dealership exists. If it does, I pull the Id from the key of the dictionary; if it doesn't, I insert it into the table and then into the dictionary.

However, I'm finding that this runs slower than querying the database multiple times for each registration. One possible cause I can think of is that my dictionaries are quite large (one has 8 million entries and another has 11 million).

Here is one example of what I'm doing:

    For Each kvp As KeyValuePair(Of Int64, String) In dCust
        If kvp.Value = firstName & "|" & lastName & "|" & companyName & "|" & addrId & "|" & typeID & "|" & phone & "|" & email Then
            custId = kvp.Key
            Exit For
        End If
    Next

This dictionary has around 11 million records in it.

A coworker suggested running Dictionary.ContainsValue() before the loop to see whether the value is even there, and skipping the loop entirely if it isn't. I'd only want to try this if it runs faster than the loop itself; if they take the same time, I don't see the point of basically running the loop twice.

So my questions to you are:

  • Am I going about this in the most efficient way?
  • Would it be faster to run Dictionary.ContainsValue() before attempting the loop, or will the system treat them as the same thing, thus doubling my time?
  • Is there anything else that I should be looking for?

One obvious small optimization would be to concatenate firstName, lastName, etc. once, outside the loop. Currently you're concatenating on every iteration of the loop, which is obviously slower than it needs to be.

No, using ContainsValue would be no faster - that still has to do a linear search.

The obvious big optimization would be to invert the dictionary - create a Dictionary(Of String, Int64) which holds the ID for each string value. Currently you're not using the natural benefits of a dictionary - you're essentially treating it like a list of key/value pairs.
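As a minimal sketch of the inverted dictionary described above (VB 9 / .NET 3.5 syntax): the module name, the `MakeKey` helper, and the sample rows are all hypothetical and not from the original code. The point is only that `TryGetValue` on a string key does a hash lookup rather than walking every entry.

```vbnet
Imports System
Imports System.Collections.Generic

Module InvertedLookupSketch
    ' Hypothetical helper: builds the same pipe-delimited string
    ' that the question compares against.
    Function MakeKey(ByVal ParamArray parts() As String) As String
        Return String.Join("|", parts)
    End Function

    Sub Main()
        ' Made-up sample data standing in for the customer table.
        Dim custIdByKey As New Dictionary(Of String, Int64)()
        custIdByKey.Add(MakeKey("Ann", "Lee", "Acme", "17", "2", "555-0100", "ann@example.com"), 1001)
        custIdByKey.Add(MakeKey("Bob", "Ray", "Initech", "42", "1", "555-0101", "bob@example.com"), 1002)

        ' Build the search string once, then do a single hash lookup.
        Dim searchKey As String = MakeKey("Bob", "Ray", "Initech", "42", "1", "555-0101", "bob@example.com")

        Dim custId As Int64
        If custIdByKey.TryGetValue(searchKey, custId) Then
            Console.WriteLine("Found customer " & custId.ToString())
        Else
            Console.WriteLine("Not found")
        End If
    End Sub
End Module
```

Note that the default string comparer for a dictionary key is case-sensitive, so the key must be built exactly the same way when loading and when searching.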

Do you actually use the dictionary the normal way as well (looking up by the key)?

It looks like you're using the Dictionary in the opposite way to how it should be used - or am I missing something?

By iterating over the key/value pairs in the dictionary, you're nullifying the benefit that a dictionary (hashtable) provides - quick lookup of a given key.

You should use a dictionary of (String, Int64), mapping the firstName, lastName, ... string to the custId. A lookup into this would be very quick compared to what you're currently doing.
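One way this could fit the import flow from the question is sketched below, assuming the existing `dCust` dictionary is inverted once at load time. `Invert`, `GetOrAddCustomer`, and `InsertCustomer` are hypothetical names; `InsertCustomer` is only a stand-in for the real database insert, and the sample rows are made up.

```vbnet
Imports System
Imports System.Collections.Generic

Module GetOrAddSketch
    ' Stand-in for the database identity column (hypothetical).
    Private nextId As Int64 = 3

    ' Hypothetical placeholder for the real INSERT; returns the new Id.
    Function InsertCustomer(ByVal key As String) As Int64
        Dim id As Int64 = nextId
        nextId += 1
        Return id
    End Function

    ' One-time O(n) pass that turns Id -> string into string -> Id.
    ' If two Ids share the same string, the first one seen is kept.
    Function Invert(ByVal source As Dictionary(Of Int64, String)) As Dictionary(Of String, Int64)
        Dim result As New Dictionary(Of String, Int64)(source.Count)
        For Each kvp As KeyValuePair(Of Int64, String) In source
            If Not result.ContainsKey(kvp.Value) Then
                result.Add(kvp.Value, kvp.Key)
            End If
        Next
        Return result
    End Function

    ' Hash lookup first; insert into the table and the dictionary only on a miss.
    Function GetOrAddCustomer(ByVal lookup As Dictionary(Of String, Int64), ByVal key As String) As Int64
        Dim id As Int64
        If Not lookup.TryGetValue(key, id) Then
            id = InsertCustomer(key)
            lookup.Add(key, id)
        End If
        Return id
    End Function

    Sub Main()
        Dim dCust As New Dictionary(Of Int64, String)()
        dCust.Add(1, "Ann|Lee|Acme|17|2|555-0100|ann@example.com")
        dCust.Add(2, "Bob|Ray|Initech|42|1|555-0101|bob@example.com")

        Dim lookup As Dictionary(Of String, Int64) = Invert(dCust)

        ' Existing customer: prints 2
        Console.WriteLine(GetOrAddCustomer(lookup, "Bob|Ray|Initech|42|1|555-0101|bob@example.com"))
        ' New customer: inserted, prints 3
        Console.WriteLine(GetOrAddCustomer(lookup, "Cy|Fox|Hooli|9|1|555-0102|cy@example.com"))
    End Sub
End Module
```

If the Id-to-string direction is never needed, it may be simpler to load the table straight into the string-keyed dictionary and skip the inversion pass.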

One thing you could do to speed things up is to pre-concatenate the search string:

    Dim SearchValue As String = firstName & "|" & lastName & "|" & companyName & "|" & addrId & "|" & typeID & "|" & phone & "|" & email
    For Each kvp As KeyValuePair(Of Int64, String) In dCust
        If kvp.Value = SearchValue Then
            custId = kvp.Key
            Exit For
        End If
    Next

The point of using a dictionary is to do a quick lookup of the KEY, not the value. Either just use a normal array list or change your code so that you're doing a key lookup rather than a value lookup.

I think the answers about the dictionary are great, but I think the broader answer is to handle this at the database tier rather than downloading millions of records to iterate through with a dictionary in VB.NET. Why not use a table-valued parameter (I'm assuming you're using SQL Server 2008) to pass in the data you want to compare and see if it exists? You'd pass it to a stored proc or something that would do the comparison entirely on the SQL side. You could even do something like:

    INSERT ProductRegistrations
    SELECT * FROM @tvpProductsToAdd pa
    WHERE pa.firstName + pa.lastName + pa.companyName NOT IN
        (SELECT firstName + lastName + companyName FROM ProductRegistrations)

@tvpProductsToAdd is the table-valued parameter you'd pass in with your new products. You might want to create some sort of index on those fields to speed up the comparison, given that you don't seem to have keys that you can compare.
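On the client side, passing a table-valued parameter from VB.NET might look roughly like the sketch below. It assumes a user-defined table type and a stored procedure already exist on the server; the names `dbo.ProductRegistrationType` and `dbo.ImportProductRegistrations` are hypothetical, as are the column list and the sample row. It has no `Main` and needs a real connection string and server objects to run.

```vbnet
Imports System
Imports System.Data
Imports System.Data.SqlClient

Module TvpSketch
    ' Builds an in-memory table whose columns are assumed to match
    ' the (hypothetical) table type defined on the server.
    Function BuildRegistrationTable() As DataTable
        Dim table As New DataTable()
        table.Columns.Add("firstName", GetType(String))
        table.Columns.Add("lastName", GetType(String))
        table.Columns.Add("companyName", GetType(String))
        table.Rows.Add("Ann", "Lee", "Acme") ' made-up sample row
        Return table
    End Function

    ' Sends the whole batch in one round trip; the stored procedure
    ' (hypothetical name) would do the existence check and insert.
    Sub SendRegistrations(ByVal connectionString As String, ByVal rows As DataTable)
        Using conn As New SqlConnection(connectionString)
            Using cmd As New SqlCommand("dbo.ImportProductRegistrations", conn)
                cmd.CommandType = CommandType.StoredProcedure

                Dim p As SqlParameter = cmd.Parameters.AddWithValue("@tvpProductsToAdd", rows)
                p.SqlDbType = SqlDbType.Structured
                p.TypeName = "dbo.ProductRegistrationType" ' hypothetical table type

                conn.Open()
                cmd.ExecuteNonQuery()
            End Using
        End Using
    End Sub
End Module
```

The design choice here is to send each day's file as one batch rather than one query per record, which keeps the 8 to 11 million existing rows on the server.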
