简体   繁体   中英

Help with C#.NET generic collections performance and optimization

I am trying to optimize a piece of .NET 2.0 C# code that looks like this:

Dictionary<myType, string> myDictionary = new Dictionary<myType, string>();
// some other stuff
// inside a loop check if key is there and if not add element
if(!myDictionary.ContainsKey(currentKey))
{
   myDictionary.Add(currentKey, "");
}

Looks like the Dictionary has been used by whoever wrote this piece of code even if not needed (only the key is being used to store a list of unique values) because faster than a List of myType objects for search. This seems obviously wrong as only the key of the dictionary but I am trying to understand what's the best way to fix it.

Questions:

1) I seem to understand I would get a good performance boost even just using .NET 3.5 HashSet. Is this correct?

2) What would be the best way to optimize the code above in .NET 2.0 and why?

EDIT : This is existing code I am trying to optimize, it's looping through dozens of thousands items and for each one of them is calling a ContainsKey. There's gotta be a better way of doing it (even in .NET 2.0)! :)

I think you need to break this down into 2 questions

Is Dictionary<myType,string> the best available type for this scenario

No. Based on your breakdown, HashSet<myType> is clearly the better choice because it's usage pattern more accurately fits the scenario

Will switching to Hashset<myType> give me a performance boost?

This is really subjective and only a profiler can give you the answer to this question. Likely you'll see a very minor memory size improvement per element in the collection. But in terms of raw computing power I doubt you'll see a huge difference. Only a profiler can tell you if there is one.

Before you ever make a performance related change to your code remember the golden rule.

Don't make any performance related changes until a profiler has told you precisely what is wrong with your code.

Making changes which violate this rule are just guesses. A profiler is the only way to measure success of a performance fix.

1) No. A dictionary does a hash on the key so your lookup should be O(1). A Hashset should result in less memory needed though. But honestly, it isn't that much that you will really see a performance boost.

2) Give us some more detail as to what you are trying to accomplish. The code you posted is pretty simple. Have you measured yet? Are you seeing that this method is slow? Don't forget "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." -- Donald Knuth

  1. Depending on the size of your keys, you may actually see performance degrade.

  2. One way in 2.0 would be to try and insert it and catch the exception (of course, this depends on how many duplicate keys you plan on having:

 foreach(string key in keysToAdd) { try { dictionary.Add(key, "myvalue"); } catch(ArgumentException) { // do something about extra key } } 

The obvious mistake (if we discuss performance) I can see is the double work done when calling ContainsKey and then adding the key-value pair. When the pair is added using Add method, the key is again internally checked for presense. The whole if block can be safely replaced by this:

... myDictionary[currentKey] = ""; ...

If the key already exists there, the value will be just replaces and no exception will get thrown. Moreover, if the value is not used at all I would personally use null values to fill it. Can see no reason for using any string constant there.

The possible performance degrade mentioned by scottm is not for doing simple lookups. It is for calculating the intersection between 2 sets. HashSet does have slightly faster lookups than Dictionary. The performance difference really is going to be very small, though, as everyone says -- the lookup takes most of the time & creating the KeyValuePair takes very little.

For 2.0, you could make the "Value" object one of these:

public struct Empty {}

It may do slightly better than the "".

Or you could try making a reference to System.Core.dll in your 2.0 project, so you can use the HashSet.

Also, make sure that GetHashCode and Equals are as efficient as possible for MyType. I've been bitten by using a dictionary on something with a really slow GetHashCode (I believe we tried to use a delegate as a key or something like that.)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM