
C#: Add strings to an array if the array does not contain the string

I have a lot of string arrays. From all these string arrays, I want to create an array of unique strings. At the moment I do it like this:

string[] strings = {};

while (running)
{
   string[] newStringArrayToAdd = GetStrings();
   strings = strings.Concat(newStringArrayToAdd).ToArray();
}

string[] uniqueStrings = strings.Distinct().ToArray();

This works, but it is very slow because the strings variable has to stay in memory and grows very large. I'm therefore looking for a way to check on the fly whether a string is already in uniqueStrings and, if not, add it immediately. How can I do that?

Consider using a HashSet<string> instead of an array. It will do nothing if the string already exists in the set:

HashSet<string> strings = new HashSet<string>();

strings.Add("foo");
strings.Add("foo");

int count = strings.Count; // 1

The UnionWith method will be very useful in your example code:

HashSet<string> strings = new HashSet<string>();

while(running)
{
   string[] newStringArrayToAdd = GetStrings();
   strings.UnionWith(newStringArrayToAdd);
}
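As a self-contained sketch of the loop above: the `UniqueCollector` class and its `batches` parameter are hypothetical stand-ins for the question's `running` / `GetStrings()` loop, which is not shown in full. Only the set of unique strings is kept in memory, not every batch.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; "batches" stands in for repeated GetStrings() calls.
public static class UniqueCollector
{
    public static string[] CollectUnique(IEnumerable<string[]> batches)
    {
        var strings = new HashSet<string>();
        foreach (string[] batch in batches)
        {
            // UnionWith adds only the elements that are not already in the set.
            strings.UnionWith(batch);
        }
        // The order of the resulting array is not guaranteed.
        return strings.ToArray();
    }
}
```

For example, calling `UniqueCollector.CollectUnique` with the batches `{ "a", "b" }` and `{ "b", "c" }` should return an array containing "a", "b" and "c" once each.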

Use a HashSet instead. Like this:

HashSet<string> uniqueStrings = new HashSet<string>();

foreach (string newStringToAdd in newStringArrayToAdd)
{
  if (!uniqueStrings.Contains(newStringToAdd))
  {
    uniqueStrings.Add(newStringToAdd);
  }
}

To get the array afterwards:

var uniqueStringArray = uniqueStrings.ToArray();
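Note that `HashSet<T>.Add` returns `false` and leaves the set unchanged when the element is already present, so an explicit `Contains` check is optional. A small sketch, where the `HashSetAddSketch` class and `AddNew` method are hypothetical names used only for illustration:

```csharp
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the original answer.
public static class HashSetAddSketch
{
    // Adds every candidate to the set and returns how many were actually new.
    public static int AddNew(HashSet<string> set, IEnumerable<string> candidates)
    {
        int added = 0;
        foreach (string s in candidates)
        {
            // Add returns true only if the string was not in the set before.
            if (set.Add(s))
            {
                added++;
            }
        }
        return added;
    }
}
```

The return value is handy if you want to react to new strings immediately (for example, to log them) without a second lookup.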

Can you keep a list of the hashes of the strings?

When adding a new string, if its hash is not already in the list, you know it's unique.

If the hash is present, the string could still be unique because of a hash collision, so you have to check the long way by comparing the strings themselves. But that should be a relatively rare case.
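A minimal sketch of this idea, assuming a hypothetical `HashIndexedStrings` class (the name and layout are mine, not from the answer). It groups strings by `GetHashCode()` and only compares full strings when the hash has been seen before. This is roughly what `HashSet<string>` already does internally, so the built-in type is usually the simpler choice.

```csharp
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the .NET library.
public class HashIndexedStrings
{
    // Maps a hash code to every distinct string seen so far with that hash.
    private readonly Dictionary<int, List<string>> buckets = new Dictionary<int, List<string>>();
    private readonly List<string> unique = new List<string>();

    // Returns true if the (non-null) string was new and has been added.
    public bool TryAdd(string s)
    {
        int hash = s.GetHashCode();
        List<string> bucket;
        if (!buckets.TryGetValue(hash, out bucket))
        {
            // Hash never seen before: the string is certainly new.
            bucket = new List<string>();
            buckets[hash] = bucket;
        }
        else if (bucket.Contains(s))
        {
            // Hash seen, and the full comparison found the same string.
            return false;
        }
        // Either a new hash, or a collision between different strings.
        bucket.Add(s);
        unique.Add(s);
        return true;
    }

    public string[] ToArray()
    {
        return unique.ToArray();
    }
}
```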

You can use Union:

string[] result = strings.Union(strings2).ToArray();

So change your code to:

string[] strings = {};

while (running)
{
   string[] newStringArrayToAdd = GetStrings();
   strings = strings.Union(newStringArrayToAdd).ToArray();
}

// No need for this line, as strings will already be unique
// uniqueStrings = strings.Distinct().ToArray();

As per MSDN:

This method excludes duplicates from the return set. This is different behavior to the Concat method, which returns all the elements in the input sequences including duplicates.
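To make the difference concrete, here is a small sketch; the `UnionVersusConcat` class and its method names are hypothetical wrappers around the two LINQ calls.

```csharp
using System.Linq;

// Hypothetical wrappers for illustration; they only forward to LINQ.
public static class UnionVersusConcat
{
    // Keeps every element of both inputs, duplicates included.
    public static string[] WithConcat(string[] a, string[] b)
    {
        return a.Concat(b).ToArray();
    }

    // Keeps each distinct element once, in order of first appearance.
    public static string[] WithUnion(string[] a, string[] b)
    {
        return a.Union(b).ToArray();
    }
}
```

With the inputs `{ "a", "b", "b" }` and `{ "b", "c" }`, `WithConcat` returns a, b, b, b, c, while `WithUnion` returns a, b, c. Note that the loop above still rebuilds the whole array on every iteration, so it may remain slow for many batches.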

You can use a HashSet that will do the filtering for you:

HashSet<string> strings = new HashSet<string>();

while (running)
{
  string[] newStringArrayToAdd = GetStrings();
  foreach (string s in newStringArrayToAdd)
    strings.Add(s);
}

string[] uniqueStrings = strings.ToArray();

Use a HashSet<string>. Here is more information on how it works: http://msdn.microsoft.com/en-us/library/bb359438.aspx

Have you considered storing the strings in a HashSet rather than an array? The HashSet guarantees uniqueness, and each Add takes constant time on average.

var strings = new HashSet<string>();
strings.Add("abc");
strings.Add("abc");
int count = strings.Count; // is 1
