

C#: Add strings to an array if the array does not contain the string

I have a lot of string arrays. From all of them, I want to build one array of unique strings. At the moment I do it like this:

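using System.Linq;

// Accumulate every string (duplicates included), then de-duplicate at the end.
string[] strings = {};

while (running)
{
    string[] newStringArrayToAdd = GetStrings();
    strings = strings.Concat(newStringArrayToAdd).ToArray();
}

string[] uniqueStrings = strings.Distinct().ToArray();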

This works, but it is very slow because I have to keep the strings variable in memory, and it becomes huge. So I'm looking for a way to check on the fly whether a string is already in uniqueStrings and, if it is not, add it immediately. How can I do that?

Consider using a HashSet<string> instead of an array. Adding a string that already exists in the set does nothing:

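using System.Collections.Generic;

HashSet<string> strings = new HashSet<string>();

strings.Add("foo"); // returns true: "foo" is added
strings.Add("foo"); // returns false: "foo" is already in the set

int count = strings.Count; // 1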
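Because HashSet<T>.Add returns a bool (true if the element was added, false if it was already present), the "check on the fly and add immediately" step from the question can be a single call. A minimal sketch; the class and method names are hypothetical:

```csharp
using System.Collections.Generic;

public static class OnTheFly
{
    // Adds each string to 'seen' and returns only the ones that were new,
    // in the order they were first encountered.
    public static List<string> AddNew(IEnumerable<string> input, HashSet<string> seen)
    {
        var added = new List<string>();
        foreach (string s in input)
        {
            // Add returns false (and changes nothing) when s is already present.
            if (seen.Add(s))
            {
                added.Add(s);
            }
        }
        return added;
    }
}
```

The same set can be passed in across many batches, so only the distinct strings are ever kept in memory.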

The UnionWith method is very useful for your example code:

HashSet<string> strings = new HashSet<string>();

while(running)
{
   string[] newStringArrayToAdd = GetStrings();
   strings.UnionWith(newStringArrayToAdd);
}
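Put together, the question's loop might look like the sketch below. It assumes the batches that GetStrings() returns can be exposed as an IEnumerable<string[]>; the class and method names are made up for illustration:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class UniqueCollector
{
    // Merges every batch into one set; only distinct strings are kept in memory.
    public static string[] CollectUnique(IEnumerable<string[]> batches)
    {
        var unique = new HashSet<string>();
        foreach (string[] batch in batches)
        {
            // UnionWith adds every element of the batch that is not already in the set.
            unique.UnionWith(batch);
        }
        // ToArray is the LINQ extension method; enumeration order is not guaranteed.
        return unique.ToArray();
    }
}
```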

Use a HashSet instead, like this:

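HashSet<string> uniqueStrings = new HashSet<string>();

while (running)
{
    foreach (string newStringToAdd in GetStrings())
    {
        // Optional: HashSet.Add already ignores duplicates (it returns false)
        if (!uniqueStrings.Contains(newStringToAdd))
        {
            uniqueStrings.Add(newStringToAdd);
        }
    }
}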

To get an array afterwards:

var uniqueStringArray = uniqueStrings.ToArray();

Can you keep a list of the hashes of the strings?

When adding a new string, if its hash is not already in the list, you know it's unique.

If the hash is present, the string could be a duplicate, or it could be a different string that happens to have the same hash (a hash collision), so you have to check the long way by comparing the full strings. But that should be a relatively rare case.
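A minimal sketch of that idea, assuming string.GetHashCode() as the hash and a Dictionary<int, List<string>> as the "list of hashes" (both are choices made for illustration; the class name is hypothetical). Strings whose hash is already present are compared the long way against the other strings with that hash:

```csharp
using System.Collections.Generic;

public class HashBucketSet
{
    // Maps a hash code to every distinct string seen so far with that hash.
    private readonly Dictionary<int, List<string>> _buckets = new Dictionary<int, List<string>>();

    public int Count { get; private set; }

    // Returns true if s was new and has been stored, false if it was a duplicate.
    // Assumes s is not null.
    public bool TryAdd(string s)
    {
        int hash = s.GetHashCode();
        if (!_buckets.TryGetValue(hash, out List<string> bucket))
        {
            // Hash never seen before: the string must be unique.
            _buckets[hash] = new List<string> { s };
            Count++;
            return true;
        }

        // Hash already present: either a real duplicate or a collision,
        // so compare the full strings in this bucket.
        foreach (string existing in bucket)
        {
            if (string.Equals(existing, s))
            {
                return false;
            }
        }

        bucket.Add(s);
        Count++;
        return true;
    }
}
```

This is essentially what HashSet<string> already does internally (hash to find a bucket, then Equals to confirm), so in practice the built-in class is the simpler choice.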

You can use Union:

string[] result = strings.Union(strings2).ToArray();

So change your code to:

string[] strings = {};

while (running)
{
    string[] newStringArrayToAdd = GetStrings();
    strings = strings.Union(newStringArrayToAdd).ToArray();
}

// No need for this line, as strings will already be unique
//uniqueStrings = strings.Distinct().ToArray();

As per MSDN:

This method excludes duplicates from the return set. This is different behavior to the Concat method, which returns all the elements in the input sequences including duplicates.
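A small illustration of the difference described in that quote; the input arrays and class name are invented for the example:

```csharp
using System.Linq;

public static class UnionVsConcat
{
    public static readonly string[] First = { "a", "b", "a" };
    public static readonly string[] Second = { "b", "c" };

    // Concat keeps every element, duplicates included: a, b, a, b, c
    public static string[] Concatenated() => First.Concat(Second).ToArray();

    // Union drops duplicates, including those within First: a, b, c
    public static string[] United() => First.Union(Second).ToArray();
}
```

Note that calling ToArray() on every iteration, as the modified loop above does, still builds a new array of all unique strings so far each time; the HashSet-based answers avoid that repeated copy.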

You can use a HashSet, which will do the filtering for you:

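HashSet<string> strings = new HashSet<string>();

while (running)
{
    string[] newStringArrayToAdd = GetStrings();
    foreach (string s in newStringArrayToAdd)
        strings.Add(s); // duplicates are silently ignored
}

string[] uniqueStrings = strings.ToArray(); // ToArray needs using System.Linq;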

Use a HashSet<string>. Here is more information on how it works: http://msdn.microsoft.com/en-us/library/bb359438.aspx

Have you considered storing the strings in a HashSet rather than an array? A HashSet guarantees uniqueness, and each Add checks for an existing copy in (average-case) constant time.

var strings = new HashSet<string>();
strings.Add("abc");
strings.Add("abc");
int count = strings.Count; // 1
