Remove from hashset while iterating

I have the following code:

        List<HashSet<String>> authorLists = new List<HashSet<String>>();
        // fill it

        /** Remove duplicate authors */
        private void removeDublicateAuthors(HashSet<String> newAuthors, int curLevel)
        {
            for (int i = curLevel - 1; i > 0; --i)
            {
                HashSet<String> authors = authorLists[i];
                foreach (String item in newAuthors)
                {
                    if (authors.Contains(item))
                    {
                        // Problem: removes from the set while foreach is enumerating it
                        newAuthors.Remove(item);
                    }
                }
            }
        }

How do I remove items correctly? I need to iterate through both newAuthors and authorLists, which is why I think RemoveWhere cannot be used here.

Creating a new list, adding items to it, and then removing the duplicate items is very inefficient. In my case, the sets in authorLists have the following sizes:

authorLists [0].size = 0;
authorLists [1].size = 322;
authorLists [2].size = 75000; // (even more than this value)

I need to call removeDublicateAuthors 1*(1) 322 (n) 75000 (m) times, where n and m are the numbers of duplicate authors on the 1st and 2nd levels respectively. I have to delete these items very often and the collections are very large, so this algorithm is very inefficient. Actually, I have the following code in Java and have to rewrite it in C# for certain reasons:

    /** Remove duplicate authors in tree of Authors */
    private void removeDublicateAuthors(HashSet<String> newCoauthors, int curLevel) {
        for (int i = curLevel - 1; i > 0; --i) {
            HashSet<String> authors = coauthorLevels.get(i);
            for (Iterator<String> iter = newCoauthors.iterator(); iter.hasNext();) {
                // Test the element itself, not the iterator object
                String author = iter.next();
                if (authors.contains(author)) {
                    iter.remove();
                }
            }
        }
    }

At the moment it works much faster than the suggested options.

You can add the items you want to remove to another hash set, and then remove them all afterwards.

What you are doing here is wrong for two reasons:

1. You cannot alter a set while you are iterating over it with foreach.
2. Even if you make your code work, make sure you are modifying the caller's set rather than a copy. (HashSet<T> is a reference type, so this already holds without ref; the ref modifier in the code below is optional.)

   List<HashSet<String>> authorLists = new List<HashSet<String>>();
   // fill it
   /** Remove duplicate authors  */
   // handle reference instead of value
   private void removeDublicateAuthors(ref HashSet<String> newAuthors, int curLevel)
   {
       List<string> removeAuthors = new List<string>();

       for (int i = curLevel - 1; i > 0; --i)
       {
           HashSet<String> authors = authorLists[i];
           foreach (String item in newAuthors)
           {
               if (authors.Contains(item))
               {
                   removeAuthors.Add(item);
               }
           }
       }

       foreach(string author in removeAuthors)
       {
           newAuthors.Remove(author);
       }
   }
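
One way to avoid the temporary list might be HashSet<T>.RemoveWhere, since its predicate can capture the set of the level currently being checked. The following is only a minimal sketch: the class name AuthorFilter is hypothetical, and authorLists is passed in as a parameter here (an assumption made to keep the example self-contained) rather than being a field as in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, not part of the original code.
public static class AuthorFilter
{
    // Removes from newAuthors every author found on levels curLevel-1 down to 1.
    // Returns the total number of removed entries.
    public static int RemoveDuplicateAuthors(
        List<HashSet<string>> authorLists, HashSet<string> newAuthors, int curLevel)
    {
        int removed = 0;
        for (int i = curLevel - 1; i > 0; --i)
        {
            HashSet<string> authors = authorLists[i];
            // RemoveWhere walks newAuthors itself and deletes every element
            // for which the predicate returns true.
            removed += newAuthors.RemoveWhere(author => authors.Contains(author));
        }
        return removed;
    }
}
```

Each level costs one pass over newAuthors with a hash lookup per element, which is the same shape as the iterator-based Java loop in the question.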

What you're looking for is ExceptWith. You're trying to compute one set with another set subtracted from it, which is exactly what that method does.
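
Applied to the question's method, this could look roughly like the sketch below. The class name AuthorSubtract is hypothetical, and authorLists is passed as a parameter only so the example stands on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, not part of the original code.
public static class AuthorSubtract
{
    // Subtracts every earlier level (curLevel-1 down to 1) from newAuthors in place.
    public static void RemoveDuplicateAuthors(
        List<HashSet<string>> authorLists, HashSet<string> newAuthors, int curLevel)
    {
        for (int i = curLevel - 1; i > 0; --i)
        {
            // ExceptWith removes from newAuthors all elements that occur in authorLists[i].
            newAuthors.ExceptWith(authorLists[i]);
        }
    }
}
```

One thing to keep in mind: ExceptWith enumerates the collection passed as its argument, so its cost grows with the size of the level being subtracted. When newAuthors is small and a level holds tens of thousands of entries, a loop driven by newAuthors may be cheaper.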

Forgive me if I don't understand what you are trying to do.

Hash sets don't allow duplicates because an item's position in the set is derived from its hash. Two equal strings have the same hash and therefore land in the same position, where the second one is recognized as already present. Therefore, if you simply combine any two hash sets, the result is free from duplicates.

Consider the following:

        var set1 = new HashSet<string>();
        set1.Add("foo");
        set1.Add("foo");

        var set2 = new HashSet<string>();
        set2.Add("foo");

        var set3 = set1.Union(set2);

        foreach (var val in set3)
        {
          Console.WriteLine(val);   
        }

The output of this code would be:

foo

Now, if you are trying to ensure that hashset A doesn't include any of the items in hashset B, you could do something like this:

        var set1 = new HashSet<string>();
        set1.Add("foo");
        set1.Add("bar");

        var set2 = new HashSet<string>();
        set2.Add("foo");
        set2.Add("baz");

        foreach (var val in set2)
        {
            set1.Remove(val);
        }

        foreach (var val in set1)
        {
            Console.WriteLine(val);    
        }

The output of which would be:

bar

Giving this some more thought, you can also subtract one set from another using the .Except method.

var set3 = set1.Except(set2);

This produces all the items in set1 that are not in set2.
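
Note that Except is a LINQ extension method: it returns a lazily evaluated IEnumerable<string> and leaves set1 unchanged, whereas HashSet<T>.ExceptWith modifies the set in place. If a separate set is wanted, the result can be copied into a new HashSet. A small sketch (the class and method names SetDifference and Without are made up for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class SetDifference
{
    // Non-mutating difference: builds a new set and leaves 'source' untouched.
    public static HashSet<string> Without(HashSet<string> source, IEnumerable<string> toRemove)
    {
        // Except is evaluated when the HashSet constructor enumerates it.
        return new HashSet<string>(source.Except(toRemove));
    }
}
```

With set1 and set2 from the example above, SetDifference.Without(set1, set2) would hold only "bar", while set1 itself would still hold both "foo" and "bar".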
