Why can't I retrieve an item from a HashSet without enumeration?

I'm looking for insight into the minds of the HashSet designers. As far as I am aware, my question applies to both Java and C# HashSets, which makes me think there must be a good reason for it, though I can't think of one myself.

After I have inserted an item into a HashSet, why is it impossible to retrieve that item without enumeration, which is hardly an efficient operation? This is especially puzzling since a HashSet is explicitly built in a way that supports efficient retrieval.

It would often be useful to me to have Remove(x) and Contains(x) return the actual item that is being removed or contained. This is not necessarily the item I pass into the Remove(x) or Contains(x) function. Sure, I guess I could achieve the same effect through a HashMap, but why waste all that space and effort when it should be perfectly possible to do this with a set?

I can appreciate that there may be design concerns: adding this functionality might allow uses of HashSet that are not consistent with its role, or future role, in the framework. But if this is so, what are these design concerns?

Edit

To answer some of the questions raised, here are more details:

I am using an immutable reference type with overridden hashcode, equals, etc. to emulate a value type in C#. Let's say the type has members A, B, and C. Hashcode, equals, etc. depend only on A and B. Given some A and B, I want to be able to retrieve the equivalent item from a hashset and get its C. It appears I won't be able to use HashSet for this, but I would at least like to know if there is any good reason for it. Pseudo code follows:

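public sealed class X
{
    private readonly object A;
    private readonly object B;
    public readonly object Extra;

    public X(object a, object b, object extra = null)
    {
        A = a;
        B = b;
        Extra = extra;
    }

    // Hash code and equality depend only on A and B, never on Extra.
    public override int GetHashCode()
    {
        return A.GetHashCode() + B.GetHashCode();
    }

    public override bool Equals(object obj)
    {
        var other = obj as X;
        // object.Equals compares boxed values by value; == would compare references.
        return other != null && object.Equals(other.A, A) && object.Equals(other.B, B);
    }
}

// Usage (extra1 stands for whatever extra data is attached to the item):
var hashset = new HashSet<X>();
hashset.Add(new X(1, 2, extra1));
hashset.Contains(new X(1, 2)); // returns true, but I can't retrieve extra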

In .NET, what you are probably looking for is KeyedCollection: http://msdn.microsoft.com/en-us/library/ms132438.aspx

You can get around the nastiness of re-implementing this abstract class each time with some "generic" cleverness. (See IKeyedObject`1.)

Note: Any data transfer object which implements IKeyedObject`1 should have an overridden GetHashCode method that simply returns this.Key.GetHashCode(); the same goes for Equals.

My base class library usually ends up with something like this in it:

public class KeyedCollection<TItem> : System.Collections.ObjectModel.KeyedCollection<TItem, TItem>
    where TItem : class
{
    public KeyedCollection() : base()
    {
    }

    public KeyedCollection(IEqualityComparer<TItem> comparer) : base(comparer)
    {
    }

    protected override TItem GetKeyForItem(TItem item)
    {
        return item;
    }
}

public class KeyedObjectCollection<TKey, TItem> : System.Collections.ObjectModel.KeyedCollection<TKey, TItem>
    where TItem : class, IKeyedObject<TKey>
    where TKey : struct
{
    public KeyedObjectCollection() : base()
    {
    }

    protected override TItem GetKeyForItem(TItem item)
    {
        return item.Key;
    }
}

///<summary>
/// I almost always implement this explicitly so the only
/// classes that have access without some rigmarole
/// are generic collections built to be aware that an object
/// is keyed.
///</summary>
public interface IKeyedObject<TKey>
{
    TKey Key { get; }
}

How were you proposing to retrieve the item from the hash set? A set is by definition not ordered in any way, and therefore there is no index with which to retrieve the object in question.

Sets, as a concept, are used to test inclusion, i.e. whether or not the element in question is in the hash data set. If you're looking to retrieve a value from a data source using a key value or index, I would suggest looking into either a Map or a List.

EDIT: Additional answer based on the edit to the original question

Soonil, based on your new information, it looks like you might be interested in implementing your data as a Java Enum, something similar to this:

 public enum SoonilsDataType {
      A, B, C;

      // Just an example of what's possible
      public static SoonilsDataType getCompositeValue(SoonilsDataType item1,
           SoonilsDataType item2) {
           if (item1 == A && item2 == B) {
                return C;
           }
           // No composite value is defined for any other pair.
           return null;
      }
 }

Enums automatically get a values() method, which returns an array of all values in the enum's "set"; you can use it to test inclusion in the same way as with a Set. Also, because it's a full class, you can define new static methods to do the composite logic (as I was trying to allude to in the example code). The only catch with an Enum is that you can't add new instances at runtime, which may not be what you want (though if the set's data size isn't going to grow at runtime, the Enum is what you want).

If you change an object after it has been inserted, its hash may have changed (this is especially likely if hashCode() has been overridden). If the hash changes, a lookup of it in the set will fail, as you will be attempting to look up an object that hashes to a different location than the one it is stored in.

Also, you need to make sure you have overridden hashCode and equals in your object if you want to look up equal objects that are different instances.

Note that this is all for Java. I am assuming C# has something similar, but as it has been several years since I used C#, I will let others speak to its capabilities.
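The failure mode described in this answer can be sketched in Java as follows. MutablePoint and MutationDemo are hypothetical names made up for the illustration; the point is only that an element mutated after insertion is still in the set but can no longer be found by a hash lookup.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical mutable type whose hash code depends on mutable fields.
class MutablePoint {
    int x;
    int y;

    MutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MutablePoint)) {
            return false;
        }
        MutablePoint other = (MutablePoint) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

class MutationDemo {
    // Returns whether the set still finds the point after it was mutated in place.
    static boolean foundAfterMutation() {
        Set<MutablePoint> set = new HashSet<>();
        MutablePoint p = new MutablePoint(1, 2);
        set.add(p);
        p.x = 100; // the hash code changes, but p is still filed under the old one
        return set.contains(p);
    }

    public static void main(String[] args) {
        System.out.println("found after mutation: " + foundAfterMutation());
    }
}
```

Iterating over the set would still reach the mutated element, which is why enumeration keeps working when the hash lookup does not.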

Why not just use a HashMap<X,X>? This does exactly what you want. Just do .put(x,x) every time, and then you can get the stored element equal to x with .get(x).
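A minimal sketch of that suggestion, assuming a hypothetical Item class whose equality covers only a and b (the class and field names are invented for the example, loosely following the question's X type):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical item type: equality uses a and b only; extra is ignored.
final class Item {
    final int a;
    final int b;
    final String extra;

    Item(int a, int b, String extra) {
        this.a = a;
        this.b = b;
        this.extra = extra;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Item)) {
            return false;
        }
        Item other = (Item) o;
        return a == other.a && b == other.b;
    }

    @Override
    public int hashCode() {
        return Objects.hash(a, b);
    }
}

class SelfMapDemo {
    // Stores an item under itself, then retrieves it through an equal probe.
    static String lookupExtra() {
        Map<Item, Item> items = new HashMap<>();
        Item stored = new Item(1, 2, "extra1");
        items.put(stored, stored);

        Item probe = new Item(1, 2, null); // equal to stored, but carries no extra
        Item found = items.get(probe);     // returns the stored instance
        return found.extra;
    }

    public static void main(String[] args) {
        System.out.println(lookupExtra());
    }
}
```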

I imagine the designers of the Set interface and HashSet class wanted to ensure that the remove(Object) method defined on the Collection interface was also applicable to Set; this method returns a boolean denoting whether the object was successfully removed. If the designers wanted to provide functionality whereby remove(Object) returned the "equal" object already in the Set, this would mean a different method signature.

Also, given that the object being removed is logically equal to the object passed to remove(Object), the value added by returning the contained object is arguable. However, I have had this problem myself before and have used a Map to solve it.

Note that in Java, a HashSet uses a HashMap internally, so there isn't additional storage overhead in using a HashMap instead.
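As a rough illustration of the Map-based workaround, here is a sketch of a set-like wrapper whose remove returns the stored element instead of a boolean. RetrievableSet is a hypothetical name and not a JDK class; it relies on the fact that Map.get and Map.remove return the value stored under an equal key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical wrapper, not part of the JDK: each element is stored as both key and value.
class RetrievableSet<E> {
    private final Map<E, E> elements = new HashMap<>();

    // Adds e unless an equal element is already present; returns true if it was added.
    // Null elements are not supported in this sketch.
    boolean add(E e) {
        Objects.requireNonNull(e);
        return elements.putIfAbsent(e, e) == null;
    }

    // Returns the stored element equal to probe, or null if there is none.
    E get(Object probe) {
        return elements.get(probe);
    }

    // Removes and returns the stored element equal to probe, or null if there is none.
    E remove(Object probe) {
        return elements.remove(probe);
    }

    int size() {
        return elements.size();
    }
}

class RetrievableSetDemo {
    public static void main(String[] args) {
        RetrievableSet<String> set = new RetrievableSet<>();
        String stored = new String("key");
        set.add(stored);

        String probe = new String("key"); // equal to stored, but a different instance
        System.out.println(set.get(probe) == stored);    // true
        System.out.println(set.remove(probe) == stored); // true
        System.out.println(set.size());                  // 0
    }
}
```

Because the signatures differ from Collection.remove(Object), a class like this cannot simply implement Set, which is the trade-off the answer points out.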

This was an oversight by the library designers. As I mentioned under another answer, this method has been added to .NET Framework 4.7.2 (and .NET Core 2.0 before it); see HashSet<T>.TryGetValue. Citing the source:

/// <summary>
/// Searches the set for a given value and returns the equal value it finds, if any.
/// </summary>
/// <param name="equalValue">The value to search for.
/// </param>
/// <param name="actualValue">
/// The value from the set that the search found, or the default value
/// of <typeparamref name="T"/> when the search yielded no match.</param>
/// <returns>A value indicating whether the search was successful.</returns>
/// <remarks>
/// This can be useful when you want to reuse a previously stored reference instead of 
/// a newly constructed one (so that more sharing of references can occur) or to look up
/// a value that has more complete data than the value you currently have, although their
/// comparer functions indicate they are equal.
/// </remarks>
public bool TryGetValue(T equalValue, out T actualValue)

Looks to me like you're actually looking for a Map<X,Y>, where Y is the type of extra1.


(rant below)

The equals and hashCode methods define meaningful object equality. The HashSet class assumes that if two objects are equal as defined by Object.equals(Object), there is no difference between these two objects.

I'd go as far as to say that if the object extra is meaningful, your design is not ideal.

SOLVED. Wishing to find an element seems perfectly valid to me, because the representative used for the search may differ from the found element. This is especially true if elements contain key and value information, and a custom equality comparer compares the key part only. See the code example. The code contains a comparer that implements a custom search and captures the element found. This requires an instance of the comparer. To use it: clear the reference to the found element, perform a search by means of Contains, then access the found element. Be aware of multithreading issues when sharing the comparer instance.

using System;
using System.Collections.Generic;

namespace ConsoleApplication1 {

class Box
{
    public int Id;
    public string Name;
    public Box(int id, string name)
    {
        Id = id;
        Name = name;
    }
}

class BoxEq: IEqualityComparer<Box>
{
    public Box Element;

    public bool Equals(Box element, Box representative)
    {
        bool found = element.Id == representative.Id;
        if (found)
        {
            Element = element;
        }
        return found;
    }

    public int GetHashCode(Box box)
    {
        return box.Id.GetHashCode();
    }
}

class Program
{
    static void Main()
    {
        var boxEq = new BoxEq();
        var hashSet = new HashSet<Box>(boxEq);
        hashSet.Add(new Box(3, "Element 3"));
        var box5 = new Box(5, "Element 5");
        hashSet.Add(box5);
        var representative = new Box(5, "Representative 5");
        boxEq.Element = null;
        Console.WriteLine("Contains {0}: {1}", representative.Id, hashSet.Contains(representative));
        Console.WriteLine("Found id: {0}, name: {1}", boxEq.Element.Id, boxEq.Element.Name);
        Console.WriteLine("Press enter");
        Console.ReadLine();
    }
}

} // namespace

Set objects in those languages were mostly designed as sets of values, not for mutable objects. They check that objects put in them are unique by using equals. That is why contains and remove return a boolean, not the object: they check for or remove the value you pass to them.

And actually, if you do a contains(X) on a set and expect to get a different object Y, that would mean X and Y are equal (i.e. X.equals(Y) => true) but somehow different, which seems wrong.

I was given an interesting suggestion as to a way to use a Map: have my own objects define themselves as KeyValuePairs. While a good concept, unfortunately KeyValuePair is not an interface (why not?) and is a struct, which shoots that plan out of the air. In the end I will roll my own Set, as my constraints allow me this option.

After wondering the same thing, and finally being able to look at the source code:

source: http://referencesource.microsoft.com/#System.Core/System/Collections/Generic/HashSet.cs

A set is a collection of unique items (objects or values). In the .NET implementation, an item is the same as another item (not unique) if the Equals method of the comparer returns true for the two items, not if the two items merely have the same hash code. So checking for the existence of an item is a two-step process: first the hash code is used to minimize the number of items to compare, then the comparison itself is performed.
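The answer is about .NET, but the same hash-then-equals behaviour can be sketched in Java. The strings "Aa" and "BB" have the same hashCode under Java's String hashing, yet a HashSet keeps both because equals tells them apart. CollisionDemo is a hypothetical name used only for this illustration.

```java
import java.util.HashSet;
import java.util.Set;

class CollisionDemo {
    // "Aa" and "BB" are distinct strings that share a hash code.
    static boolean sameHash() {
        return "Aa".hashCode() == "BB".hashCode();
    }

    // Both strings land in the same bucket, but equals() keeps them apart.
    static int distinctCount() {
        Set<String> set = new HashSet<>();
        set.add("Aa");
        set.add("BB");
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println("same hash: " + sameHash());
        System.out.println("elements stored: " + distinctCount());
    }
}
```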

If you wish to retrieve an item, you must be able to supply the retrieving function with a unique identifier. You might know the hash code of the item you want, but that is not enough, as more than one item can have that same hash. You will also need to supply the item itself so that the Equals method can be called. And clearly, if you have the item, there is no reason to get it.

One could create a data structure that demands that no two unique items ever return the same hash code, and then you could get an item from it. It would be faster at adding*, and retrieving would be possible if you know the hash. If two items that are not equal but return the same hash are put into it, the first will be overwritten. As far as I know this type doesn't exist in .NET, and no, this is not the same as a dictionary.

*given that the GetHash method is the same.
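One possible reading of the structure described above, sketched in Java purely for illustration: items are filed under their hash code alone, so an item with a colliding hash silently replaces the earlier one. HashKeyedStore is a hypothetical name, and the sketch says nothing about how such a type would perform in practice.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical structure: the hash code is the only identity an item has.
class HashKeyedStore<T> {
    private final Map<Integer, T> byHash = new HashMap<>();

    // Stores item under its hash code, replacing any item that had the same hash.
    void put(T item) {
        byHash.put(item.hashCode(), item);
    }

    // Returns the item stored under the given hash code, or null if there is none.
    T getByHash(int hash) {
        return byHash.get(hash);
    }
}

class HashKeyedStoreDemo {
    public static void main(String[] args) {
        HashKeyedStore<String> store = new HashKeyedStore<>();
        store.put("Aa");
        store.put("BB"); // same hash code as "Aa", so "Aa" is overwritten
        System.out.println(store.getByHash("Aa".hashCode()));
    }
}
```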

Short answer: because the items cannot be guaranteed to be immutable.

I've hit the exact problem you describe, where the HashCode is based on fixed fields within the member class, but the class holds additional information that can be updated without changing the hash.

My solution was to implement a generic MyHashSet<T> based on ICollection<T> but wrapped around a Dictionary<int, List<T>> to provide the required lookup efficiency, where the int key is the HashCode of T. However, this shows that if the HashCode of the member objects can change, then the dictionary lookup followed by an equality comparison of the items in the list will never find the changed items. There is no mechanism for forcing the members to be immutable, so the only solution is to enumerate the lot.
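The answer's MyHashSet<T> is C# and its code is not shown, so the following is only a rough Java analogue of the idea: a map from hash code to a list of items, with retrieval of the stored instance. BucketSet and its method names are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Java analogue of a set built on a hash-code-to-list dictionary.
class BucketSet<T> {
    private final Map<Integer, List<T>> buckets = new HashMap<>();

    // Adds item unless an equal one is already stored; returns true if it was added.
    boolean add(T item) {
        List<T> bucket = buckets.computeIfAbsent(item.hashCode(), h -> new ArrayList<>());
        if (bucket.contains(item)) {
            return false;
        }
        bucket.add(item);
        return true;
    }

    // Returns the stored item equal to probe, or null if there is none.
    // Step 1 narrows by hash code; step 2 compares with equals().
    T get(T probe) {
        List<T> bucket = buckets.get(probe.hashCode());
        if (bucket == null) {
            return null;
        }
        for (T candidate : bucket) {
            if (candidate.equals(probe)) {
                return candidate;
            }
        }
        return null;
    }
}

class BucketSetDemo {
    public static void main(String[] args) {
        BucketSet<String> set = new BucketSet<>();
        String stored = new String("key");
        set.add(stored);
        System.out.println(set.get(new String("key")) == stored);
    }
}
```

As the answer notes, the lookup only works while the hash code of a stored item stays what it was at insertion time; an item whose hash changes afterwards sits in the wrong bucket and is missed.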

