
How to select a random key from a HashMap in Java?

I'm working with a large ArrayList<HashMap<A,B>>, and I repeatedly need to select a random key from a random HashMap (and do some stuff with it). Selecting the random HashMap is trivial, but how should I select a random key from within this HashMap?

Speed is important (as I need to do this 10000 times and the hashmaps are large), so just selecting a random number k in [0,9999] and then calling .next() on the iterator k times is really not an option. Similarly, converting the HashMap to an array or ArrayList on every random pick is really not an option. Please read this before replying.

Technically I feel that this should be possible, since the HashMap stores its keys in an Entry[] internally, and selecting at random from an array is easy, but I can't figure out how to access this Entry[]. So any ideas to access the internal Entry[] are more than welcome. Other solutions (as long as they don't consume linear time in the hashmap size) are also welcome of course.

Note: heuristics are fine, so if there's a method that excludes 1% of the elements (e.g. because of multi-filled buckets) that's no problem at all.

Off the top of my head:

List<A> keysAsArray = new ArrayList<A>(map.keySet());
Random r = new Random();

then just

map.get(keysAsArray.get(r.nextInt(keysAsArray.size())));
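
If the maps are not modified between picks, the copy only has to be made once per map rather than once per pick. Here is a minimal sketch under that assumption; the class name RandomKeyPicker and its method randomKeyOf are made up for illustration and do not come from any library:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Random;

// Hypothetical helper: snapshots the keys of each map once, then picks in O(1).
// Assumes every map is non-empty and is not modified after construction.
class RandomKeyPicker<A, B> {
    private final List<List<A>> keyLists = new ArrayList<>();
    private final Random random = new Random();

    RandomKeyPicker(List<HashMap<A, B>> maps) {
        for (HashMap<A, B> map : maps) {
            // One O(n) copy per map, paid up front instead of on every pick.
            keyLists.add(new ArrayList<>(map.keySet()));
        }
    }

    // Returns a random key of the map at position mapIndex in the original list.
    A randomKeyOf(int mapIndex) {
        List<A> keys = keyLists.get(mapIndex);
        return keys.get(random.nextInt(keys.size()));
    }
}
```

The snapshots go stale as soon as a map changes, so this only fits the read-only case.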

I managed to find a solution without performance loss. I will post it here since it may help other people -- and potentially answer several open questions on this topic (I'll search for these later).

What you need is a second custom Set-like data structure to store the keys -- not a list as some suggested here. List-like data structures are too expensive to remove items from. The operations needed are adding/removing elements in constant time (to keep it up to date with the HashMap) and a procedure to select a random element. The following class MySet does exactly this:

class MySet<A> {
     ArrayList<A> contents = new ArrayList<A>();
     HashMap<A,Integer> indices = new HashMap<A,Integer>();
     Random R = new Random();

     //selects random element in constant time
     A randomKey() {
         return contents.get(R.nextInt(contents.size()));
     }

     //adds new element in constant time
     void add(A a) {
         if (indices.containsKey(a)) return; //already present, keep elements unique
         indices.put(a,contents.size());
         contents.add(a);
     }

     //removes element in constant time
     void remove(A a) {
        Integer index = indices.get(a);
        if (index == null) return; //a is not in the set
        contents.set(index,contents.get(contents.size()-1)); //move the last element into the gap
        indices.put(contents.get(index),index);
        contents.remove(contents.size()-1);
        indices.remove(a);
     }
}
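
To show how such a structure could be kept in sync with the map itself, here is a minimal sketch that puts the same swap-with-last idea behind put/remove. The class name RandomKeyMap and its methods are assumptions made for illustration, not part of the answer above:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.NoSuchElementException;
import java.util.Random;

// Hypothetical wrapper: a HashMap plus a key array, both updated together.
class RandomKeyMap<K, V> {
    private final HashMap<K, V> values = new HashMap<>();
    private final HashMap<K, Integer> indices = new HashMap<>();
    private final ArrayList<K> keys = new ArrayList<>();
    private final Random random = new Random();

    V put(K key, V value) {
        if (!indices.containsKey(key)) {   // new key: append it to the key array
            indices.put(key, keys.size());
            keys.add(key);
        }
        return values.put(key, value);
    }

    V get(K key) {
        return values.get(key);
    }

    V remove(K key) {
        Integer boxed = indices.remove(key);
        if (boxed == null) return null;    // key was not present
        int index = boxed;
        K last = keys.remove(keys.size() - 1); // drop the tail in O(1)
        if (index < keys.size()) {         // removed key was not the tail:
            keys.set(index, last);         // move the old tail into the gap
            indices.put(last, index);
        }
        return values.remove(key);
    }

    // Uniform random key in O(1).
    K randomKey() {
        if (keys.isEmpty()) throw new NoSuchElementException("map is empty");
        return keys.get(random.nextInt(keys.size()));
    }

    int size() {
        return keys.size();
    }
}
```

The extra cost is one Integer per key and a second hash lookup on put and remove; in exchange every pick is uniform and constant time.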

You need access to the underlying entry table.

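// Defined statically (Field is java.lang.reflect.Field). This relies on
// HashMap's private "table" field, an implementation detail; on recent JDKs
// the module system may refuse access unless java.base/java.util is opened.
static final Field table;
static {
    try {
        table = HashMap.class.getDeclaredField("table");
        table.setAccessible(true);
    } catch (NoSuchFieldException e) {
        throw new ExceptionInInitializerError(e);
    }
}
static final Random rand = new Random();

public static Map.Entry<?, ?> randomEntry(HashMap<?, ?> map) throws IllegalAccessException {
    Map.Entry<?, ?>[] entries = (Map.Entry<?, ?>[]) table.get(map);
    if (entries == null) return null; // the table is allocated lazily
    int start = rand.nextInt(entries.length);
    for (int i = 0; i < entries.length; i++) {
        int idx = (start + i) % entries.length;
        Map.Entry<?, ?> entry = entries[idx];
        if (entry != null) return entry; // only the first entry of a bucket
    }
    return null;
}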

This still has to traverse the table to find a non-empty bucket, so the worst case is O(n), but the typical behaviour is O(1).

It sounds like you should consider storing either an auxiliary List of keys, or a real object rather than a Map, in your list.

As @Alberto Di Gioacchino pointed out, there is a bug in the accepted solution with the removal operation. This is how I fixed it.

class MySet<A> {
     ArrayList<A> contents = new ArrayList<A>();
     HashMap<A,Integer> indices = new HashMap<A,Integer>();
     Random R = new Random();

     //selects random element in constant time
     A randomKey() {
         return contents.get(R.nextInt(contents.size()));
     }

     //adds new element in constant time
     void add(A item) {
         if (indices.containsKey(item)) return; //already present, keep elements unique
         indices.put(item,contents.size());
         contents.add(item);
     }

     //removes element in constant time
     void remove(A item) {
        Integer index = indices.get(item);
        if (index == null) return; //item is not in the set
        contents.set(index,contents.get(contents.size()-1)); //move the last element into the gap
        indices.put(contents.get(index),index);
        contents.remove(contents.size()-1);
        indices.remove(item);
     }
}

I'm assuming you are using HashMap because you need to look something up at a later date?

If that's not the case, then just change your HashMap to an array / ArrayList.

If this is the case, why not store your objects in a Map AND an ArrayList, so you can look them up randomly or by key?

Alternatively, could you use a TreeMap instead of HashMap? I don't know what type your key is, but you could use TreeMap.floorKey() in conjunction with some key randomizer.
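
As a rough illustration of the floorKey() idea, here is a sketch that assumes Integer keys; the class and method names are invented for the example. It draws a random value between the smallest and largest key and rounds it down to the nearest existing key. Note that this is a heuristic: a key followed by a large gap is picked more often than one followed by a small gap.

```java
import java.util.Random;
import java.util.TreeMap;

// Hypothetical helper for the "key randomizer" idea, assuming Integer keys.
class TreeMapRandomKey {
    // O(log n) pick from a non-empty TreeMap; firstKey() throws
    // NoSuchElementException if the map is empty.
    static <V> Integer randomKey(TreeMap<Integer, V> map, Random random) {
        long lo = map.firstKey();
        long hi = map.lastKey();
        // Random probe in [lo, hi]; long arithmetic avoids int overflow.
        long probe = lo + (long) (random.nextDouble() * (hi - lo + 1));
        probe = Math.min(probe, hi); // guard against floating-point rounding
        // Largest existing key <= probe; never null because probe >= lo.
        return map.floorKey((int) probe);
    }
}
```

For keys that are spread fairly evenly the bias is small; for clustered keys it can be large, so whether this is acceptable depends on how much non-uniformity the application tolerates.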

After spending some time, I came to the conclusion that you need to create a model which can be backed by a List<Map<A, B>> and a List<A> to maintain your keys. Keep access to your List<Map<A, B>> and List<A> to yourself, and expose only the operations/methods to the caller. In this way, you will have full control over the implementation, and the actual objects will be safer from external changes.

Btw, your question led me to,

This example, IndexedSet, may give you an idea of how to do it.

[edited]

This class, SetUniqueList, might help you if you decide to create your own model. It explicitly states that it wraps the list rather than copying it. So, I think, we can do something like:

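List<A> list = new ArrayList<A>(map.keySet());
// Untested; in Apache Commons Collections this constructor is protected,
// so a subclass or a factory method may be needed instead.
SetUniqueList unikList = new SetUniqueList(list, map.keySet());
// Now unikList should reflect all the changes to the map keys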
...
// Then you can do
unikList.get(i);

Note: I didn't try this myself. Will do that later (rushing home).

Since Java 8, there is an O(log(N)) approach with O(log(N)) additional memory: create a Spliterator via map.entrySet().spliterator(), make log(map.size()) trySplit() calls, and after each call randomly keep either the first or the second half. When there are, say, fewer than 10 elements left in the Spliterator, dump them into a list and make a random pick.
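
The splitting procedure could look roughly like the sketch below. It uses keySet() rather than entrySet() since only a key is wanted, and the class name, the retry limit of 64 and the fallback are assumptions added for the example. Because the two halves of a split need not contain equally many keys (one half can even be empty), the pick is only approximately uniform.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Random;
import java.util.Spliterator;

// Hypothetical helper illustrating the trySplit() approach.
class SpliteratorRandomKey {
    private static final int LEAF_SIZE = 10; // "less than 10 elements left"

    static <K, V> K randomKey(Map<K, V> map, Random random) {
        if (map.isEmpty()) throw new NoSuchElementException("map is empty");
        for (int attempt = 0; attempt < 64; attempt++) {
            Spliterator<K> current = map.keySet().spliterator();
            while (current.estimateSize() > LEAF_SIZE) {
                Spliterator<K> prefix = current.trySplit();
                if (prefix == null) break;                  // cannot split further
                if (random.nextBoolean()) current = prefix; // else keep the remainder
            }
            // Dump what is left into a list and pick from it.
            List<K> leaf = new ArrayList<>();
            current.forEachRemaining(leaf::add);
            if (!leaf.isEmpty()) return leaf.get(random.nextInt(leaf.size()));
            // The chosen part happened to contain no keys; try again.
        }
        // Fallback after repeated misses: not random, but always valid.
        return map.keySet().iterator().next();
    }
}
```

How even the result is depends on how the keys are distributed over the hash table, which fits the "heuristics are fine" requirement in the question but not a strict uniformity requirement.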

If you absolutely need to access the Entry array in HashMap, you can use reflection. But then your program will be dependent on that concrete implementation of HashMap.

As proposed, you can keep a separate list of keys for each map. You would not keep deep copies of the keys, so the actual memory overhead of this denormalisation wouldn't be that big.

A third approach is to write your own Map implementation, one that keeps keys in a list instead of a set.

How about wrapping HashMap in another implementation of Map? The other map maintains a List, and on put() it does:

if (inner.put(key, value) == null) listOfKeys.add(key);

(I assume that null values aren't permitted. If they are, use containsKey instead, but that's slower.)
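
For the case where null values are allowed, the containsKey variant might be sketched as follows. The class name KeyListMap is made up, and removal is deliberately left out, since removing from the middle of a plain list is not constant time:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical append-only wrapper; remove() is intentionally not supported.
class KeyListMap<K, V> {
    private final Map<K, V> inner = new HashMap<>();
    private final List<K> listOfKeys = new ArrayList<>();
    private final Random random = new Random();

    V put(K key, V value) {
        // containsKey instead of a null check on the old value,
        // so a key mapped to null is not added to the list twice.
        if (!inner.containsKey(key)) listOfKeys.add(key);
        return inner.put(key, value);
    }

    V get(K key) {
        return inner.get(key);
    }

    // Assumes at least one key has been put.
    K randomKey() {
        return listOfKeys.get(random.nextInt(listOfKeys.size()));
    }

    int size() {
        return listOfKeys.size();
    }
}
```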
