
Picking a random element from a set

How do I pick a random element from a set? I'm particularly interested in picking a random element from a HashSet or a LinkedHashSet, in Java. Solutions for other languages are also welcome.

int size = myHashSet.size();
int item = new Random().nextInt(size); // In real life, the Random object should be rather more shared than this
int i = 0;
for (Object obj : myHashSet)
{
    if (i == item)
        return obj;
    i++;
}

A somewhat related "did you know":

There are useful methods in java.util.Collections for shuffling whole collections: Collections.shuffle(List<?>) and Collections.shuffle(List<?> list, Random rnd).
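As a small illustration of how shuffling can serve random selection, the sketch below copies a set into a list, shuffles the copy with a caller-supplied Random, and takes the first element. The names ShufflePick and pickByShuffle are made up for this example. Each pick costs O(n), so this mostly makes sense when the whole random order is wanted anyway.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical helper class, not part of any library.
final class ShufflePick {
    /** Returns a random element of a non-empty set by shuffling a copy of it. */
    static <E> E pickByShuffle(Set<E> set, Random rnd) {
        if (set.isEmpty()) {
            throw new IllegalArgumentException("set must not be empty");
        }
        List<E> copy = new ArrayList<>(set); // shuffle works on lists, so copy first
        Collections.shuffle(copy, rnd);      // randomly permutes the copy in place
        return copy.get(0);
    }

    public static void main(String[] args) {
        Set<String> colors = new LinkedHashSet<>(Arrays.asList("red", "green", "blue"));
        System.out.println(pickByShuffle(colors, new Random()));
    }
}
```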

Fast solution for Java using an ArrayList and a HashMap: [element -> index].

Motivation: I needed a set of items with RandomAccess properties, especially to pick a random item from the set (see the pollRandom method). Random navigation in a binary tree is not accurate: trees are not perfectly balanced, so it would not lead to a uniform distribution.

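import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class RandomSet<E> extends AbstractSet<E> {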

    List<E> dta = new ArrayList<E>();
    Map<E, Integer> idx = new HashMap<E, Integer>();

    public RandomSet() {
    }

    public RandomSet(Collection<E> items) {
        for (E item : items) {
            idx.put(item, dta.size());
            dta.add(item);
        }
    }

    @Override
    public boolean add(E item) {
        if (idx.containsKey(item)) {
            return false;
        }
        idx.put(item, dta.size());
        dta.add(item);
        return true;
    }

    /**
     * Override element at position <code>id</code> with last element.
     * @param id
     */
    public E removeAt(int id) {
        if (id >= dta.size()) {
            return null;
        }
        E res = dta.get(id);
        idx.remove(res);
        E last = dta.remove(dta.size() - 1);
        // skip filling the hole if last is removed
        if (id < dta.size()) {
            idx.put(last, id);
            dta.set(id, last);
        }
        return res;
    }

    @Override
    public boolean remove(Object item) {
        @SuppressWarnings(value = "element-type-mismatch")
        Integer id = idx.get(item);
        if (id == null) {
            return false;
        }
        removeAt(id);
        return true;
    }

    public E get(int i) {
        return dta.get(i);
    }

    public E pollRandom(Random rnd) {
        if (dta.isEmpty()) {
            return null;
        }
        int id = rnd.nextInt(dta.size());
        return removeAt(id);
    }

    @Override
    public int size() {
        return dta.size();
    }

    @Override
    public Iterator<E> iterator() {
        return dta.iterator();
    }
}

This is faster than the for-each loop in the accepted answer:

int index = rand.nextInt(set.size());
Iterator<Object> iter = set.iterator();
for (int i = 0; i < index; i++) {
    iter.next();
}
return iter.next();

The for-each construct calls Iterator.hasNext() on every loop, but since index < set.size(), that check is unnecessary overhead. I saw a 10-20% boost in speed, but YMMV. (Also, this compiles without having to add an extra return statement.)

Note that this code (and most other answers) can be applied to any Collection, not just Set. In generic method form:

public static <E> E choice(Collection<? extends E> coll, Random rand) {
    if (coll.size() == 0) {
        return null; // or throw IAE, if you prefer
    }

    int index = rand.nextInt(coll.size());
    if (coll instanceof List) { // optimization
        return ((List<? extends E>) coll).get(index);
    } else {
        Iterator<? extends E> iter = coll.iterator();
        for (int i = 0; i < index; i++) {
            iter.next();
        }
        return iter.next();
    }
}

In Java 8:

static <E> E getRandomSetElement(Set<E> set) {
    return set.stream().skip(new Random().nextInt(set.size())).findFirst().orElse(null);
}

If you want to do it in Java, you should consider copying the elements into some kind of random-access collection (such as an ArrayList), because, unless your set is small, accessing the selected element will be expensive (O(n) instead of O(1)). [ed: list copy is also O(n)]

Alternatively, you could look for another Set implementation that more closely matches your requirements. The ListOrderedSet from Commons Collections looks promising.
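The copy-into-a-list suggestion pays off mainly when many picks are made from a set that rarely changes: the O(n) copy is paid once, and each later pick is O(1). A minimal sketch under that assumption; the class name RandomPicker is hypothetical, and the snapshot does not track later changes to the source set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical helper: snapshots a collection once, then serves O(1) random picks.
final class RandomPicker<E> {
    private final List<E> snapshot; // random-access copy of the source elements
    private final Random rnd;

    RandomPicker(Collection<? extends E> source, Random rnd) {
        if (source.isEmpty()) {
            throw new IllegalArgumentException("source must not be empty");
        }
        this.snapshot = new ArrayList<>(source); // O(n), done once
        this.rnd = rnd;
    }

    E pick() {
        return snapshot.get(rnd.nextInt(snapshot.size())); // O(1) per pick
    }

    public static void main(String[] args) {
        Set<String> names = new HashSet<>(Arrays.asList("ann", "bob", "cy"));
        RandomPicker<String> picker = new RandomPicker<>(names, new Random());
        for (int i = 0; i < 5; i++) {
            System.out.println(picker.pick());
        }
    }
}
```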

In Java:

Set<Integer> set = new LinkedHashSet<Integer>(3);
set.add(1);
set.add(2);
set.add(3);

Random rand = new Random(System.currentTimeMillis());
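// toArray() returns Object[], which cannot be cast to int[]; ask for an Integer[] instead
Integer[] setArray = set.toArray(new Integer[0]);
for (int i = 0; i < 10; ++i) {
    System.out.println(setArray[rand.nextInt(setArray.length)]);
}

List asList = new ArrayList(mySet);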
Collections.shuffle(asList);
return asList.get(0);

This is identical to the accepted answer (Khoth), but with the unnecessary size and i variables removed.

    int random = new Random().nextInt(myhashSet.size());
    for(Object obj : myhashSet) {
        if (random-- == 0) {
            return obj;
        }
    }

Though doing away with the two aforementioned variables, the above solution still remains random because we rely on random (starting at a randomly selected index) to decrement itself toward 0 over each iteration.

Clojure solution:

(defn pick-random [set] (let [sq (seq set)] (nth sq (rand-int (count sq)))))

Perl 5

@hash_keys = (keys %hash);
$rand = int(rand(@hash_keys));
print $hash{$hash_keys[$rand]};

Here is one way to do it.

The solutions above speak in terms of latency but do not guarantee an equal probability of each index being selected.
If that needs to be considered, try reservoir sampling: http://en.wikipedia.org/wiki/Reservoir_sampling
Collections.shuffle() (as suggested by a few) uses a related algorithm, the Fisher-Yates shuffle.
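For reference, single-element reservoir sampling can be sketched as follows. It needs only one pass and does not need the size up front, which helps with iterables whose size is unknown. The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical helper class, not part of any library.
final class ReservoirPick {
    /**
     * Single-element reservoir sampling: the i-th element seen (1-based)
     * replaces the current choice with probability 1/i, which leaves every
     * element equally likely once the iteration finishes.
     * Returns null for an empty input.
     */
    static <E> E sample(Iterable<? extends E> items, Random rnd) {
        E chosen = null;
        int seen = 0;
        for (E item : items) {
            seen++;
            if (rnd.nextInt(seen) == 0) { // true with probability 1/seen
                chosen = item;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        Set<Integer> set = new HashSet<>(Arrays.asList(1, 2, 3, 4, 5));
        System.out.println(sample(set, new Random()));
    }
}
```

The trade-off is one call to the random generator per element, where the index-based answers need only one call in total.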

C++. This should be reasonably quick, as it doesn't require iterating over the whole set, or sorting it. This should work out of the box with most modern compilers, assuming they support tr1. If not, you may need to use Boost.

The Boost docs are helpful here to explain this, even if you don't use Boost.

The trick is to make use of the fact that the data has been divided into buckets, and to quickly identify a randomly chosen bucket (with the appropriate probability).

//#include <boost/unordered_set.hpp>  
//using namespace boost;
#include <tr1/unordered_set>
using namespace std::tr1;
#include <iostream>
#include <stdlib.h>
#include <assert.h>
using namespace std;

int main() {
  unordered_set<int> u;
  u.max_load_factor(40);
  for (int i=0; i<40; i++) {
    u.insert(i);
    cout << ' ' << i;
  }
  cout << endl;
  cout << "Number of buckets: " << u.bucket_count() << endl;

  for(size_t b=0; b<u.bucket_count(); b++)
    cout << "Bucket " << b << " has " << u.bucket_size(b) << " elements. " << endl;

  for(size_t i=0; i<20; i++) {
    size_t x = rand() % u.size();
    cout << "we'll quickly get the " << x << "th item in the unordered set. ";
    size_t b;
    for(b=0; b<u.bucket_count(); b++) {
      if(x < u.bucket_size(b)) {
        break;
      } else
        x -= u.bucket_size(b);
    }
    cout << "it'll be in the " << b << "th bucket at offset " << x << ". ";
    unordered_set<int>::const_local_iterator l = u.begin(b);
    while(x>0) {
      l++;
      assert(l!=u.end(b));
      x--;
    }
    cout << "random item is " << *l << ". ";
    cout << endl;
  }
}

Since you said "Solutions for other languages are also welcome", here's the version for Python:

>>> import random
>>> random.choice([1,2,3,4,5,6])
3
>>> random.choice([1,2,3,4,5,6])
4

Can't you just get the size/length of the set/array, generate a random number between 0 and the size/length, then call the element whose index matches that number? HashSet has a .size() method, I'm pretty sure.

In pseudocode -

function randFromSet(target){
 var targetLength:uint = target.length()
 var randomIndex:uint = random(0,targetLength);
 return target[randomIndex];
}

PHP, assuming "set" is an array:

$foo = array("alpha", "bravo", "charlie");
$index = array_rand($foo);
$val = $foo[$index];

The Mersenne Twister functions are better, but there's no MT equivalent of array_rand in PHP.

JavaScript solution ;)

function choose (set) {
    return set[Math.floor(Math.random() * set.length)];
}

var set  = [1, 2, 3, 4], rand = choose (set);

Or alternatively:

Array.prototype.choose = function () {
    return this[Math.floor(Math.random() * this.length)];
};

[1, 2, 3, 4].choose();

Icon has a set type and a random-element operator, unary "?", so the expression

? set( [1, 2, 3, 4, 5] )

will produce a random number between 1 and 5.

The random seed is initialized to 0 when a program is run, so to produce different results on each run, use randomize().

In C#

        Random random = new Random((int)DateTime.Now.Ticks);

        OrderedDictionary od = new OrderedDictionary();

        od.Add("abc", 1);
        od.Add("def", 2);
        od.Add("ghi", 3);
        od.Add("jkl", 4);


        int randomIndex = random.Next(od.Count);

        Console.WriteLine(od[randomIndex]);

        // Can access via index or key value:
        Console.WriteLine(od[1]);
        Console.WriteLine(od["def"]);

In Lisp

(defun pick-random (set)
       (nth (random (length set)) set))

How about just

public static <A> A getRandomElement(Collection<A> c, Random r) {
  return new ArrayList<A>(c).get(r.nextInt(c.size()));
}

For fun I wrote a RandomHashSet based on rejection sampling. It's a bit hacky, since HashMap doesn't let us access its table directly, but it should work just fine.

It doesn't use any extra memory, and lookup time is O(1) amortized (because the Java hash table is dense).

class RandomHashSet<V> extends AbstractSet<V> {
    private Map<Object,V> map = new HashMap<>();
    public boolean add(V v) {
        return map.put(new WrapKey<V>(v),v) == null;
    }
    @Override
    public Iterator<V> iterator() {
        return new Iterator<V>() {
            RandKey key = new RandKey();
            @Override public boolean hasNext() {
                return true;
            }
            @Override public V next() {
                while (true) {
                    key.next();
                    V v = map.get(key);
                    if (v != null)
                        return v;
                }
            }
            @Override public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }
    @Override
    public int size() {
        return map.size();
    }
    static class WrapKey<V> {
        private V v;
        WrapKey(V v) {
            this.v = v;
        }
        @Override public int hashCode() {
            return v.hashCode();
        }
        @Override public boolean equals(Object o) {
            if (o instanceof RandKey)
                return true;
            return v.equals(o);
        }
    }
    static class RandKey {
        private Random rand = new Random();
        int key = rand.nextInt();
        public void next() {
            key = rand.nextInt();
        }
        @Override public int hashCode() {
            return key;
        }
        @Override public boolean equals(Object o) {
            return true;
        }
    }
}
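The rejection-sampling idea can be seen in isolation in the sketch below. It deliberately does not touch HashMap internals; a plain array with empty (null) slots stands in for a hash table, under the assumption that each slot holds at most one element. All names are hypothetical.

```java
import java.util.Random;

// Hypothetical helper class, not part of any library.
final class RejectionPick {
    /**
     * Picks a random non-null entry from a partly filled slot array by probing
     * random slots until an occupied one is found. The expected number of
     * probes is slots.length divided by the number of occupied slots.
     * Assumes at least one slot is non-null; otherwise this loops forever.
     */
    static <E> E pick(E[] slots, Random rnd) {
        while (true) {
            E candidate = slots[rnd.nextInt(slots.length)];
            if (candidate != null) {
                return candidate; // accept: every occupied slot is equally likely
            }
            // empty slot: reject and probe again
        }
    }

    public static void main(String[] args) {
        String[] table = {null, "a", null, "b", "c", null, null, "d"};
        System.out.println(pick(table, new Random()));
    }
}
```

With a table that is at least half full, the expected number of probes is at most two, which is where the amortized O(1) figure comes from.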

The easiest with Java 8 is:

outbound.stream().skip(n % outbound.size()).findFirst().get()

where n is a random integer. Of course, it performs worse than the for (elem : col) loop.
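One caveat with this one-liner: if n can be negative, n % outbound.size() is negative too, and Stream.skip rejects negative arguments with an IllegalArgumentException. A sketch that normalizes the index with Math.floorMod and also covers the empty set; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Random;
import java.util.Set;

// Hypothetical helper class, not part of any library.
final class StreamPick {
    /** Maps any int n onto a valid position and returns the element there, if any. */
    static <E> Optional<E> elementAt(Set<E> set, int n) {
        if (set.isEmpty()) {
            return Optional.empty();
        }
        int index = Math.floorMod(n, set.size()); // always in [0, size), even for negative n
        return set.stream().skip(index).findFirst();
    }

    public static void main(String[] args) {
        Set<String> outbound = new LinkedHashSet<>(Arrays.asList("a", "b", "c"));
        System.out.println(elementAt(outbound, new Random().nextInt()));
    }
}
```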

With Guava we can do a little better than Khoth's answer:

public static <E> E random(Set<E> set) {
  // `random` is assumed to be a shared java.util.Random field
  int index = random.nextInt(set.size());
  if (set instanceof ImmutableSet) {
    // ImmutableSet.asList() is O(1), as is .get() on the returned list
    return ((ImmutableSet<E>) set).asList().get(index);
  }
  return Iterables.get(set, index);
}

In Mathematica:

a = {1, 2, 3, 4, 5}

a[[ ⌈ Length[a] Random[] ⌉ ]]

Or, in recent versions, simply:

RandomChoice[a]

This received a down-vote, perhaps because it lacks explanation, so here is one:

Random[] generates a pseudorandom float between 0 and 1. This is multiplied by the length of the list, and then the ceiling function is used to round up to the next integer. This index is then extracted from a.

Since hash table functionality is frequently done with rules in Mathematica, and rules are stored in lists, one might use:

a = {"Badger" -> 5, "Bird" -> 1, "Fox" -> 3, "Frog" -> 2, "Wolf" -> 4};

PHP, using MT:

$items_array = array("alpha", "bravo", "charlie");
$last_pos = count($items_array) - 1;
$random_pos = mt_rand(0, $last_pos);
$random_item = $items_array[$random_pos];

Unfortunately, this cannot be done efficiently (better than O(n)) in any of the Standard Library set containers.

This is odd, since it is very easy to add a randomized pick function to hash sets as well as binary-tree sets. In a not too sparse hash set, you can try random entries until you get a hit. For a binary tree, you can choose randomly between the left and right subtree, with a maximum of O(log n) steps when the tree is balanced. I've implemented a demo of the latter below:

import random

class Node:
    def __init__(self, object):
        self.object = object
        self.value = hash(object)
        self.size = 1
        self.a = self.b = None

class RandomSet:
    def __init__(self):
        self.top = None

    def add(self, object):
        """ Add any hashable object to the set.
            Notice: In this simple implementation you shouldn't add two
                    identical items. """
        new = Node(object)
        if not self.top: self.top = new
        else: self._recursiveAdd(self.top, new)
    def _recursiveAdd(self, top, new):
        top.size += 1
        if new.value < top.value:
            if not top.a: top.a = new
            else: self._recursiveAdd(top.a, new)
        else:
            if not top.b: top.b = new
            else: self._recursiveAdd(top.b, new)

    def pickRandom(self):
        """ Pick a random item in O(log2) time.
            Does a maximum of O(log2) calls to random as well. """
        return self._recursivePickRandom(self.top)
    def _recursivePickRandom(self, top):
        r = random.randrange(top.size)
        if r == 0: return top.object
        elif top.a and r <= top.a.size: return self._recursivePickRandom(top.a)
        return self._recursivePickRandom(top.b)

if __name__ == '__main__':
    s = RandomSet()
    for i in [5,3,7,1,4,6,9,2,8,0]:
        s.add(i)

    dists = [0]*10
    for i in range(10000):
        dists[s.pickRandom()] += 1
    print(dists)

I got [995, 975, 971, 995, 1057, 1004, 966, 1052, 984, 1001] as output, so the distribution seems good.

I've struggled with the same problem myself, and I haven't yet decided whether the performance gain of this more efficient pick is worth the overhead of using a Python-based collection. I could of course refine it and translate it to C, but that is too much work for me today :)

You can also transfer the set to an array and index into the array. It will probably work on a small scale; the for loop in the most-voted answer is O(n) anyway.

Object[] arr = set.toArray();

int v = (int) arr[rnd.nextInt(arr.length)];

If you really just want to pick "any" object from the Set, without any guarantees on the randomness, the easiest is taking the first one returned by the iterator.

    Set<Integer> s = ...
    Iterator<Integer> it = s.iterator();
    if(it.hasNext()){
        Integer i = it.next();
        // i is a "random" object from set
    }

A generic solution using Khoth's answer as a starting point.

/**
 * @param set a Set in which to look for a random element
 * @param <T> generic type of the Set elements
 * @return a random element in the Set or null if the set is empty
 */
public <T> T randomElement(Set<T> set) {
    int size = set.size();
    if (size == 0) {
        return null; // nextInt(0) would throw IllegalArgumentException
    }
    int item = random.nextInt(size);
    int i = 0;
    for (T obj : set) {
        if (i == item) {
            return obj;
        }
        i++;
    }
    return null;
}

If the set size is not large, this can be done using arrays.

// "Type" is a placeholder for the set's element type
int random;
HashSet<Type> someSet;
Type[] randData;
random = new Random(System.currentTimeMillis()).nextInt(someSet.size());
randData = someSet.toArray(new Type[0]);
Type sResult = randData[random];

If you don't mind a third-party library, the Utils library has an IterableUtils class with a randomFrom(Iterable iterable) method that takes a Set and returns a random element from it.

Set<Object> set = new HashSet<>();
set.add(...);
...
Object random = IterableUtils.randomFrom(set);

It is in the Maven Central Repository at:

<dependency>
  <groupId>com.github.rkumsher</groupId>
  <artifactId>utils</artifactId>
  <version>1.3</version>
</dependency>

Java 8+ Stream:

    static <E> Optional<E> getRandomElement(Collection<E> collection) {
        return collection
                .stream()
                .skip(ThreadLocalRandom.current()
                .nextInt(collection.size()))
                .findAny();
    }

Based on the answer of Joshua Bone, but with slight changes:

  • Ignores the stream's element order for a slight performance increase in parallel operations
  • Uses the current thread's ThreadLocalRandom
  • Accepts any Collection type as input
  • Returns the provided Optional instead of null
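Note that nextInt requires a positive bound, so the method above throws IllegalArgumentException for an empty collection instead of returning an empty Optional. A possible guarded variant is sketched below. It is an assumption about the intended behavior, not the answer's own code, and the class name SafeRandomElement is hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Optional;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper class, not part of any library.
final class SafeRandomElement {
    static <E> Optional<E> getRandomElement(Collection<E> collection) {
        if (collection.isEmpty()) {
            return Optional.empty(); // nextInt(0) would throw IllegalArgumentException
        }
        int index = ThreadLocalRandom.current().nextInt(collection.size());
        return collection.stream().skip(index).findFirst();
    }

    public static void main(String[] args) {
        Collection<String> items = new HashSet<>(Arrays.asList("x", "y", "z"));
        System.out.println(getRandomElement(items).orElse("(empty)"));
        System.out.println(getRandomElement(new HashSet<String>()).isPresent()); // false
    }
}
```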

After reading this thread, the best I could write is:

static Random random = new Random(System.currentTimeMillis());
public static <T> T randomChoice(T[] choices)
{
    int index = random.nextInt(choices.length);
    return choices[index];
}

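Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.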

 