
How to efficiently select a random element from a Scala immutable HashSet

I have a scala.collection.immutable.HashSet that I want to randomly select an element from.

I could solve the problem with an extension method like this:

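import scala.collection.immutable.HashSet
import scala.util.Random

implicit class HashSetExtensions[T](h: HashSet[T]) {
  def nextRandomElement(): Option[T] = {
    val list = h.toList // copies the whole set: O(n)
    list match {
      case Nil => None // toList never returns null, so Nil is the only empty case
      case _   => Some(list(Random.nextInt(list.length)))
    }
  }
}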

...but converting to a list will be slow. What would be the most efficient solution?

WARNING: This answer is for experimental use only. For a real project you should probably use your own collection types.

I did some research in the HashSet source, and I think there is little opportunity to get at the inner structure of its most important class, HashTrieSet, without a package violation.

I did come up with this code, which extends Ben Reich's solution:

package scala.collection

import scala.collection.immutable.HashSet
import scala.util.Random

package object random {
  implicit class HashSetRandom[T](set: HashSet[T]) {
    def randomElem: Option[T] = set match {
      // Trie node: pick one child subtrie at random and recurse into it.
      case trie: HashSet.HashTrieSet[T] =>
        trie.elems(Random.nextInt(trie.elems.length)).randomElem
      // Leaf, collision node, or empty set: fall back to the iterator approach.
      case _ => Some(set.size) collect {
        case size if size > 0 => set.iterator.drop(Random.nextInt(size)).next()
      }
    }
  }
}

The file should be created somewhere in the src/scala/collection/random folder.

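Note the scala.collection package declaration: it is what makes the elems member of HashTrieSet visible. This is the only solution I could think of that runs better than O(n). The current version should have complexity O(log n), like any other immutable.HashSet operation. Be aware that choosing a child subtrie uniformly at each level does not make every element equally likely when the subtries differ in size; weighting the choice by each child's size would correct that.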

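Another warning: the private structure of HashSet is not part of Scala's standard library API, so it could change in any version and break this code (although it had not changed since 2.8 at the time of writing). In particular, Scala 2.13 reimplemented immutable.HashSet and no longer has HashTrieSet.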

Since size is O(1) on HashSet and iterator is as lazy as possible, I think this solution would be relatively efficient. It still walks up to size elements, but it never allocates an intermediate List:

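import scala.collection.immutable.HashSet
import scala.util.Random

implicit class RichHashSet[T](val h: HashSet[T]) extends AnyVal {
    def nextRandom: Option[T] = Some(h.size) collect {
        // Skip a random number of elements, then take the next one.
        case size if size > 0 => h.iterator.drop(Random.nextInt(size)).next()
    }
}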

If you are trying to squeeze out every ounce of efficiency, you could use a match expression instead of the more concise Some/collect idiom.
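As a rough sketch of that match-based variant: the names HashSetSyntax, RichHashSetMatch and nextRandomMatch are made up for illustration, and the logic is otherwise the same as nextRandom.

```scala
import scala.collection.immutable.HashSet
import scala.util.Random

// Hypothetical wrapper object; value classes must live at top level or inside an object.
object HashSetSyntax {
  implicit class RichHashSetMatch[T](val h: HashSet[T]) extends AnyVal {
    // Same idea as nextRandom, but matching on the size directly
    // avoids allocating the intermediate Some used by Some/collect.
    def nextRandomMatch: Option[T] = h.size match {
      case 0    => None
      case size => Some(h.iterator.drop(Random.nextInt(size)).next())
    }
  }
}
```

With import HashSetSyntax._ in scope, HashSet(1, 2, 3).nextRandomMatch returns one of the three elements wrapped in Some, and an empty set returns None.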

You can look at the mutable HashSet implementation to see the size method. The iterator method defined there basically just calls iterator on FlatHashTable. The same basic efficiencies apply to the immutable HashSet, if that is what you are working with.

By comparison, the toList implementation that HashSet inherits is defined all the way up the type hierarchy in TraversableOnce. It is built from more generic primitives, which are probably less efficient, and (of course) the entire collection must be iterated to generate the List. If you do convert the entire set to another collection, you should use Array or Vector, which have constant-time (Array) or effectively constant-time (Vector) indexed lookup, rather than List, whose indexed lookup is linear.
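If many random picks are needed from the same set, one option is to pay for the conversion once and index into the result afterwards. The following is only a sketch under that assumption; RandomSampler is a hypothetical name, not something from the standard library.

```scala
import scala.collection.immutable.HashSet
import scala.util.Random

// Hypothetical helper: copies the set into a Vector once (O(n)),
// after which every pick is an indexed lookup.
final class RandomSampler[T](set: HashSet[T]) {
  private val snapshot: Vector[T] = set.toVector

  def next(): Option[T] =
    if (snapshot.isEmpty) None
    else Some(snapshot(Random.nextInt(snapshot.length)))
}
```

This only pays off when the same immutable set is sampled repeatedly; for a single pick, the up-front copy costs about as much as the toList approach in the question.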

You might also note that there is nothing special about HashSet in the above methods, and you could enrich Set[T] instead, if you so chose (although there would be no guarantee that this would be as efficient on other Set implementations, of course).
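For instance, a version enriching Set[T] might look like the sketch below. SetSyntax and RichSet are illustrative names, and the cost of size and iterator depends on the concrete Set implementation.

```scala
import scala.util.Random

// Hypothetical wrapper object holding the enrichment for any immutable Set.
object SetSyntax {
  implicit class RichSet[T](val s: Set[T]) extends AnyVal {
    def nextRandom: Option[T] = {
      val size = s.size // may be O(n) on some Set implementations
      if (size == 0) None
      else Some(s.iterator.drop(Random.nextInt(size)).next())
    }
  }
}
```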

As a side note, when implementing enriched classes for extension methods, you should always consider making the implicit class a user-defined value class by extending AnyVal. You can read about some of the advantages and limitations in the docs, and in this answer.
