
How does Scala improve performance for Map and Set by using different classes based on size?

I am new to Scala and I just found out that Scala has scala.collection.immutable.EmptySet, Set1, Set2, Set3, Set4 and HashSet. The same is true for Map. It is mentioned that this helps to improve performance. Does it improve performance by handling collections with fewer than 5 elements by index and collections with more than 4 elements by hashing? If so, is there a mathematical explanation of why hashing is a poor fit for collections with fewer than 5 elements?

by handling collections with fewer than 5 elements by index

No, there is no indexing. Let's look at contains, the most important method on Set:

  1. EmptySet.contains(x) just returns false, so there is no work to do at all.

  2. Set1(elem1).contains(elem) just needs a single comparison, elem == elem1, which a hash set would also need to do after comparing hashes (because different values can have the same hash).

  3. Set2, Set3, and Set4 likewise need only between 1 and 4 equality comparisons combined with ||.
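The three cases above can be sketched as follows. This is a simplified illustration, not the standard library's source: the names TinySetDemo, TinySet, TinySet0, TinySet1 and TinySet2 are hypothetical stand-ins for EmptySet, Set1 and Set2, and only contains is modelled.

```scala
object TinySetDemo {
  // Hypothetical, stripped-down interface: only the membership test.
  sealed trait TinySet[A] {
    def contains(elem: A): Boolean
  }

  // Stand-in for EmptySet: nothing to compare, always false.
  final class TinySet0[A] extends TinySet[A] {
    def contains(elem: A): Boolean = false
  }

  // Stand-in for Set1: a single equality check, no hash computed.
  final class TinySet1[A](elem1: A) extends TinySet[A] {
    def contains(elem: A): Boolean = elem == elem1
  }

  // Stand-in for Set2: at most two equality checks joined by ||,
  // which short-circuits as soon as one of them succeeds.
  final class TinySet2[A](elem1: A, elem2: A) extends TinySet[A] {
    def contains(elem: A): Boolean = elem == elem1 || elem == elem2
  }
}
```

Each contains is straight-line code over fields of the object itself, so there is no hash to compute and no bucket or tree node to follow before reaching the equality check.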

HashSet.contains is also a one-liner, except that all the work is done by get0 and computeHash, which are quite complicated. So even in the best case it has to do more work.

Methods other than contains can be specialized for small sizes in the same way. Note that there is nothing special about size 4: it's quite likely that Set5, Set6, etc. would also be faster than HashSet. Eventually they would become slower, though, and the point at which that happens isn't fixed. Besides, adding them means more code needs to be loaded, which makes performance slightly worse everywhere. So the specialization has to stop somewhere, and 4 was picked.
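To see the cutoff in practice, the following sketch prints the runtime class of a set before and after it grows past four elements. SetClassDemo and implName are hypothetical names used only for this illustration, and the exact class names printed depend on the Scala version, so no expected output is shown.

```scala
import scala.collection.immutable.HashSet

object SetClassDemo {
  // Runtime class name of the concrete Set implementation.
  def implName(s: Set[Int]): String = s.getClass.getName

  def main(args: Array[String]): Unit = {
    val small = Set(1, 2, 3, 4) // still one of the specialized small classes
    val grown = small + 5       // a fifth element switches to a hash-based set

    println(implName(small))
    println(implName(grown))
    println(small.isInstanceOf[HashSet[_]]) // false
    println(grown.isInstanceOf[HashSet[_]]) // true
  }
}
```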
