
How does Scala achieve performance improvements for Map and Set by using different classes based on size?

Original 2019-03-18 00:00:53 8 1 scala/ scala-collections

I am new to Scala and just found out that it has scala.collection.immutable.EmptySet, Set1, Set2, Set3, Set4 and HashSet, and the same is true for Map. It is mentioned that this helps to improve performance. Does it improve performance by working with collections of fewer than 5 elements based on an index, and with collections of more than 4 elements by hashing? If so, is there a mathematical explanation of why hashing is not a good fit for collections with fewer than 5 elements?

1 answer

by working with elements collection having a size less than 5 based on Index

No, there is no indexing. Let's look at the most important method for Set, contains:

EmptySet.contains(x) just returns false, with no work to do at all.
Set1(elem1).contains(elem) just needs a single comparison, elem == elem1, which a hash set would also need to do after comparing hashes (because different values can have the same hash).
Set2, Set3, and Set4 likewise need only a few equality comparisons (at most 2, 3, and 4 respectively) joined with ||.
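To make the list above concrete, here is a minimal sketch of what such size-specialized classes could look like. The names MiniEmptySet, MiniSet1 and MiniSet2 are made up for illustration; this is not the standard-library source, only the shape of the idea.

```scala
object SmallSetSketch {
  // Hypothetical stand-ins for EmptySet, Set1 and Set2 (illustration only).

  // An empty set never contains anything, so there is nothing to compute.
  object MiniEmptySet {
    def contains(elem: Any): Boolean = false
  }

  // One element: a single equality check, no hashing involved.
  final class MiniSet1[A](elem1: A) {
    def contains(elem: A): Boolean = elem == elem1
  }

  // Two elements: at most two equality checks joined with ||.
  final class MiniSet2[A](elem1: A, elem2: A) {
    def contains(elem: A): Boolean = elem == elem1 || elem == elem2
  }
}
```

A three- or four-element version would follow the same pattern, with one more comparison each.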

HashSet.contains is also a one-liner, except that all the work is done by get0 and computeHash, which are quite complicated. So even in the best case it has to do more work.
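For comparison, here is a toy sketch of the steps any hash-based lookup has to go through. ToyHashSet is a made-up class that uses a fixed number of buckets; the real HashSet is structured differently and is considerably more involved, so treat this only as an illustration of the extra work.

```scala
object HashLookupSketch {
  // Hypothetical toy hash set with a fixed bucket count (illustration only).
  final class ToyHashSet[A](elems: Seq[A], bucketCount: Int = 16) {
    // Step 1 of every lookup: compute a hash and map it to a bucket index.
    private def indexOf(elem: A): Int = Math.floorMod(elem.##, bucketCount)

    // Elements are grouped into buckets by that index when the set is built.
    private val buckets: Vector[List[A]] = {
      val grouped = elems.distinct.groupBy(indexOf)
      Vector.tabulate(bucketCount)(i => grouped.getOrElse(i, Nil).toList)
    }

    // Step 2: equality checks are still required inside the bucket,
    // because different values can land in the same bucket.
    def contains(elem: A): Boolean = buckets(indexOf(elem)).exists(_ == elem)
  }
}
```

Even in this simplified form, contains performs a hash computation and a bucket lookup before it reaches the equality comparison that a one-element set does directly.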

Methods other than contains can be specialized for small sizes in a similar way. Note that there is nothing special about size 4: it is quite likely that Set5, Set6, etc. would also be faster than HashSet, but eventually they would become slower, and the point at which that happens isn't fixed. Besides, adding them means more code needs to be loaded, which makes performance slightly worse everywhere. So it has to stop somewhere, and 4 was picked.
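One way to see the cutoff in practice is to print the runtime class of an immutable Set as it grows. This is a small sketch; the helper name classNameForSize is made up, and the exact class names printed can differ between Scala versions.

```scala
object SetClassBySize {
  // Builds an immutable Set containing 1..n one element at a time
  // and returns the name of the runtime class that ends up backing it.
  def classNameForSize(n: Int): String =
    (1 to n).foldLeft(Set.empty[Int])(_ + _).getClass.getName

  def main(args: Array[String]): Unit =
    (0 to 6).foreach { n =>
      println(s"size $n -> ${classNameForSize(n)}")
    }
}
```

Up to four elements the name should mention one of the small specialized classes, and from five elements on it should mention HashSet.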


solution1 1 2019-03-18 06:19:05
