When I use Shark/Spark SQL to process big data, Spark throws an OutOfMemoryError. Tuning the GC does not help, so I suspect the raw data is simply too big to be processed.
My question is: how can I estimate the memory to allocate to Spark, or conversely, given a specific amount of memory, the maximum amount of data Spark can process?
If you would like to set the memory explicitly, you can try the following in your Scala code:
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("local")
      .setAppName("Wordcount")
      .set("spark.executor.memory", "4g")  // heap size per executor
    val sc = new SparkContext(conf)
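Note that in local mode the executor runs inside the driver JVM, so spark.executor.memory set this way may not take effect; in that case raise the driver's heap instead, e.g. spark-submit --driver-memory 4g.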
See the tuning guide for more information: http://spark.apache.org/docs/latest/tuning.html#data-serialization. The Spark configuration page is also a useful reference: http://spark.apache.org/docs/latest/configuration.html
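For the estimation part of the question, the tuning guide points at Spark's SizeEstimator utility, which reports the in-memory footprint of an object. A minimal sketch of scaling that up to a whole dataset; the sample record and row count below are made-up placeholders, not values from the question:

    import org.apache.spark.util.SizeEstimator

    // Measure the JVM heap footprint of one representative record,
    // then scale by the expected number of records.
    val sampleRecord = "2015-01-01,user_42,some,sample,fields"  // hypothetical record
    val bytesPerRecord = SizeEstimator.estimate(sampleRecord)
    val expectedRecords = 100000000L                            // hypothetical: 100M rows
    val estimatedGb = bytesPerRecord * expectedRecords / math.pow(1024, 3)
    println(f"Caching the dataset would need roughly $estimatedGb%.1f GB")

Such an estimate only covers the cached data itself; leave extra headroom for shuffle buffers and per-task working memory.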