
How to estimate the memory needed for Shark/Spark SQL?

When I use Shark/Spark SQL to process big data, Spark raises an Out Of Memory error. Tuning the GC does not help. I guess the raw data is too big to be processed.

My question is: how can I estimate the memory to allocate to Spark, or, given a specific amount of memory, the maximum amount of data Spark can process?

  1. What is your data size?
  2. Which mode do you use for your Shark/Spark SQL: standalone, YARN, or Mesos? Try standalone mode for testing first.
  3. What is your machine environment: VM, CPU, memory?
  4. If you would like to set the executor memory, you can try the following in your Scala code:

     import org.apache.spark.{SparkConf, SparkContext}

     val conf = new SparkConf()
       .setMaster("local")
       .setAppName("Wordcount")
       .set("spark.executor.memory", "4g")
     val sc = new SparkContext(conf)
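The same setting can also be passed on the command line when launching spark-shell or spark-submit, for example with --conf spark.executor.memory=4g or the --executor-memory 4g shorthand, so you do not have to hard-code it in the application.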

You may visit the tuning guide for more information: http://spark.apache.org/docs/latest/tuning.html#data-serialization and the Spark configuration page for reference: http://spark.apache.org/docs/latest/configuration.html
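As a rough first cut at the original question, one approach described in the tuning guide is to measure the in-memory size of a small sample of records with Spark's SizeEstimator and scale it up to the full row count. The sketch below is illustrative only: the Record case class and the row counts are assumptions, not taken from the question, and the result ignores serialization, shuffle, and execution overhead.

     import org.apache.spark.util.SizeEstimator

     // Hypothetical row type standing in for one record of the real data set.
     case class Record(id: Long, name: String, value: Double)

     // Measure the in-memory size of a small sample of rows...
     val sample = (1 to 1000).map(i => Record(i, s"name-$i", i * 1.0))
     val sampleBytes = SizeEstimator.estimate(sample)

     // ...and scale to the expected total row count (assumed figure).
     val totalRows = 100000000L // e.g. 100 million rows
     val estimatedBytes = sampleBytes.toDouble / sample.size * totalRows
     println(f"Rough in-memory size: ${estimatedBytes / math.pow(1024, 3)}%.1f GiB")

Note that only part of each executor's heap is available for caching data (controlled by the storage/memory fraction settings), so in practice the executors need noticeably more memory than this raw estimate.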
