
How to estimate the memory needed for Shark/Spark SQL?

When I use Shark/Spark SQL to process big data, Spark raises an Out Of Memory error. Tuning the GC does not help. I guess the raw data is too big to be processed.

My question is: how can I estimate the memory to allocate to Spark, or, given a specific amount of memory, the maximum amount of data Spark can process?

  1. What is your data size?
  2. Which mode do you use for your Shark/Spark SQL: standalone, YARN, or Mesos? Try standalone mode for testing first.
  3. What is your machine environment: VM, CPU, memory?
  4. If you would like to set the executor memory, you can try the following in your Scala code:

     import org.apache.spark.{SparkConf, SparkContext}

     val conf = new SparkConf()
       .setMaster("local")
       .setAppName("Wordcount")
       .set("spark.executor.memory", "4g")
     val sc = new SparkContext(conf)
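The same setting can also be passed on the command line when launching spark-shell or spark-submit, for example with --conf spark.executor.memory=4g or the --executor-memory 4g shorthand, so you do not have to hard-code it in the application.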

You may visit the tuning guide for more information: http://spark.apache.org/docs/latest/tuning.html#data-serialization and the Spark configuration page for reference: http://spark.apache.org/docs/latest/configuration.html
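As a rough first cut at the original question, one approach described in the tuning guide is to measure the in-memory size of a small sample of records with Spark's SizeEstimator and scale it up to the full row count. The sketch below is illustrative only: the Record case class and the row counts are assumptions, not taken from the question, and the result ignores serialization, shuffle, and execution overhead.

     import org.apache.spark.util.SizeEstimator

     // Hypothetical row type standing in for one record of the real data set.
     case class Record(id: Long, name: String, value: Double)

     // Measure the in-memory size of a small sample of rows...
     val sample = (1 to 1000).map(i => Record(i, s"name-$i", i * 1.0))
     val sampleBytes = SizeEstimator.estimate(sample)

     // ...and scale to the expected total row count (assumed figure).
     val totalRows = 100000000L // e.g. 100 million rows
     val estimatedBytes = sampleBytes.toDouble / sample.size * totalRows
     println(f"Rough in-memory size: ${estimatedBytes / math.pow(1024, 3)}%.1f GiB")

Note that only part of each executor's heap is available for caching data (controlled by the storage/memory fraction settings), so in practice the executors need noticeably more memory than this raw estimate.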
