My cluster has 3 nodes with 8 GB RAM and 2 cores each. I am increasing the executor memory for Spark as follows:
import org.apache.spark.sql.SparkSession

// creating the Spark session
val spark = SparkSession
  .builder()
  .appName(s"${this.getClass.getSimpleName}")
  .config("spark.sql.shuffle.partitions", "9")
  .config("spark.executor.memory", "3g")
  .config("spark.executor.cores", "1")
  .master("local[*]")
  .getOrCreate()
Thus 4 executors with 3 GB of RAM each should launch, each running one task per core.
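For context on how much of that 3 GB is actually available to user code: Spark's unified memory model reserves roughly 300 MB of each executor heap, then gives spark.memory.fraction (0.6 by default) of the remainder to execution and storage; objects created inside a map, such as a buffered HTTP response, must fit in what is left. A quick sanity check in plain Scala (the 300 MB and 0.6 are Spark defaults; the 3 GB heap is the config above):

```scala
// Spark defaults: ~300 MB reserved heap, spark.memory.fraction = 0.6
val heapMb         = 3 * 1024  // spark.executor.memory = 3g
val reservedMb     = 300
val memoryFraction = 0.6

val usableMb  = heapMb - reservedMb                 // 2772 MB
val unifiedMb = (usableMb * memoryFraction).toInt   // execution + storage: 1663 MB
val userMb    = usableMb - unifiedMb                // user objects: 1109 MB

println(s"unified (execution+storage): $unifiedMb MB, user memory: ~$userMb MB")
```

So a ~1 GB response body plus parsing overhead is already close to the ~1.1 GB left for user objects, which is consistent with a heap error.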
The code I am executing is as follows:
import org.apache.spark.storage.StorageLevel
import spark.implicits._

val seq2 = List((125, 0), (125, 125), (125, 250), (125, 375))
val urls = spark.sparkContext.parallelize(seq2).toDF()
val actual_data = urls
  .map(x => HTTPRequestParallel.ds(x.getInt(0).toString, x.getInt(1).toString, t0))
  .persist(StorageLevel.MEMORY_AND_DISK)
val dataframe = spark.read.option("header", "true").json(actual_data)
When I call the 4 web APIs in parallel, each returning around 1 GB of data that gets serialized in one method, I still get a Java heap space error. As I understand it, the API call is synchronous, so the incoming data must be fetched and stored somewhere. Where is that location: the JVM heap of the node, or the executor memory that was assigned?
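To make the question concrete: the fetched bytes live on the heap of whichever JVM runs the task (and note that under local[*] everything runs in the single driver JVM, where spark.executor.memory is generally not applied). One way to avoid materializing a large body as a single in-heap String is to stream it to disk in fixed-size chunks. A minimal, Spark-free sketch with java.io, where a ByteArrayInputStream stands in for the HTTP response stream:

```scala
import java.io.{ByteArrayInputStream, FileOutputStream, InputStream}
import java.nio.file.Files

// Stand-in for an HTTP response stream; a real call would pass the
// connection's InputStream here instead of this 1 MB in-memory buffer.
val body: InputStream = new ByteArrayInputStream(Array.fill[Byte](1024 * 1024)(42))

val tmp = Files.createTempFile("response", ".json").toFile
val out = new FileOutputStream(tmp)
val buf = new Array[Byte](8192) // only 8 KB on the heap at any moment

Iterator
  .continually(body.read(buf))
  .takeWhile(_ != -1)
  .foreach(n => out.write(buf, 0, n))
out.close()

println(s"wrote ${tmp.length()} bytes to ${tmp.getPath}")
```

The file written this way can then be handed to spark.read.json as a path, instead of holding the whole payload in one task's memory.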
Increase spark.sql.shuffle.partitions to 1000 or more; it should resolve the issue. You can also try tuning spark.default.parallelism.
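A sketch of the suggestion above (the keys are real Spark config names; the value 1000 is illustrative, not tuned for this workload):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: higher partition counts spread shuffled data across
// more, smaller tasks, so each task's working set is smaller.
val spark = SparkSession
  .builder()
  .appName("tuned-session")
  .config("spark.sql.shuffle.partitions", "1000") // Dataset/DataFrame shuffles
  .config("spark.default.parallelism", "1000")    // RDD operations
  .master("local[*]")
  .getOrCreate()
```

Keep in mind that a higher partition count only helps when the data is split across many records; a single ~1 GB record still has to fit in one task's memory.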