
Executor memory vs Java heap size

My cluster has 3 nodes with 8 GB RAM and 2 cores each. I am increasing the executor memory for Spark in the following way:

    // creating the Spark session
    val spark = SparkSession
      .builder()
      .appName(s"${this.getClass.getSimpleName}")
      .config("spark.sql.shuffle.partitions", "9")
      .config("spark.executor.memory", "3g")
      .config("spark.executor.cores", "1")
      .master("local[*]")
      .getOrCreate()

Thus 4 executors with 3 GB of RAM each should launch, with one task per core.

The code I am executing is as follows:

    import org.apache.spark.storage.StorageLevel
    import spark.implicits._  // needed for toDF() and the Dataset encoder used by map

    val seq2 = List((125, 0), (125, 125), (125, 250), (125, 375))

    val urls = spark.sparkContext.parallelize(seq2).toDF()

    // one HTTP call per row; the results are cached in memory, spilling to disk if needed
    val actual_data = urls.map(x => HTTPRequestParallel.ds(x.getInt(0).toString, x.getInt(1).toString, t0)).persist(StorageLevel.MEMORY_AND_DISK)

    val dataframe = spark.read.option("header", "true").json(actual_data)

I am calling 4 web APIs in parallel, each returning around 1 GB of data per call, which gets serialized in one method, and I am still getting a Java heap memory issue.
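
The body of HTTPRequestParallel.ds is not shown here, so the following is only a rough guess at its shape; the endpoint URL, parameter names, and the type of t0 are all assumptions, not the actual code:

    // hypothetical sketch -- the real HTTPRequestParallel.ds is not shown in the question
    object HTTPRequestParallel {
      def ds(limit: String, offset: String, t0: Long): String = {
        // assumed endpoint; the whole ~1 GB response body is read into a single String
        val url = s"http://example.com/api?limit=$limit&offset=$offset"
        scala.io.Source.fromURL(url).mkString
      }
    }

Under an assumption like this, each response is held entirely as a String on the heap of the JVM that runs the task before Spark persists it.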

As far as I know, the API call is synchronous, so it will be fetching and storing the incoming data somewhere. Where is that location: the JVM heap memory of the node, or the executor memory that was assigned?

Increase spark.sql.shuffle.partitions to 1000 or more; it should resolve the issue.

You can also try tuning spark.default.parallelism.
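
Applied to the builder from the question, those two suggestions look roughly like this (the values are starting points to tune, not exact requirements):

    import org.apache.spark.sql.SparkSession

    // sketch: the question's builder with the suggested settings raised
    val spark = SparkSession
      .builder()
      .appName(s"${this.getClass.getSimpleName}")
      .config("spark.sql.shuffle.partitions", "1000")  // 1000 or more, as suggested
      .config("spark.default.parallelism", "1000")     // optionally raise RDD parallelism too
      .config("spark.executor.memory", "3g")
      .config("spark.executor.cores", "1")
      .master("local[*]")
      .getOrCreate()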
