
Spark number of cores used

I have a very simple Spark job that reads millions of movie ratings and reports each rating value and the number of times it was given. The job runs on a Spark cluster and works fine.

I have a couple of questions about the parameters I use to run the job.

  1. I have 2 nodes running. Node-1 = 24 GB RAM and 8 vCPUs. Node-2 = 8 GB RAM and 2 vCPUs.

So in total I have 32 GB RAM and 10 vCPUs.

The spark-submit command:

spark-submit --master spark://hadoop-master:7077 --executor-memory 4g --num-executors 4 --executor-cores 4 /home/hduser/ratings-counter.py

When I run the above command, which cores does Spark use? Are they from node-1 or node-2, or are they allocated randomly?

  2. If I don't specify the number of executors, how many executors does Spark use by default?

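from pyspark import SparkConf, SparkContext
import collections


# The master must be a full URL; this one matches the --master value passed to spark-submit.
conf = SparkConf().setMaster("spark://hadoop-master:7077").setAppName("RatingsHistogram")
sc = SparkContext(conf=conf)

# Each line has '::'-separated fields; the third field (index 2) is the rating.
lines = sc.textFile("hdfs://hadoop-master:8020/user/hduser/gutenberg/ml-10M100K/ratings.dat")
ratings = lines.map(lambda x: x.split('::')[2])
# countByValue() returns a dict of rating -> count to the driver.
result = ratings.countByValue()

# Print the counts ordered by rating value.
sortedResults = collections.OrderedDict(sorted(result.items()))
for key, value in sortedResults.items():
    print("%s %i" % (key, value))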

Is it from node-1 or node-2, or does it allocate randomly?

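It depends on how many workers you have started and what each one offers. Each executor requested by your spark-submit command takes 4 GB of memory and 4 cores out of the total memory and cores of the Spark worker that hosts it, so an executor can only start on a worker that has at least that much free. Note that --num-executors is documented as a YARN option; with a standalone master (spark://...), the number of executors follows from the available cores rather than from that flag, so you may not get 4. An easy way to see which node each executor was started on is to open the Spark Master UI (default port 8080), select your running application, and check the Executors tab in the application's UI.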
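To make the arithmetic concrete for the command in the question, here is a minimal sketch of how many 4-core, 4 GB executors could fit on each node. It is a simplified model, not Spark's scheduler: the function name and the dict layout are made up for illustration, and it assumes every worker offers all of its cores and RAM to this one application (by default a worker advertises slightly less memory than the machine has).

```python
# Simplified model of placing fixed-size executors on standalone workers.
# Assumption: each worker offers all of its cores and memory to this one app.

def executors_that_fit(worker_cores, worker_mem_gb, executor_cores, executor_mem_gb):
    """Return how many executors of the requested size fit on one worker."""
    by_cores = worker_cores // executor_cores
    by_memory = worker_mem_gb // executor_mem_gb
    # An executor needs both its cores and its memory, so the tighter limit wins.
    return min(by_cores, by_memory)

# (cores, RAM in GB) per node, taken from the question.
workers = {"node-1": (8, 24), "node-2": (2, 8)}

layout = {
    name: executors_that_fit(cores, mem_gb, executor_cores=4, executor_mem_gb=4)
    for name, (cores, mem_gb) in workers.items()
}

for name, count in layout.items():
    print(f"{name}: {count} executor(s)")
```

Under these assumptions node-2 cannot host any 4-core executor because it only has 2 cores, so all executors would land on node-1. Lowering --executor-cores to 2 would let node-2 take part.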

If I don't specify the number of executors, how many executors does Spark use by default?

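Usually, in standalone mode, it starts one executor per worker instance, and that executor uses all of the worker's available cores. Executor memory still defaults to 1 GB unless you set --executor-memory.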
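As a rough illustration of that default, the sketch below models a standalone cluster where the application sets neither executor cores nor executor memory. The core counts come from the question; the function name and data layout are hypothetical, and the model assumes no other application is holding cores and that spark.cores.max is left unset.

```python
# Simplified model of the standalone default: no --executor-cores, no --executor-memory.
# Assumption: no other application is holding cores on these workers.

DEFAULT_EXECUTOR_MEMORY_GB = 1  # default value of spark.executor.memory

def default_layout(worker_cores):
    """Map each worker to the single executor it would host by default."""
    return {
        name: {"executors": 1, "cores": cores, "memory_gb": DEFAULT_EXECUTOR_MEMORY_GB}
        for name, cores in worker_cores.items()
    }

layout = default_layout({"node-1": 8, "node-2": 2})
for name, executor in layout.items():
    print(name, executor)

# Every core on every worker is claimed by that worker's single executor.
total_cores = sum(e["cores"] for e in layout.values())
print("total cores used:", total_cores)
```

In this model the application ends up with two executors, one of 8 cores and one of 2 cores, which is why fixing --executor-cores is the usual way to get evenly sized executors.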
