
What is the difference between running a Spark application with spark-submit and with java -cp?

Case 1:

spark-submit --class main.Test --master local[4] /path/Test.jar

SparkSession sparkSession = SparkSession.builder()
    .appName("Test")
    .getOrCreate();

Case 2:

java -cp /path/Test.jar com.main.Test

SparkSession sparkSession = SparkSession.builder()
    .appName("Test")
    .master("local[4]")
    .getOrCreate();

What is the difference between these two methods?

There is no major difference. The issues you are likely to run into are deployment-related.

Case 1: you may need a super/uber (fat) jar so that your application's dependencies are bundled together with it.

Case 2: in some environments, such as AWS EMR (at least until recently), you can't set the master in code like this:

SparkSession sparkSession = SparkSession.builder()
    .appName("Test")
    .master(<emr cluster's ip>)
    .getOrCreate();

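Typically, hard-coding a local master (case 2) is what you do during development, and passing the master through spark-submit (case 1) is what you do for deployment. But neither is mandatory.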
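If you want one jar that works with both launch styles, one option is to hard-code the master only as a fallback. Below is a minimal sketch of that idea in plain Java (no Spark dependency, so it runs on its own). It assumes that spark-submit exposes the value of --master to the driver as the spark.master JVM system property, which is where SparkConf picks it up. The class name MasterResolver and the method resolveMaster are hypothetical names used only for illustration.

```java
import java.util.Properties;

public class MasterResolver {

    // Hypothetical helper: returns the master URL supplied by the launcher
    // (assumed to arrive as the "spark.master" property), or the fallback
    // when the property is missing or blank.
    static String resolveMaster(Properties props, String fallback) {
        String master = props.getProperty("spark.master");
        return (master == null || master.trim().isEmpty()) ? fallback : master;
    }

    public static void main(String[] args) {
        // Launched with plain "java -cp" and no -Dspark.master, the property
        // is absent, so the fallback "local[4]" is used.
        String master = resolveMaster(System.getProperties(), "local[4]");
        System.out.println("master = " + master);
    }
}
```

The resolved value could then be passed to SparkSession.builder().master(...). In a real Spark application, SparkConf.setIfMissing("spark.master", "local[4]") should give the same effect without a custom helper.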

Hope it helps...

