
EMR conf spark-default settings

I am using a configuration file, following the Configure Spark guide, to set up EMR configuration on AWS. For example, changing spark.executor.extraClassPath is done via the following settings:

{
     "Classification": "spark-defaults",
     "Properties": {
         "spark.executor.extraClassPath": "/home/hadoop/mongo-hadoop-spark.jar"
     }
}

It works perfectly and does change spark.executor.extraClassPath in the EMR Spark conf, but EMR has some preset default paths in spark.executor.extraClassPath. So instead of overwriting spark.executor.extraClassPath, I would like to know if there is a way to append my path and keep the default paths, something like:

{
     "Classification": "spark-defaults",
     "Properties": {
         "spark.executor.extraClassPath": "{$extraClassPath}:/home/hadoop/mongo-hadoop-spark.jar"
     }
}

You can specify it in your EMR template as follows:

Classification: spark-defaults
ConfigurationProperties:
  spark.jars: <your jar location>
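For reference, a minimal sketch of the same setting expressed in the JSON classification format used in the question; the jar path shown is only an illustrative placeholder:

{
     "Classification": "spark-defaults",
     "Properties": {
         "spark.jars": "/home/hadoop/mongo-hadoop-spark.jar"
     }
}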

Specifying the full path for all additional jars at job submit time will also work for you:

--jars

This option will submit these jars to all the executors and will not change the default extra classpath.

One more option I know of, though I have only tried it with a YARN configuration, not sure about EMR:

./bin/spark-submit --class "SparkTest" --master local[*] --jars /fullpath/first.jar,/fullpath/second.jar /fullpath/your-program.jar
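If you are submitting the job as an EMR step instead of calling spark-submit directly on the master node, a rough equivalent would look like the following (a sketch only; the cluster ID, jar paths, and class name are placeholders, not taken from the question):

aws emr add-steps --cluster-id j-XXXXXXXXXXXXX \
  --steps 'Type=Spark,Name=SparkTest,ActionOnFailure=CONTINUE,Args=[--class,SparkTest,--jars,/fullpath/first.jar,/fullpath/second.jar,/fullpath/your-program.jar]'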

You can put "spark.jars" in spark-defaults.conf, so even if you are using a notebook this configuration will be used. Hope this solves your problem.
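For example (a sketch; the jar path is taken from the question and is only illustrative), adding a line like this to spark-defaults.conf makes the jar available to any session, including notebook sessions:

spark.jars    /home/hadoop/mongo-hadoop-spark.jar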
