
Adding a JAR driver to Spark on emr-6.7.0

I'm trying to connect to an AWS Redis cluster from an EMR cluster. I uploaded the driver JAR to S3 and used this bootstrap action to copy it to the cluster nodes:

    aws s3 cp s3://sparkbucket/spark-redis-2.3.0.jar /home/hadoop/spark-redis-2.3.0.jar

This is my Spark app for testing the connection:

from pyspark.sql import SparkSession

if __name__ == "__main__":
    # Point the spark-redis connector at the Redis endpoint.
    spark = (
        SparkSession.builder
        .config("spark.redis.host", "testredis-0013.vb4vgr.00341.eu1.cache.amazonaws.com")
        .config("spark.redis.port", "6379")
        .appName("Redis_test")
        .getOrCreate()
    )

    # Read every key matching the pattern into a DataFrame.
    df = (
        spark.read.format("org.apache.spark.sql.redis")
        .option("key.column", "key")
        .option("keys.pattern", "*")
        .load()
    )

    # Write the result to S3 as comma-separated CSV.
    df.write.csv(path="s3://sparkbucket/", sep=",")

    spark.stop()

When I run the application with this spark-submit command:

spark-submit --deploy-mode cluster --driver-class-path /home/hadoop/spark-redis-2.3.0.jar s3://sparkbucket/testredis.py

I get the following error and am not sure what I did wrong:

ERROR Client: Application diagnostics message: User application exited with status 1
Exception in thread "main" org.apache.spark.SparkException: Application application_1658168513779_0001 finished with failed status

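With similar test code, I ran the job successfully by uploading the spark-redis JAR to S3 and passing it with --jars. Unlike --driver-class-path, which only adds the JAR to the driver's classpath, --jars makes it available to both the driver and the executors: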

spark-submit --deploy-mode cluster --jars s3://<bucket/path>/spark-redis_2.12-3.1.0-SNAPSHOT-jar-with-dependencies.jar s3://<bucket/path>/redis_test.py
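
If the job is launched from a script rather than typed by hand, the same command can be assembled in Python. This is only a minimal sketch: the helper names `build_submit_command` and `submit` are hypothetical (not part of Spark or EMR), and it assumes `spark-submit` is on the PATH of the node where it runs. The URIs are the placeholders from the command above and must be replaced with real paths.

```python
import shlex
import subprocess


def build_submit_command(jar_uri, app_uri, deploy_mode="cluster"):
    """Return the spark-submit argument list that ships a JAR via --jars.

    Hypothetical helper: it only assembles the arguments shown in the
    command above and does not validate the URIs.
    """
    return [
        "spark-submit",
        "--deploy-mode", deploy_mode,
        "--jars", jar_uri,
        app_uri,
    ]


def submit(jar_uri, app_uri):
    """Run spark-submit and raise CalledProcessError on a non-zero exit."""
    # Assumption: spark-submit is installed and on PATH (e.g. the EMR primary node).
    subprocess.run(build_submit_command(jar_uri, app_uri), check=True)


if __name__ == "__main__":
    # Placeholder URIs copied from the command above; substitute real paths.
    cmd = build_submit_command(
        "s3://<bucket/path>/spark-redis_2.12-3.1.0-SNAPSHOT-jar-with-dependencies.jar",
        "s3://<bucket/path>/redis_test.py",
    )
    # Print the shell-quoted command instead of running it.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell-quoting problems; `submit` is left uncalled here so the sketch can be inspected without a cluster.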

The detailed log for the run can be viewed in the Spark history server, which can be reached from the EMR web console by following this sequence of links:

Summary -> Spark history server -> application_xxx_xxx -> Executors -> (driver)stdout

You may get a NoSuchKey error at first, because the log takes some time to become available; just reload the page.
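
As an alternative to the web console (not part of the original answer), the same driver output can usually be pulled with the standard `yarn logs -applicationId` command. The sketch below wraps it in Python. It assumes it runs on a cluster node where the `yarn` client is installed and that YARN log aggregation is enabled; the helper names are hypothetical, and the application ID is the one from the error message above.

```python
import subprocess


def yarn_logs_command(application_id):
    """Return the argument list for `yarn logs -applicationId <id>`."""
    return ["yarn", "logs", "-applicationId", application_id]


def fetch_yarn_logs(application_id):
    """Return the aggregated container logs for one application as text.

    Assumption: the yarn CLI is on PATH and the logs have already been
    aggregated; otherwise the command exits non-zero and this raises
    subprocess.CalledProcessError.
    """
    result = subprocess.run(
        yarn_logs_command(application_id),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Application ID taken from the error message in the question.
    print(" ".join(yarn_logs_command("application_1658168513779_0001")))
```

The Python traceback from the failed driver, if any, should appear in the returned text and usually names the missing class or connection problem directly.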
