
How to spark-submit a job to YARN on another cluster?

I have a Docker container with Spark installed, and I am trying to use Marathon to submit a job to YARN on another cluster. The container exports YARN_CONF_DIR and HADOOP_CONF_DIR, and the yarn-site.xml contains the correct address of the EMR master IP, but I am not sure where it is picking up localhost from.

ENV YARN_CONF_DIR="/opt/yarn-site.xml"
ENV HADOOP_CONF_DIR="/opt/spark-2.2.0-bin-hadoop2.6"

yarn-site.xml:

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>xx.xxx.x.xx</value>
</property>

Command:

  "cmd": "/opt/spark-2.2.0-bin-hadoop2.6/bin/spark-submit --verbose \\\n --name emr_external_mpv_streaming \\\n --deploy-mode client \\\n --master yarn\\\n --conf spark.executor.instances=4 \\\n --conf spark.executor.cores=1 \\\n --conf spark.executor.memory=1g \\\n --conf spark.driver.memory=1g \\\n --conf spark.cores.max=4 \\\n --conf spark.executorEnv.EXT_WH_HOST=$EXT_WH_HOST \\\n --conf spark.executorEnv.EXT_WH_PASSWORD=$EXT_WH_PASSWORD \\\n --conf spark.executorEnv.KAFKA_BROKER_LIST=$_KAFKA_BROKER_LIST \\\n --conf spark.executorEnv.SCHEMA_REGISTRY_URL=$SCHEMA_REGISTRY_URL \\\n --conf spark.executorEnv.AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \\\n --conf spark.executorEnv.AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \\\n --conf spark.executorEnv.STAGING_S3_BUCKET=$STAGING_S3_BUCKET \\\n --conf spark.executorEnv.KAFKA_GROUP_ID=$KAFKA_GROUP_ID \\\n --conf spark.executorEnv.MAX_RATE=$MAX_RATE \\\n --conf spark.executorEnv.KAFKA_MAX_POLL_MS=$KAFKA_MAX_POLL_MS \\\n --conf spark.executorEnv.KAFKA_MAX_POLL_RECORDS=$KAFKA_MAX_POLL_RECORDS \\\n --class com.ticketnetwork.edwstream.external.MapPageView \\\n /opt/edw-stream-external-mpv_2.11-2-SNAPSHOT.jar",

I also tried specifying --deploy-mode cluster with --master yarn, and got the same error.

Error:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/09/10 20:41:24 INFO SparkContext: Running Spark version 2.2.0
18/09/10 20:41:25 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/09/10 20:41:25 INFO SparkContext: Submitted application: edw-stream-ext-mpv-emr-prod
18/09/10 20:41:25 INFO SecurityManager: Changing view acls to: root
18/09/10 20:41:25 INFO SecurityManager: Changing modify acls to: root
18/09/10 20:41:25 INFO SecurityManager: Changing view acls groups to: 
18/09/10 20:41:25 INFO SecurityManager: Changing modify acls groups to: 
18/09/10 20:41:25 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
18/09/10 20:41:25 INFO Utils: Successfully started service 'sparkDriver' on port 35868.
18/09/10 20:41:25 INFO SparkEnv: Registering MapOutputTracker
18/09/10 20:41:25 INFO SparkEnv: Registering BlockManagerMaster
18/09/10 20:41:25 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/09/10 20:41:25 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/09/10 20:41:25 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-5526b967-2be9-44bf-a86f-79ef72f2ac0f
18/09/10 20:41:25 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
18/09/10 20:41:26 INFO SparkEnv: Registering OutputCommitCoordinator
18/09/10 20:41:26 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/09/10 20:41:26 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.150.4.45:4040
18/09/10 20:41:26 INFO SparkContext: Added JAR file:/opt/edw-stream-external-mpv_2.11-2-SNAPSHOT.jar at spark://10.150.4.45:35868/jars/edw-stream-external-mpv_2.11-2-SNAPSHOT.jar with timestamp 1536612086416
18/09/10 20:41:26 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/09/10 20:41:27 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
18/09/10 20:41:28 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
18/09/10 20:41:29 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

0.0.0.0 is the default hostname property, and 8032 is the default port number.

One reason you're getting the defaults would be that neither of the Hadoop environment variables is set correctly. Your HADOOP_CONF_DIR needs to point at Spark's (or Hadoop's) conf folder, not at the base folder of the Spark extraction. That directory must contain core-site.xml, yarn-site.xml, hdfs-site.xml, and, if you use HiveContext, hive-site.xml; see the sketch below.
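
A minimal Dockerfile sketch of that layout (the /opt/hadoop/conf path and the local conf/ source directory are assumptions for illustration, not values from the original post):

# Copy the remote cluster's client configs into a single directory (paths assumed)
COPY conf/core-site.xml conf/yarn-site.xml conf/hdfs-site.xml /opt/hadoop/conf/
# Both variables point at the directory itself, not at an individual file
ENV HADOOP_CONF_DIR="/opt/hadoop/conf"
ENV YARN_CONF_DIR="/opt/hadoop/conf"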

Then, if yarn-site.xml is in the above location, you don't need YARN_CONF_DIR at all; but if you do set it, it needs to be an actual directory, not a path directly to the file.
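You can sanity-check this from inside the container before launching spark-submit (assuming the directory layout from the sketch above):

# Each config file should be listed; a "No such file" here explains the 0.0.0.0 fallback
ls "$HADOOP_CONF_DIR"/core-site.xml "$HADOOP_CONF_DIR"/yarn-site.xml
# Confirm the ResourceManager hostname the client will actually read
grep -A1 yarn.resourcemanager.hostname "$HADOOP_CONF_DIR"/yarn-site.xml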

Additionally, you'll probably need to set more than just one hostname. For example, a production-grade YARN cluster would have two ResourceManagers for fault tolerance, as sketched below. And if Kerberos is enabled, you would also need keytabs and principals configured.
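
For reference, an HA ResourceManager setup in yarn-site.xml looks roughly like this (the rm1/rm2 ids and the example hostnames are placeholders, not values from the cluster in question):

<!-- Enable ResourceManager HA and name the two instances -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<!-- One hostname entry per ResourceManager id -->
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>master1.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>master2.example.com</value>
</property>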

If you already have Mesos/Marathon, though, I'm not sure why you'd want to use YARN.
