How to run a standalone jar from Spark

I am very new to Spark and just learning, so please bear with me if I talk like a novice.

I have a regular Java jar which is self-contained.

The function of this jar is to listen to a queue and process some messages. The requirement now is to read from the queue in a distributed fashion, so I have a Spark master and three slaves managed by YARN. When I ./spark-submit this jar file on the standalone master, everything works fine. When I switch to cluster mode by setting YARN as the master on the command line, I get lots of "file not found" errors from HDFS. I read up on Stack Overflow and saw that I have to mention SparkContext, but I see no use for it in my case.

So the question is:

Do I still have to use the following:

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
SparkConf conf = new SparkConf().setMaster("yarn-cluster").setAppName("TibcoMessageConsumer");
SparkContext sparkContext = new SparkContext(conf);

I don't see any usage of sparkContext in my case.

Since you are using YARN, copy the jar to HDFS and then you can reference it in spark-submit. If you want to use a local file system, you have to copy that jar to all the worker nodes [not recommended]:

./bin/spark-submit \
--class <main-class> \
--master <master-url> \
--deploy-mode cluster \
myapp-jar
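
For example, a rough sketch of the HDFS route could look like the following. The user directory, the jar name myapp.jar, and the main class com.example.TibcoMessageConsumer are placeholders for illustration, not values taken from the question:

# copy the application jar into HDFS so every YARN container can fetch it
# (<your-user> is a placeholder for your HDFS home directory)
hdfs dfs -mkdir -p /user/<your-user>/jars
hdfs dfs -put myapp.jar /user/<your-user>/jars/

# submit in cluster mode, pointing spark-submit at the jar's HDFS path
./bin/spark-submit \
--class com.example.TibcoMessageConsumer \
--master yarn \
--deploy-mode cluster \
hdfs:///user/<your-user>/jars/myapp.jar

On Spark 1.x, --master yarn-cluster is the equivalent of --master yarn --deploy-mode cluster, which matches the setMaster("yarn-cluster") call shown in the question.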

You can look at this link for more details.
