How to run a standalone jar from Spark

I am very new to Spark and just learning, so please bear with me if I talk like a novice.

I have a regular Java jar which is self-contained.

The function of this jar is to listen to a queue and process some messages. The requirement now is to read from the queue in a distributed fashion, so I have a Spark master and three slaves managed by YARN. When I ./spark-submit this jar file on the standalone master, everything works fine. When I switch to cluster mode by setting YARN as the master on the command line, I get lots of "file not found" errors from HDFS. I read up on Stack Overflow and saw that I have to mention SparkContext, but I see no use for it in my case.

So the question is:

Do I still have to use the following:

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
SparkConf conf = new SparkConf().setMaster("yarn-cluster").setAppName("TibcoMessageConsumer");
SparkContext sparkContext = new SparkContext(conf);

I don't see any usage of sparkContext in my case.

Since you are using YARN, copy the jar to HDFS and then you can reference it in spark-submit. If you want to use a local file system, you have to copy that jar to all the worker nodes [not recommended]:

./bin/spark-submit \
--class <main-class> \
--master <master-url> \
--deploy-mode cluster \
myapp-jar
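
For example, a rough sketch of the HDFS route could look like the following. The user directory, the jar name myapp.jar, and the main class com.example.TibcoMessageConsumer are placeholders for illustration, not values taken from the question:

# copy the application jar into HDFS so every YARN container can fetch it
# (<your-user> is a placeholder for your HDFS home directory)
hdfs dfs -mkdir -p /user/<your-user>/jars
hdfs dfs -put myapp.jar /user/<your-user>/jars/

# submit in cluster mode, pointing spark-submit at the jar's HDFS path
./bin/spark-submit \
--class com.example.TibcoMessageConsumer \
--master yarn \
--deploy-mode cluster \
hdfs:///user/<your-user>/jars/myapp.jar

On Spark 1.x, --master yarn-cluster is the equivalent of --master yarn --deploy-mode cluster, which matches the setMaster("yarn-cluster") call shown in the question.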

You can look at this link for more details.
