How to run a standalone jar from Spark
I am very new to Spark, just learning, so please bear with me if I talk like a novice.

I have a regular Java jar which is self-contained. The function of this jar is to listen to a queue and process some messages. Now the requirement is to read from the queue in a distributed fashion, so I have a Spark master and three slaves managed by Yarn. When I ./spark-submit this jar file on the standalone master, all works fine. When I switch to cluster mode by setting Yarn as master on the command line, I get lots of "file not found" errors from HDFS. I read up on Stack Overflow and saw that I have to mention SparkContext, but I see no use for it in my case.
My question is: do I still have to use the following?

SparkConf conf = new SparkConf().setMaster("yarn-cluster").setAppName("TibcoMessageConsumer");
SparkContext sparkContext = new SparkContext(conf);

I don't see any usage of sparkContext in my case.
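For reference, a minimal sketch of what explicit Spark context setup looks like in Java (the app name is from the question above; the parallelize call is a made-up illustration). Whether you need a context at all depends on whether your code actually calls Spark APIs — if the jar just runs plain Java logic, Yarn merely launches it and the context goes unused:

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class TibcoMessageConsumer {
    public static void main(String[] args) {
        // Note: hard-coding setMaster("yarn-cluster") is discouraged;
        // pass --master yarn --deploy-mode cluster to spark-submit instead,
        // so the same jar runs locally and on the cluster.
        SparkConf conf = new SparkConf().setAppName("TibcoMessageConsumer");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // The context only matters if you use Spark's APIs, e.g. to
        // distribute work across executors:
        long count = sc.parallelize(Arrays.asList(1, 2, 3, 4)).count();
        System.out.println("count = " + count);

        sc.stop();
    }
}
```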
Since you are using Yarn, copy the jar to HDFS, and then you can reference it in spark-submit. If you want to use a local file system, you have to copy that jar to all the worker nodes [not recommended].
./bin/spark-submit \
--class <main-class> \
--master <master-url> \
--deploy-mode cluster \
myapp-jar
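Concretely, the copy-to-HDFS step plus the submission might look like the following sketch (the paths, main class, and jar name are placeholders, not from the question):

```shell
# Copy the application jar to HDFS so every Yarn container can fetch it
hdfs dfs -mkdir -p /apps
hdfs dfs -put -f myapp.jar /apps/myapp.jar

# Submit against Yarn in cluster mode, referencing the HDFS copy
./bin/spark-submit \
  --class com.example.Main \
  --master yarn \
  --deploy-mode cluster \
  hdfs:///apps/myapp.jar
```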
You can look at this link for more details.