Apache Spark on Oozie: providing extra config file?

I'm building a Spark app (currently 1.6.0) to run on Cloudera with Oozie in Hue. We want to use Hue and Oozie because the people who will be running these jobs will be more comfortable with a browser-based interface than with hacking around in Oozie XML configs or firing off spark-submit on the command line.

I've figured out how to run a basic Oozie/Spark Action via Hue (3.10). But we want to be able to provide various non-Spark parameters via a config file at runtime, as you would for a normal Scala app. I'm struggling to find a mechanism that (a) Oozie will accept and (b) Spark will recognise, so that the app picks up the config params from the specified file at runtime.

I've tried various permutations, e.g. putting this as one of the options in the Oozie Action "Properties" tab in Hue:

options "-Dconfig.file=/my/file/location/fubar.conf"

But the Spark job either fails to pick up the config or fails completely (no obvious error in the logs).

Running the Spark code in local mode (i.e. not on Cloudera) from the command line using spark-submit seems to work:

spark-submit --class com.example.Sparky --master local[*] \
--driver-java-options "-Dconfig.file=/my/file/location/fubar.conf" \
target/scala-2.11/spark-dummy_2.11-1.0.jar

So I guess I need to find out how to supply the equivalent runtime config to an Oozie/Spark Action on Cloudera.

Does anybody know the right way to do this?

It turns out you can specify the options as Java driver options for the Oozie Spark Action.

You can edit the Spark Action to set various properties via the little cog symbol in the top right corner of the initial page of the form.

Click on the cog to open the second page of the form, then select the "Properties" tab.

In "Options list", enter the same Java driver options as in the spark-submit example:

--driver-java-options "-Dconfig.file=/my/file/location/fubar.conf"

This allows you to pass properties into your Spark app that might otherwise be set via your app config file. For example, if you have a property "app.fubar.var1", you can pass it in directly via the Java driver options:

--driver-java-options "-Dapp.fubar.var1=myvalue"

But I still cannot get my Spark app to recognise the location of my config file if I pass it into the Oozie Spark Action like this.
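
One way to narrow this down is to log, from inside the driver, which -D properties actually arrived and whether the config path is readable on the machine where the driver ends up running. The following is only a minimal sketch, not code from the app above: the object name DriverProps and the helper prop are hypothetical, it uses just the Scala standard library, and it assumes the app looks up config.file as a JVM system property. Under Oozie the driver may be launched on a cluster node rather than on the host where the file was created, so a path that works in local mode might not exist there; that is something to check, not a confirmed cause.

```scala
// Hypothetical diagnostic helper; names are for illustration only.
object DriverProps {

  /** Look up a JVM system property (set via -Dkey=value), with a fallback. */
  def prop(key: String, default: String): String =
    sys.props.getOrElse(key, default)

  def main(args: Array[String]): Unit = {
    // Present when the driver JVM was started with -Dapp.fubar.var1=myvalue
    val var1 = prop("app.fubar.var1", "<not set>")
    println(s"app.fubar.var1 = $var1")

    // Present when the driver JVM was started with -Dconfig.file=/some/path
    val configFile: Option[String] = sys.props.get("config.file")
    println(s"config.file = ${configFile.getOrElse("<not set>")}")

    // Check whether that path is visible from wherever this JVM is running
    configFile.foreach { path =>
      val f = new java.io.File(path)
      println(s"exists = ${f.exists()}, readable = ${f.canRead()}")
    }
  }
}
```

If the property shows up but the file does not exist, the option is being passed through correctly and the problem is the file's location rather than the Oozie settings.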
