
Loading external dependencies in Zeppelin on Spark in Yarn-Client mode on Ubuntu 14.04

Dear community! Before I describe the problem, here's a short description of the software in use (the latter two running in a small cluster of three nodes, each of them using Ubuntu 14.04):

  • Zeppelin 0.6.1
  • Spark 2.0.0 along with Scala 2.11.8
  • Hadoop 2.7.3

The situation is as follows: in order to use the TwitterUtils class in a Spark Streaming application written in a Zeppelin note, I need to include org.apache.spark.streaming.twitter._ from Maven (org.apache.bahir:spark-streaming-twitter_2.11:2.0.0-preview). What I have learned so far is that there are several options to make external dependencies available in Zeppelin (each sketched in code right after this list):

  • Export the SPARK_SUBMIT_OPTIONS variable in conf/zeppelin-env.sh and set --jars (in my case --jars hdfs://admdsmaster:54310/global/jars/spark-streaming-twitter_2.11-2.0.0-preview.jar; a path pointing to the local file system was tested as well).
  • Export SPARK_SUBMIT_OPTIONS and set --packages (in my case --packages org.apache.bahir:spark-streaming-twitter_2.11:2.0.0-preview).
  • Set spark.jars or spark.jars.packages in conf/spark-defaults.conf with the values mentioned above.
  • Use the %dep interpreter in Zeppelin itself, like so: z.load("org.apache.bahir:spark-streaming-twitter_2.11:2.0.0-preview"). This approach is deprecated, though.
  • Use sc.addJar() in the Zeppelin note to manually add a .jar file.
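
For reference, here is roughly how each of these options looked in my setup (coordinates and paths as above; the exact quoting may have differed slightly):

    # conf/zeppelin-env.sh -- options 1 and 2 (one of --jars / --packages at a time)
    export SPARK_SUBMIT_OPTIONS="--packages org.apache.bahir:spark-streaming-twitter_2.11:2.0.0-preview"
    # export SPARK_SUBMIT_OPTIONS="--jars hdfs://admdsmaster:54310/global/jars/spark-streaming-twitter_2.11-2.0.0-preview.jar"

    # conf/spark-defaults.conf -- option 3
    spark.jars.packages  org.apache.bahir:spark-streaming-twitter_2.11:2.0.0-preview

    // option 4 -- in a Zeppelin note; %dep must run before the Spark interpreter starts
    %dep
    z.load("org.apache.bahir:spark-streaming-twitter_2.11:2.0.0-preview")

    // option 5 -- in a Zeppelin note, with the Spark interpreter already running
    sc.addJar("hdfs://admdsmaster:54310/global/jars/spark-streaming-twitter_2.11-2.0.0-preview.jar")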

After having tried all of the above -- and almost arbitrary combinations and variations thereof -- the problem is that I still can't import the TwitterUtils class from within a Zeppelin note:

[Image: class import failing in the Zeppelin note]
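
Since the screenshot cannot be embedded (see the PS below), the failing note paragraph essentially contained the following (paraphrased; the exact compiler error is only visible in the picture):

    import org.apache.spark.streaming.twitter._   // fails to compile inside the note
    sc.listJars()                                 // ...although the twitter .jar appears in this list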

What can be seen from the picture as well is the output of sc.listJars(), which shows that the .jar file was actually included. Nonetheless, the class import fails.

My first thought was that the problem occurs because Spark is running in yarn-client mode, so I started the Spark shell in yarn-client mode as well and tried to import the TwitterUtils class from there -- which worked:

[Image: class import working from the Spark shell]
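
The shell was started roughly like this (flags reconstructed from the options described above):

    $ spark-shell --master yarn --deploy-mode client \
        --packages org.apache.bahir:spark-streaming-twitter_2.11:2.0.0-preview

    scala> import org.apache.spark.streaming.twitter._   // works here without complaint
    import org.apache.spark.streaming.twitter._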

In order to find out what was going on, I searched the log files of Zeppelin, Spark and YARN, but couldn't find any error messages pointing me to the cause of the problem.

Long story short: although the .jar file was included in Zeppelin (as proven by sc.listJars()) and although the class import works from the spark-shell in yarn-client mode, I just can't get the import to work from within my Zeppelin note.

Long story even shorter: I'd really appreciate your ideas on how to solve this problem!

Thanks in advance for your time and effort.

PS: I'm sorry that I could not upload the images to this post directly -- it says I need at least 10 reputation points, which I do not have, as this is my first ever post here.

Adding the dependency from the interpreter tab as proposed by @eliasah actually did the trick -- thank you very much!

For the fellows out there who might run into the same problem, I'm going to describe the solution briefly and add a picture of what a call to sc.listJars() should actually look like (compared to the picture in the original question).

Head over to Zeppelin's interpreter tab and scroll down or search for the spark interpreter, then hit edit. At the very bottom of the available settings there is a Dependencies section. Add your dependency there by specifying the Maven coordinates (in my case org.apache.bahir:spark-streaming-twitter_2.11:2.0.0-preview) and save the settings. After restarting the interpreter, the dependency should be available.
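
To verify that the restart picked the dependency up, the following can be run in a fresh note paragraph (a minimal check; sc.listJars() is the same call as in the original question):

    import org.apache.spark.streaming.twitter._   // should now resolve
    sc.listJars().foreach(println)                // should list the artifact plus its transitive dependencies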

Here's what a call to sc.listJars() looked like in my case after executing the steps described above:

[Image: output of sc.listJars() after adding the dependency]

If you compare this picture to the first one in the original question, you'll notice that the list now contains a lot more entries. I'm still wondering, though, why the class import did not work when only the .jar file containing it was present. Anyway, problem solved thanks to @eliasah -- thanks again, you deserve a cookie! -- and I hope that this short description will help others as well.
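
For completeness, here is a minimal sketch of the kind of streaming code this unlocks once the import resolves (assumptions: the twitter4j OAuth credentials are already set as system properties, and Zeppelin's existing SparkContext sc is reused):

    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.twitter.TwitterUtils

    val ssc = new StreamingContext(sc, Seconds(10))     // 10-second micro-batches
    val tweets = TwitterUtils.createStream(ssc, None)   // None => auth read from twitter4j.oauth.* properties
    tweets.map(_.getText).print()                       // show a few tweet texts per batch

    ssc.start()
    // ssc.awaitTermination()  // omitted here so the note paragraph returns control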
