
Error while running Zeppelin paragraphs in Spark on Linux cluster in Azure HDInsight

I have been following this tutorial in order to set up Zeppelin on a Spark cluster (version 1.5.2) in HDInsight, on Linux. Everything worked fine, and I managed to successfully connect to the Zeppelin notebook through the SSH tunnel. However, when I try to run any kind of paragraph, the first time I get the following error:

java.io.IOException: No FileSystem for scheme: wasb

After getting this error, if I try to rerun the paragraph, I get another error:

java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method)

These errors occur regardless of the code I enter, even if there is no reference to HDFS. In other words, I get the "No FileSystem" error even for a trivial Scala expression, such as `parallelize`.

Is there a missing configuration step?

I am downloading the tarball from the script you pointed to as I type this, but my guess is that your Zeppelin and Spark installs are not fully set up to work with wasb. To get Spark working with wasb you need to add some jars to the classpath. To do this, add something like the following to your spark-defaults.conf (the paths might be different in HDInsight; this is from HDP on IaaS):

spark.driver.extraClassPath /usr/hdp/2.3.0.0-2557/hadoop/lib/azure-storage-2.2.0.jar:/usr/hdp/2.3.0.0-2557/hadoop/lib/microsoft-windowsazure-storage-sdk-0.6.0.jar:/usr/hdp/2.3.0.0-2557/hadoop/hadoop-azure-2.7.1.2.3.0.0-2557.jar
spark.executor.extraClassPath /usr/hdp/2.3.0.0-2557/hadoop/lib/azure-storage-2.2.0.jar:/usr/hdp/2.3.0.0-2557/hadoop/lib/microsoft-windowsazure-storage-sdk-0.6.0.jar:/usr/hdp/2.3.0.0-2557/hadoop/hadoop-azure-2.7.1.2.3.0.0-2557.jar
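Before wiring anything into Zeppelin, it may help to confirm the wasb driver class is actually visible to the JVM. A minimal sanity check you could run in a spark-shell or a Zeppelin paragraph (the class name is the standard one shipped in the hadoop-azure jar; nothing else here is specific to your cluster):

```scala
// Quick sanity check: is the wasb filesystem implementation on the classpath?
// org.apache.hadoop.fs.azure.NativeAzureFileSystem is the class provided by
// the hadoop-azure jar referenced in spark-defaults.conf above.
val wasbDriver = "org.apache.hadoop.fs.azure.NativeAzureFileSystem"

val onClasspath =
  try { Class.forName(wasbDriver); true }
  catch { case _: ClassNotFoundException => false }

println(
  if (onClasspath) s"$wasbDriver found"
  else s"$wasbDriver missing -- check the extraClassPath entries"
)
```

If this prints "missing", fixing the classpath entries above should be the first step before debugging anything else.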

Once you have Spark working with wasb, the next step is to get those same jars onto the Zeppelin classpath. A good way to test your setup is to make a notebook that prints your environment variables and classpath:

sys.env.foreach(println(_))

val cl = ClassLoader.getSystemClassLoader
cl.asInstanceOf[java.net.URLClassLoader].getURLs.foreach(println)

Also, looking at the install script (zeppelin.sh), it tries to pull the Zeppelin jar from wasb; you might want to change that config to point somewhere else while you try these changes out:

export SPARK_YARN_JAR=wasb:///apps/zeppelin/zeppelin-spark-0.5.5-SNAPSHOT.jar
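For example, while debugging you could point it at a local copy of the jar instead. The `file:///` path below is only a placeholder; put the jar wherever suits your cluster:

```shell
# In the Zeppelin environment config (e.g. zeppelin-env.sh), override the
# wasb location with a local copy while testing. The path is a placeholder;
# adjust it to wherever you unpack the Zeppelin Spark jar.
export SPARK_YARN_JAR=file:///opt/zeppelin/zeppelin-spark-0.5.5-SNAPSHOT.jar
```

Once the wasb classpath issue is resolved, you can switch this back to the original wasb:/// location.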

I hope this helps. If you still have problems I have some other ideas, but I would start with these first.
