简体   繁体   中英

Error while running Zeppelin paragraphs in Spark on Linux cluster in Azure HdInsight

I have been following this tutorial in order to set up Zeppelin on a Spark cluster (version 1.5.2) in HDInsight, on Linux. Everything worked fine, I have managed to successfully connect to the Zeppelin notebook through the SSH tunnel. However, when I try to run any kind of paragraph, the first time I get the following error:

java.io.IOException: No FileSystem for scheme: wasb

After getting this error, if I try to rerun the paragraph, I get another error:

java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method)

These errors occur regardless of the code I enter, even if there is no reference to the hdfs. What I'm saying is that I get the "No FileSystem" error even for a trivial scala expression, such as parallelize.

Is there a missing configuration step?

I am download the tar ball that the script that you pointed to as I type. But want I am guessing is that your zeppelin install and spark install are not complete to work with wasb. In order to get spark to work with wasb you need to add some jars to the Class path. To do this you need to add something like this to your spark-defaults.conf (the paths might be different in HDInsights, this is from HDP on IaaS)

spark.driver.extraClassPath /usr/hdp/2.3.0.0-2557/hadoop/lib/azure-storage-2.2.0.jar:/usr/hdp/2.3.0.0-2557/hadoop/lib/microsoft-windowsazure-storage-sdk-0.6.0.jar:/usr/hdp/2.3.0.0-2557/hadoop/hadoop-azure-2.7.1.2.3.0.0-2557.jar
spark.executor.extraClassPath /usr/hdp/2.3.0.0-2557/hadoop/lib/azure-storage-2.2.0.jar:/usr/hdp/2.3.0.0-2557/hadoop/lib/microsoft-windowsazure-storage-sdk-0.6.0.jar:/usr/hdp/2.3.0.0-2557/hadoop/hadoop-azure-2.7.1.2.3.0.0-2557.jar

Once you have spark working with wasb, or next step is make those sames jar in zeppelin class path. A good way to test your setup is make a notebook that prints your env vars and class path.

sys.env.foreach(println(_))

val cl = ClassLoader.getSystemClassLoader
cl.asInstanceOf[java.net.URLClassLoader].getURLs.foreach(println)

Also looking at the install script, it trying to pull the zeppelin jar from wasb, you might want to change that config to somewhere else while you try some of these changes out. (zeppelin.sh)

export SPARK_YARN_JAR=wasb:///apps/zeppelin/zeppelin-spark-0.5.5-SNAPSHOT.jar

I hope this helps, if you are still have problems I have some other ideas, but would start with these first.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM