
Apache Zeppelin running on Spark Cluster and YARN

I have created and run a %pyspark program in Apache Zeppelin running on a Spark cluster in yarn-client mode. The program reads a file from HDFS into a DataFrame, performs a simple groupBy, and prints the output successfully. I am using Zeppelin version 0.6.2 and Spark 2.0.0.
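For context, a minimal sketch of the kind of %pyspark paragraph described above, assuming Zeppelin exposes the spark session variable in the %pyspark interpreter; the HDFS path and column name are hypothetical placeholders, not taken from my actual notebook:

%pyspark
# Read a file from HDFS into a DataFrame (path and schema are hypothetical)
df = spark.read.csv("hdfs:///user/zeppelin/data.csv", header=True, inferSchema=True)
# Simple groupBy and count, printed to the notebook output
df.groupBy("some_column").count().show()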

I can see the job running in YARN (see application_1480590511892_0007): [screenshot: YARN UI showing the running application]

But when I check the Spark UI at the same time there is nothing at all for this job:

[screenshot: Spark UI showing no entry for this job]

Question 1: Shouldn't this job appear in both of these UIs?

Also, the completed applications shown in the Spark UI image above were Zeppelin jobs that used the %python interpreter simply to initialize a SparkSession and then stop it:

1st Zeppelin block:

%python
from pyspark.sql import SparkSession
from pyspark.sql import Row
import collections

# Create (or reuse) the SparkSession for this notebook
spark = SparkSession.builder.appName("SparkSQL").getOrCreate()

2nd Zeppelin block:

%python
# Stop the SparkSession; this ends the underlying Spark application
spark.stop()

Question 2: This job, in turn, has not appeared in the YARN UI. Does a job appearing in the Spark UI mean that it is running under the Spark standalone resource manager rather than YARN?

Any insights into these questions are highly appreciated.

Zeppelin runs one continuous Spark application from the moment the interpreter is first used, and all paragraphs run inside that single application. In your second paragraph you stop the SparkSession (spark.stop()), which kills the application that was created when the interpreter was first used, so you only see the jobs under the Completed Applications section. If you remove the spark.stop() call, you should see the job listed under Running Applications.
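As a rough illustration of this suggestion, the two %python paragraphs could be collapsed into one that keeps the session alive; the groupBy work below is a placeholder, not code from the question:

%python
from pyspark.sql import SparkSession

# Reuse (or create) the single SparkSession backing this Zeppelin interpreter
spark = SparkSession.builder.appName("SparkSQL").getOrCreate()

# Do some work with the session; while this runs, the application
# should appear under "Running Applications" in the Spark UI
df = spark.range(1000)
df.groupBy((df.id % 10).alias("bucket")).count().show()

# Note: no spark.stop() here -- stopping the session would move the
# application to "Completed Applications"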
