Azure HDInsight Jupyter and pyspark not working

I created an HDInsight cluster on Azure with the following parameters:

Spark 2.4 (HDI 4.0)

I then followed the HDInsight tutorial for Apache Spark with a PySpark Jupyter Notebook, and it worked just fine. But whenever I re-run the notebook a second time, or start a new one, and run a simple

from pyspark.sql import *

or other commands, they all end with

The code failed because of a fatal error:
    Session 7 did not start up in 180 seconds..

Some things to try:
a) Make sure Spark has enough available resources for Jupyter to create a Spark context. For instructions on how to assign resources see http://go.microsoft.com/fwlink/?LinkId=717038
b) Contact your cluster administrator to make sure the Spark magics library is configured correctly.

After this, I also tried pyspark over SSH. When I connected to the cluster through SSH and ran

$ pyspark

it showed the following output:

SPARK_MAJOR_VERSION is set to 2, using Spark2
Python 2.7.12 |Anaconda custom (64-bit)| (default, Jul  2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

and hung right there.

Did I miss a step, or is this a bug? How can I fix this problem?

As per my observation, you get this error message when there is an issue with the YARN services, for example when the YARN service is stopped.
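
One way to check the YARN state without opening the Ambari UI is the Ambari REST API that HDInsight exposes on the cluster endpoint. The following is a minimal sketch, not a definitive implementation: the cluster name `mycluster` and the sample response are made up for illustration, and the response shape (a `ServiceInfo` object with a `state` field) is assumed from the standard Ambari service resource. In Ambari, a running service normally reports `STARTED`, while a stopped one reports `INSTALLED`.

```python
import json
from urllib.parse import quote


def yarn_state_url(cluster_name):
    """Build the Ambari REST URL that reports the YARN service state."""
    name = quote(cluster_name, safe="")
    return (
        f"https://{name}.azurehdinsight.net/api/v1/clusters/"
        f"{name}/services/YARN?fields=ServiceInfo/state"
    )


def parse_service_state(response_text):
    """Extract ServiceInfo.state from an Ambari service response body."""
    payload = json.loads(response_text)
    return payload["ServiceInfo"]["state"]


# Offline example: a hand-written response of the assumed shape,
# not output captured from a real cluster.
sample = (
    '{"ServiceInfo": {"cluster_name": "mycluster", '
    '"service_name": "YARN", "state": "INSTALLED"}}'
)

print(yarn_state_url("mycluster"))
state = parse_service_state(sample)
print(state, "(running)" if state == "STARTED" else "(not running)")
```

Against a real cluster, the URL would be requested with HTTP basic authentication using the cluster login credentials, and the response body passed to `parse_service_state`.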

ERROR: First, I stopped the YARN services.

Then I opened a Jupyter notebook and ran the same query, and I got the same error message as yours.

WALKTHROUGH: ERROR MESSAGE

SUCCESS: All Ambari services are running without any issue.

To run Jupyter Notebook queries successfully, make sure all the services are running without any issue.
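
The "all services running" check can also be scripted. The sketch below assumes the Ambari REST endpoint `/api/v1/clusters/<cluster>/services?fields=ServiceInfo/state` returns an `items` list in which each entry carries a `ServiceInfo` object with `service_name` and `state`; the sample payload is hand-written for illustration, not taken from a real cluster. Client-only services may legitimately report a state other than `STARTED`, so treat the result as a list of things to look at rather than a list of failures.

```python
import json


def services_not_started(response_text):
    """Return (service_name, state) pairs for services not in STARTED state."""
    payload = json.loads(response_text)
    pending = []
    for item in payload.get("items", []):
        info = item["ServiceInfo"]
        if info["state"] != "STARTED":
            pending.append((info["service_name"], info["state"]))
    return pending


# Hypothetical response body of the assumed shape.
sample = json.dumps({
    "items": [
        {"ServiceInfo": {"service_name": "HDFS", "state": "STARTED"}},
        {"ServiceInfo": {"service_name": "YARN", "state": "INSTALLED"}},
        {"ServiceInfo": {"service_name": "SPARK2", "state": "STARTED"}},
    ]
})

print(services_not_started(sample))  # [('YARN', 'INSTALLED')]
```

An empty list suggests every service Ambari knows about is started; if YARN shows up in the result, start it from Ambari before retrying the notebook.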

WALKTHROUGH: SUCCESS MESSAGE

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++

Here are the steps to create a Jupyter notebook and run queries on an Azure HDInsight Spark cluster:

Go to Azure Portal => Cluster dashboards => select Jupyter Notebook => create a PySpark notebook => run your queries.

You can also use the interactive Apache Spark shell to run PySpark (Python) queries.

Reference: https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-shell
