No module named 'pyspark' when running Jupyter notebook inside EMR

I am (very) new to AWS and to Spark in general, and I'm trying to run a notebook instance in Amazon EMR. When I try to import pyspark to start a session and load data from S3, I get the error No module named 'pyspark'. The cluster I created had the Spark option selected. What am I doing wrong?

The only solution that worked for me was to change the notebook kernel to the PySpark kernel, and then change the bootstrap action so that it installs, explicitly for Python 3.6, the packages that the PySpark kernel does not include by default:

#!/bin/bash
sudo python3.6 -m pip install numpy \
    matplotlib \
    pandas \
    seaborn \
    pyspark

Apparently pip installs into Python 2.7.16 by default, so the bootstrap action prints no error message, yet the modules still cannot be imported in the notebook, presumably because the PySpark kernel runs on Python 3.6 rather than on that 2.7.16 interpreter.
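To confirm which interpreter a notebook kernel is actually running, and whether it can see the packages installed by the bootstrap action, a quick check along these lines may help. This is only a minimal sketch: `describe_kernel` is a hypothetical helper name, and the package list simply mirrors the one in the script above.

```python
import importlib.util
import sys


def describe_kernel(packages=("pyspark", "numpy", "pandas")):
    """Report the running interpreter and which packages it can import.

    Hypothetical helper for illustration; it only inspects the current
    Python process and does not start Spark.
    """
    visible = {
        # find_spec returns None when a top-level module cannot be located
        name: importlib.util.find_spec(name) is not None
        for name in packages
    }
    return {
        "executable": sys.executable,
        "version": "%d.%d.%d" % sys.version_info[:3],
        "packages": visible,
    }


info = describe_kernel()
print("Interpreter:", info["executable"])
print("Python version:", info["version"])
for name, found in info["packages"].items():
    print(name, "is importable" if found else "is NOT importable")
```

If the interpreter printed here is not the one the bootstrap action installed into, the packages will be reported as not importable even though the installation itself succeeded.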

You can open JupyterLab and create a new Spark notebook from there. This initializes the Spark context for you automatically.

Or you can open a Jupyter notebook and load the Spark application with the %%spark magic.

You could try the findspark library. Install it with pip install findspark, then run the code below in your Jupyter notebook.

import findspark
findspark.init()

%load_ext sparksql_magic
%config SparkSql.limit=200
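findspark needs to know where Spark is installed, which it normally learns from the SPARK_HOME environment variable. The sketch below passes the location explicitly instead. It is an illustration under stated assumptions, not a definitive setup: `resolve_spark_home` and `init_spark` are hypothetical helper names, and `/usr/lib/spark` is assumed to be the Spark directory on the EMR node, so adjust it if your cluster differs.

```python
import os


def resolve_spark_home(candidates=("/usr/lib/spark",)):
    """Return the first existing Spark directory, or None (hypothetical helper).

    SPARK_HOME is preferred when it points at a real directory; the
    fallback path is an assumption about where EMR installs Spark.
    """
    env_home = os.environ.get("SPARK_HOME")
    if env_home and os.path.isdir(env_home):
        return env_home
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None


def init_spark():
    """Point findspark at the resolved directory, then import pyspark."""
    spark_home = resolve_spark_home()
    if spark_home is None:
        raise RuntimeError("No Spark installation found; set SPARK_HOME first.")
    import findspark  # requires: pip install findspark
    findspark.init(spark_home)
    import pyspark
    return pyspark


if __name__ == "__main__":
    print("Spark home:", resolve_spark_home())
```

In a notebook cell, calling `pyspark = init_spark()` would then make the module available, provided findspark is installed in the kernel's own interpreter.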
