Amazon EMR Pyspark Module not found

I created an Amazon EMR cluster with Spark already on it. When I ssh into my cluster and run pyspark from the terminal, it opens the pyspark shell.

I uploaded a file using scp, and when I try to run it with python FileName.py, I get an import error:

from pyspark import SparkContext
ImportError: No module named pyspark

How do I fix this?

I added the following lines to ~/.bashrc for EMR 4.3:

export SPARK_HOME=/usr/lib/spark
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.XXX-src.zip:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH

Here py4j-0.XXX-src.zip is the py4j file in your Spark Python library folder. Look in /usr/lib/spark/python/lib/ to find the exact version and replace the XXX with that version number.

Run source ~/.bashrc and you should be good.
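If you would rather not hard-code the py4j version, the same path setup can be done from inside Python. This is just a minimal sketch of that idea, assuming Spark is installed at /usr/lib/spark as on EMR 4.3; it is not part of the original answer:

import glob
import os
import sys

# Fall back to the EMR 4.x default location if SPARK_HOME isn't set.
spark_home = os.environ.get('SPARK_HOME', '/usr/lib/spark')

# Locate the py4j zip without hard-coding its version number.
py4j_zips = glob.glob(os.path.join(spark_home, 'python', 'lib', 'py4j-*-src.zip'))

# Prepend the Spark Python sources and the py4j zip to the module search path.
sys.path[:0] = [os.path.join(spark_home, 'python')] + py4j_zips

from pyspark import SparkContext  # should now resolve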

You probably need to add the pyspark files to the path. I typically use a function like the following.

import os
import sys

def configure_spark(spark_home=None, pyspark_python=None):
    spark_home = spark_home or "/path/to/default/spark/home"
    os.environ['SPARK_HOME'] = spark_home

    # Add the PySpark directories to the Python path:
    sys.path.insert(1, os.path.join(spark_home, 'python'))
    sys.path.insert(1, os.path.join(spark_home, 'python', 'pyspark'))
    sys.path.insert(1, os.path.join(spark_home, 'python', 'build'))

    # If PySpark isn't specified, use currently running Python binary:
    pyspark_python = pyspark_python or sys.executable
    os.environ['PYSPARK_PYTHON'] = pyspark_python

Then, you can call the function before importing pyspark:

configure_spark('/path/to/spark/home')
from pyspark import SparkContext

Spark home on an EMR node should be something like /home/hadoop/spark. See https://aws.amazon.com/articles/Elastic-MapReduce/4926593393724923 for more details.

Try using findspark: install it from the shell with pip install findspark.

Sample code:

# Import package(s).
import findspark
findspark.init()

from pyspark import SparkContext
from pyspark.sql import SQLContext
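After findspark.init() has run, a quick sanity check can confirm that pyspark is importable. This snippet is just an illustrative sketch, not part of the original answer; the app name is an arbitrary choice:

sc = SparkContext(appName="FindsparkCheck")   # illustrative app name
print(sc.parallelize(range(10)).sum())        # prints 45 if pyspark loaded correctly
sc.stop()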

You can execute the file directly from the command line with:

spark-submit FileName.py
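spark-submit sets up the Spark Python path itself, so the script needs none of the PYTHONPATH workarounds above. A minimal sketch of what FileName.py could contain (the word-count logic and app name here are illustrative assumptions, not from the original question):

from pyspark import SparkContext

if __name__ == "__main__":
    # Count occurrences of each word in a tiny in-memory list.
    sc = SparkContext(appName="ExampleJob")
    counts = sc.parallelize(["spark", "emr", "spark"]).countByValue()
    print(dict(counts))
    sc.stop()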
