
Apache Spark: How to use pyspark with Python 3

I built Spark 1.4 from the GitHub development master, and the build went through fine. But when I run bin/pyspark I get the Python 2.7.9 version. How can I change this?

Just set the environment variable:

export PYSPARK_PYTHON=python3
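
As a quick sanity check (a minimal sketch, assuming you launch the shell with bin/pyspark as above), you can confirm which interpreter the shell picked up:

# run inside the pyspark shell to see which Python the driver is using
import sys
print(sys.version)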

If you want this to be a permanent change, add this line to the pyspark script.

PYSPARK_PYTHON=python3 
./bin/pyspark

If you want to run it in IPython Notebook, write:

PYSPARK_PYTHON=python3 
PYSPARK_DRIVER_PYTHON=ipython 
PYSPARK_DRIVER_PYTHON_OPTS="notebook" 
./bin/pyspark
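
Once the notebook is running, a minimal sketch like the one below (assuming the default sc SparkContext that pyspark creates for you) verifies that the executors, not just the driver, are on Python 3:

import sys

# driver-side Python version
print(sys.version)

# executor-side Python version, collected from a trivial job
print(sc.parallelize([1]).map(lambda _: sys.version).first())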

If python3 is not accessible, you need to pass the full path to it instead (for example, /usr/bin/python3).

Bear in mind that the current documentation (as of 1.4.1) has outdated instructions. Fortunately, it has been patched.

1. Edit your profile: vim ~/.profile

2. Add this line to the file: export PYSPARK_PYTHON=python3

3. Execute the command: source ~/.profile

4. Run ./bin/pyspark

Have a look into the file. The shebang line probably points to the 'env' binary, which searches the PATH for the first compatible executable.

You can change python to python3 in the shebang, change it to hardcode the python3 binary directly instead of going through env, or execute the script with python3 explicitly and ignore the shebang line.
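
Illustratively, the change to the script's first line could look like this (a sketch only; check what your copy of the file actually contains):

# before: env picks whichever 'python' appears first on the PATH
#!/usr/bin/env python

# after: explicitly request Python 3
#!/usr/bin/env python3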

For Jupyter Notebook, edit the spark-env.sh file from the command line as shown below:

$ vi $SPARK_HOME/conf/spark-env.sh

Go to the bottom of the file and paste in these lines:

export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

Then simply run the following command to start pyspark in a notebook:

$ pyspark
