
How do I install numpy and pandas for Python 3.5 in Spark?

I am trying to run a linear regression in Spark using Python 3.5 instead of Python 2.7. So first I exported PYSPARK_PYTHON=python3. I then received the error "No module named numpy". I tried pip install numpy, but pip doesn't recognize the PYSPARK_PYTHON setting. How do I ask pip to install numpy for 3.5? Thank you ...

$ export PYSPARK_PYTHON=python3

$ spark-submit linreg.py
....
Traceback (most recent call last):
  File "/home/yoda/Code/idenlink-examples/test22-spark-linreg/linreg.py", line 115, in <module>
    from pyspark.ml.linalg import Vectors
  File "/home/yoda/install/spark/python/lib/pyspark.zip/pyspark/ml/__init__.py", line 22, in <module>
  File "/home/yoda/install/spark/python/lib/pyspark.zip/pyspark/ml/base.py", line 21, in <module>
  File "/home/yoda/install/spark/python/lib/pyspark.zip/pyspark/ml/param/__init__.py", line 26, in <module>
ImportError: No module named 'numpy'

$ pip install numpy
Requirement already satisfied: numpy in /home/yoda/.local/lib/python2.7/site-packages

$ pyspark
Python 3.5.2 (default, Nov 17 2016, 17:05:23) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
17/02/09 20:29:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/02/09 20:29:20 WARN Utils: Your hostname, yoda-VirtualBox resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface enp0s3)
17/02/09 20:29:20 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/02/09 20:29:31 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/

Using Python version 3.5.2 (default, Nov 17 2016 17:05:23)
SparkSession available as 'spark'.
>>> import site; site.getsitepackages()
['/usr/local/lib/python3.5/dist-packages', '/usr/lib/python3/dist-packages', '/usr/lib/python3.5/dist-packages']
>>> 
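Note what the pip run above shows: that pip belongs to Python 2.7 and installs into its site-packages; pip never reads PYSPARK_PYTHON. A quick way to check which interpreter a given pip targets, and to drive the Python 3 one explicitly (assuming the python3 on your PATH ships with pip):

$ pip --version                 # reports which Python this pip installs into
$ python3 -m pip --version      # the pip bound to the python3 interpreter
$ python3 -m pip install numpy  # installs numpy for Python 3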

So I don't actually see this as a Spark question at all. It looks to me like you need help with environments. As the commenter mentioned, you need to set up a Python 3 environment, activate it, and then install numpy. Take a look at this for a little help on working with environments. After setting up the Python 3 environment, activate it, run pip install numpy or conda install numpy, and you should be good to go.
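A minimal sketch of that workflow using the built-in venv module (the environment path ~/py35 is illustrative, not from the answer):

$ python3 -m venv ~/py35                      # create an isolated Python 3 environment (path is hypothetical)
$ source ~/py35/bin/activate                  # activate it, so pip and python now mean the Python 3 ones
$ pip install numpy pandas                    # installs into the Python 3 environment
$ export PYSPARK_PYTHON=~/py35/bin/python3    # point Spark workers at this interpreter
$ spark-submit linreg.py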

If you are running the job in local mode, you just need to upgrade pyspark.

Homebrew: brew upgrade pyspark; this should solve most of the dependencies.
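A quick way to confirm the fix afterwards is to check that the interpreter PySpark will use can import numpy (a hedged check; python3 here stands in for whatever PYSPARK_PYTHON points to):

$ python3 -c "import numpy; print(numpy.__version__)"   # prints a version instead of ImportError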
