
findspark.init() IndexError: list index out of range error

When running the following in a Python 3.5 Jupyter environment I get the error below. Any ideas on what is causing it?

import findspark
findspark.init()

Error:

IndexError                                Traceback (most recent call
last) <ipython-input-20-2ad2c7679ebc> in <module>()
      1 import findspark
----> 2 findspark.init()
      3 
      4 import pyspark

/.../anaconda/envs/pyspark/lib/python3.5/site-packages/findspark.py in init(spark_home, python_path, edit_rc, edit_profile)
    132     # add pyspark to sys.path
    133     spark_python = os.path.join(spark_home, 'python')
--> 134     py4j = glob(os.path.join(spark_python, 'lib', 'py4j-*.zip'))[0]
    135     sys.path[:0] = [spark_python, py4j]
    136 

IndexError: list index out of range
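
For context, the traceback points at the glob call in findspark.py: when spark_home does not point at a real Spark installation, the py4j-*.zip pattern matches nothing and indexing [0] on the empty list raises the IndexError. A minimal sketch of that failure mode (the fallback path below is a made-up placeholder):

import os
from glob import glob

# Placeholder standing in for a missing or wrong SPARK_HOME.
spark_home = os.environ.get("SPARK_HOME", "/not/a/real/spark/home")

# This mirrors the failing line in findspark.py shown above.
matches = glob(os.path.join(spark_home, "python", "lib", "py4j-*.zip"))
print(matches)  # [] when the directory does not contain Spark's python/lib
matches[0]      # IndexError: list index out of range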

This is most likely due to the SPARK_HOME environment variable not being set correctly on your system. Alternatively, you can just specify it when you're initialising findspark, like so:

import findspark
findspark.init('/path/to/spark/home')

After that, it should all work!
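
If you prefer not to hard-code the path in every call, a sketch of setting SPARK_HOME from within Python before calling findspark (the path below is a placeholder you would replace with your own installation directory):

import os
import findspark

# Placeholder path; point this at your actual Spark installation directory.
os.environ["SPARK_HOME"] = "/path/to/spark/home"
findspark.init()  # picks up SPARK_HOME when called without an argument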

I was getting the same error and was able to make it work by entering the exact installation directory:

import findspark
# Use this
findspark.init("C:\Users\PolestarEmployee\spark-1.6.3-bin-hadoop2.6")
# Test
from pyspark import SparkContext, SparkConf

Basically, it is the directory where Spark was extracted. In future, wherever you see spark_home, enter the same installation directory. I also tried using toree to create a kernel instead, but it is failing somehow. A kernel would be a cleaner solution.
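
As a quick smoke test after the init() call above, something like the following should run without errors (the master setting and app name here are arbitrary choices, not required values):

import findspark
findspark.init(r"C:\Users\PolestarEmployee\spark-1.6.3-bin-hadoop2.6")

from pyspark import SparkContext, SparkConf

# Run a trivial local job to confirm pyspark is importable and working.
conf = SparkConf().setMaster("local[*]").setAppName("findspark-test")
sc = SparkContext(conf=conf)
print(sc.parallelize([1, 2, 3, 4, 5]).count())  # expect 5
sc.stop()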

You need to update the SPARK_HOME variable inside your bash_profile. For me, the following command worked (in the terminal):

export SPARK_HOME="/usr/local/Cellar/apache-spark/2.2.0/libexec/"

After this, you can use the following commands:

import findspark
findspark.init('/usr/local/Cellar/apache-spark/2.2.0/libexec')

Maybe this could help:

I found that findspark.init() tries to find data in .\\spark-3.0.1-bin-hadoop2.7\\bin\\python\\lib, but the python folder was outside the bin folder. I simply ran findspark.init('.\\spark-3.0.1-bin-hadoop2.7'), without the '\\bin' folder.
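
One way to confirm the right directory is being passed (using the same extraction path as above; adjust it to your setup) is to list the extraction root and check that python sits alongside bin:

import os

# Path to the extracted Spark directory, as used in this answer.
spark_home = r".\spark-3.0.1-bin-hadoop2.7"

# 'python' must be a direct child of the extraction root (a sibling of 'bin'),
# which is why init() should be given the root rather than the bin folder.
print(sorted(os.listdir(spark_home)))  # expect entries such as 'bin', 'python', 'jars'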
