
Using Jupyter Notebook with pyspark: "No module named numpy" error

As explained, I'm using pyspark in a Jupyter notebook, and I'm getting the error shown below.

I have a tf-idf; I normalize it; then this last step creates a cosine-similarity matrix for the documents.

from pyspark.mllib.linalg.distributed import IndexedRow, IndexedRowMatrix

# Build an IndexedRowMatrix from the normalized tf-idf vectors and convert it to a BlockMatrix.
mat = IndexedRowMatrix(
    data.select("V2", "norm")
        .rdd.map(lambda row: IndexedRow(row.ID, row.norm.toArray()))
).toBlockMatrix()
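
For completeness, here is a minimal sketch of how the cosine-similarity matrix itself would typically be computed from mat (this is an illustration, not code from the original post; it assumes the vectors in "norm" are already L2-normalized, so pairwise dot products equal cosine similarities):

# Hypothetical follow-up step: multiply the row-normalized matrix by its transpose,
# so that entry (i, j) is the cosine similarity between documents i and j.
sims = mat.multiply(mat.transpose())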

But this is the error I'm getting; at the bottom it says "No module named numpy":

2022-08-25 15:16:26,161 WARN scheduler.DAGScheduler: Broadcasting large task binary with size 4.0 MiB
2022-08-25 15:16:27,561 ERROR executor.Executor: Exception in task 0.0 in stage 8.0 (TID 15)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/usr/local/Cellar/apache-spark/3.2.1/libexec/python/lib/pyspark.zip/pyspark/worker.py", line 601, in main
func, profiler, deserializer, serializer = read_command(pickleSer, infile)
File "/usr/local/Cellar/apache-spark/3.2.1/libexec/python/lib/pyspark.zip/pyspark/worker.py", line 71, in read_command
command = serializer._read_with_length(file)
File "/usr/local/Cellar/apache-spark/3.2.1/libexec/python/lib/pyspark.zip/pyspark/serializers.py", line 160, in _read_with_length
return self.loads(obj)
File "/usr/local/Cellar/apache-spark/3.2.1/libexec/python/lib/pyspark.zip/pyspark/serializers.py", line 430, in loads
return pickle.loads(obj, encoding=encoding)
File "<frozen zipimport>", line 259, in load_module
File "/usr/local/Cellar/apache-spark/3.2.1/libexec/python/lib/pyspark.zip/pyspark/mllib/__init__.py", line 26, in <module>
  import numpy
ModuleNotFoundError: No module named 'numpy'

at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:555)
at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:713)
at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:695)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:508)

Oddly, numpy is installed correctly. So that's the issue: numpy is installed correctly, but pyspark isn't able to find it while creating the cosine-similarity matrix in a Jupyter notebook.
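
This error usually means the Python interpreter that the Spark executors launch is not the one the notebook kernel (and numpy) is installed into. A minimal diagnostic sketch, assuming a live SparkContext named sc in the notebook (the name sc is an assumption, not from the original post):

import sys

# Interpreter running the notebook / driver.
print("driver python:  ", sys.executable)

# Interpreter the executors actually use for Python workers; sc is assumed
# to be the notebook's SparkContext.
print("executor python:", sc.parallelize([0], 1).map(lambda _: __import__("sys").executable).first())

If the two paths differ, the executors are not using the environment where numpy was installed.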

Thank you for considering this.

If you are running Jupyter Notebook as an application within AWS EMR, try using a bootstrap script that installs the required version of numpy while provisioning the cluster.

