
Using Jupyter Notebook with pyspark error no import named numpy

As mentioned in the title, I am using pyspark in a Jupyter notebook, and I get the error attached below.

I have a tf-idf matrix; I normalize it; then, as the last step, I create a cosine-similarity matrix for the documents.

from pyspark.mllib.linalg.distributed import IndexedRow, IndexedRowMatrix

# Note: only "V2" and "norm" are selected, so the row index must come
# from one of those columns (row.ID would not exist on these rows).
mat = IndexedRowMatrix(
    data.select("V2", "norm")
        .rdd.map(lambda row: IndexedRow(row.V2, row.norm.toArray()))
).toBlockMatrix()

But this is the error I get; at the bottom it says "No module named numpy":

2022-08-25 15:16:26,161 WARN scheduler.DAGScheduler: Broadcasting large task binary with size 4.0 MiB
2022-08-25 15:16:27,561 ERROR executor.Executor: Exception in task 0.0 in stage 8.0 (TID 15)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/usr/local/Cellar/apache-spark/3.2.1/libexec/python/lib/pyspark.zip/pyspark/worker.py", line 601, in main
func, profiler, deserializer, serializer = read_command(pickleSer, infile)
File "/usr/local/Cellar/apache-spark/3.2.1/libexec/python/lib/pyspark.zip/pyspark/worker.py", line 71, in read_command
command = serializer._read_with_length(file)
File "/usr/local/Cellar/apache-spark/3.2.1/libexec/python/lib/pyspark.zip/pyspark/serializers.py", line 160, in _read_with_length
return self.loads(obj)
File "/usr/local/Cellar/apache-spark/3.2.1/libexec/python/lib/pyspark.zip/pyspark/serializers.py", line 430, in loads
return pickle.loads(obj, encoding=encoding)
File "<frozen zipimport>", line 259, in load_module
File "/usr/local/Cellar/apache-spark/3.2.1/libexec/python/lib/pyspark.zip/pyspark/mllib/__init__.py", line 26, in <module>
  import numpy
ModuleNotFoundError: No module named 'numpy'

at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:555)
at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:713)
at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:695)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:508)

The strange thing is that numpy is installed correctly. That is exactly the problem: numpy is installed correctly, yet pyspark cannot find it when creating the cosine-similarity matrix in the Jupyter notebook.

Thanks for considering this.
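The traceback shows the import failing inside a Spark *executor*, so a likely cause is that the workers launch a different Python interpreter than the driver (where numpy is installed). A minimal sketch, assuming a local or standalone cluster, that pins both sides to the same interpreter; these variables must be set before the SparkSession or SparkContext is created:

```python
import os
import sys

# Sketch: force PySpark workers to use the same Python interpreter as
# the driver, so packages installed for the driver (e.g. numpy) are
# also importable inside executors. Has effect only if set before the
# SparkSession / SparkContext is created.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable
```

After setting these, restart the notebook kernel and recreate the SparkSession so the executors pick up the new interpreter.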

If you are running Jupyter Notebook as an application in AWS EMR, try using a bootstrap script that installs the required version of numpy when the cluster is provisioned.
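As a sketch, such a bootstrap action might look like the following provisioning fragment (the S3 path and the numpy version pin are placeholders; adjust them to your cluster's Python):

```shell
#!/bin/bash
# Hypothetical EMR bootstrap action: install numpy on every node
# before any Spark executors start. Registered at cluster creation,
# e.g. with --bootstrap-actions Path=s3://<your-bucket>/install-numpy.sh
set -euxo pipefail
sudo python3 -m pip install "numpy>=1.21"
```

Because bootstrap actions run on every node (master and core/task), this guarantees the executor-side Python can import numpy, which is what the traceback above is complaining about.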

