繁体   English   中英

无法在Spark中导入名称LDA MLlib

[英]Cannot import name LDA MLlib in Spark

我正在尝试使用Spark实现LDA并收到此错误。 我是Spark的新手,所以能提供任何帮助。

[root@sandbox ~]# spark-submit ./lda.py
Traceback (most recent call last):
  File "/root/./lda.py", line 3, in <module>
    from pyspark.mllib.clustering import LDA, LDAModel
ImportError: cannot import name LDA

这是代码:

from pyspark.sql import SQLContext
from pyspark import SparkContext
from pyspark.mllib.clustering import LDA, LDAModel
from pyspark.mllib.linalg import Vectors
import numpy
sc = SparkContext(appName="PythonLDA")
data = sc.textFile("/tutorial/input/askreddit20150801.txt")
parsedData = data.map(lambda line: Vectors.dense([float(x) for x in line.strip().split(' ')]))
# Index documents with unique IDs
corpus = parsedData.zipWithIndex().map(lambda x: [x[1], x[0]]).cache()

# Cluster the documents into three topics using LDA
ldaModel = LDA.train(corpus, k=3)

# Output topics. Each is a distribution over words (matching word count vectors)
print("Learned topics (as distributions over vocab of " + str(ldaModel.vocabSize()) + " words):")
topics = ldaModel.topicsMatrix()
for topic in range(3):
    print("Topic " + str(topic) + ":")
    for word in range(0, ldaModel.vocabSize()):
        print(" " + str(topics[word][topic]))

# Save and load model
model.save(sc, "myModelPath")
sameModel = LDAModel.load(sc, "myModelPath")

当我尝试安装pyspark.mllib.clustering时:

[root@sandbox ~]# pip install spark.mllib.clustering
Collecting spark.mllib.clustering
/usr/lib/python2.6/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:90: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
  Could not find a version that satisfies the requirement spark.mllib.clustering (from versions: )
No matching distribution found for spark.mllib.clustering

Spark 1.5.0中引入了用于LDA的PySpark包装器。 假设您的安装未损坏,则可能使用Spark <=1.4.x。

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM