Cloud ML Engine and Scikit-Learn: 'LatentDirichletAllocation' object has no attribute 'predict'

Question

I'm implementing a simple Scikit-Learn Pipeline to perform LatentDirichletAllocation in Google Cloud ML Engine. The goal is to predict topics from new data. Here is the code for generating the pipeline:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.datasets import fetch_20newsgroups

dataset = fetch_20newsgroups(shuffle=True, random_state=1,
                             remove=('headers', 'footers', 'quotes'))
train, test = train_test_split(dataset.data[:2000])

pipeline = Pipeline([
    ('CountVectorizer', CountVectorizer(
        max_df          = 0.95,
        min_df          = 2,
        stop_words      = 'english')),
    ('LatentDirichletAllocation', LatentDirichletAllocation(
        n_components    = 10,
        learning_method ='online'))
])

pipeline.fit(train)

Now (if I have understood correctly), to predict topics for the test data I can run:

pipeline.transform(test)

However, when I upload the pipeline to Google Cloud Storage and try to use it to produce local predictions with Google Cloud ML Engine, I get an error saying that LatentDirichletAllocation has no attribute predict.

gcloud ml-engine local predict \
    --model-dir=$MODEL_DIR \
    --json-instances $INPUT_FILE \
    --framework SCIKIT_LEARN
...
"Exception during sklearn prediction: " + str(e)) cloud.ml.prediction.prediction_utils.PredictionError: Failed to run the provided model: Exception during sklearn prediction: 'LatentDirichletAllocation' object has no attribute 'predict' (Error code: 2)

The lack of a predict method can also be seen in the docs, so I guess this isn't the way to go: http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html

Now the question is: what is the way to go? How can I use LatentDirichletAllocation (or similar) in Scikit-Learn Pipelines with Google Cloud ML Engine?

Answer 1

Currently, the last estimator of the pipeline must implement the predict method.
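One way to meet that requirement, sketched here as an assumption rather than a confirmed Cloud ML Engine recipe, is to give the final step a predict method that simply delegates to transform. The class name LDAWithPredict and the tiny corpus below are hypothetical and exist only for illustration. Note that a pickled custom class can only be loaded where that class is importable, which this sketch does not address for the serving environment.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline


class LDAWithPredict(LatentDirichletAllocation):
    """Hypothetical helper: LDA that also exposes predict()."""

    def predict(self, X):
        # Delegate to transform(), which returns the
        # document-topic distribution for each row of X.
        return self.transform(X)


# Tiny made-up corpus, used here instead of the 20 newsgroups data
# so the sketch runs without downloading anything.
docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stocks fell on the market today",
    "the market rallied as stocks rose",
]

pipeline = Pipeline([
    ('CountVectorizer', CountVectorizer()),
    ('LatentDirichletAllocation', LDAWithPredict(
        n_components=2,
        learning_method='online',
        random_state=0)),
])
pipeline.fit(docs)

# Pipeline.predict() runs the vectorizer, then calls predict()
# on the final step, which now exists.
new_docs = ["cats chase dogs", "stocks and the market"]
topics = pipeline.predict(new_docs)
```

With this in place, pipeline.predict(new_docs) returns the same array as pipeline.transform(new_docs): one row per document and one column per topic.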

3 2018-07-23 20:47:10
