I'm implementing a simple Scikit-Learn Pipeline to run LatentDirichletAllocation in Google Cloud ML Engine. The goal is to predict topics for new data. Here is the code that builds the pipeline:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.datasets import fetch_20newsgroups

dataset = fetch_20newsgroups(shuffle=True, random_state=1,
                             remove=('headers', 'footers', 'quotes'))
train, test = train_test_split(dataset.data[:2000])

pipeline = Pipeline([
    ('CountVectorizer', CountVectorizer(
        max_df=0.95,
        min_df=2,
        stop_words='english')),
    ('LatentDirichletAllocation', LatentDirichletAllocation(
        n_components=10,
        learning_method='online'))
])

pipeline.fit(train)
Now, if I have understood correctly, I can predict topics for the test data by running:
pipeline.transform(test)
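Locally this does work: transform returns the per-document topic distribution, and the most probable topic per document can be taken with argmax. A minimal self-contained sketch of the same idea (with a toy corpus and default CountVectorizer settings, rather than the 20newsgroups data above):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.pipeline import Pipeline

docs = ["the cat sat on the mat",
        "dogs bark at cats and dogs",
        "stocks rose on strong earnings news"]

pipe = Pipeline([
    ('CountVectorizer', CountVectorizer()),
    ('LatentDirichletAllocation', LatentDirichletAllocation(
        n_components=2, random_state=0))
])
pipe.fit(docs)

# transform yields an (n_samples, n_components) document-topic matrix;
# each row is a probability distribution over the topics.
topic_dist = pipe.transform(docs)

# The most probable topic index for each document:
predicted = topic_dist.argmax(axis=1)
```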
However, when I upload the pipeline to Google Cloud Storage and try to use it to produce local predictions with Google Cloud ML Engine, I get an error saying that LatentDirichletAllocation has no attribute predict.
gcloud ml-engine local predict \
--model-dir=$MODEL_DIR \
--json-instances $INPUT_FILE \
--framework SCIKIT_LEARN
...
"Exception during sklearn prediction: " + str(e)) cloud.ml.prediction.prediction_utils.PredictionError: Failed to run the provided model: Exception during sklearn prediction: 'LatentDirichletAllocation' object has no attribute 'predict' (Error code: 2)
The lack of a predict method can also be seen in the docs, so I guess this isn't the way to go: http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html
Now the question is: what is the way to go? How do I use LatentDirichletAllocation (or a similar transformer) in a Scikit-Learn Pipeline with Google Cloud ML Engine?
Currently, the last estimator in the pipeline must implement a predict method.
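Given that requirement, one workaround is to subclass LatentDirichletAllocation and add a predict method that delegates to transform, then use that subclass as the last pipeline step. This is a sketch, not an official API; the class name LDAWithPredict and the choice of returning the argmax topic index are my own:

```python
from sklearn.decomposition import LatentDirichletAllocation

class LDAWithPredict(LatentDirichletAllocation):
    """Hypothetical wrapper: adds the predict() method ML Engine expects,
    returning the most probable topic index for each input document."""

    def predict(self, X):
        # transform() gives the (n_samples, n_components) topic distribution;
        # argmax over components yields one topic label per document.
        return self.transform(X).argmax(axis=1)
```

Building the pipeline with LDAWithPredict as the final step and exporting it as before should then let ML Engine call predict; note that the "prediction" is the dominant topic index, not the full distribution.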