How to convert scikit-learn OneVsRestClassifier predict method output to a dense array for Google Cloud ML?

Posted 2019-03-26 21:27:09. Tags: python, scikit-learn, google-cloud-ml

I have a model, trained with a scikit-learn Pipeline and OneVsRestClassifier, that I'm trying to deploy to Cloud ML Engine. However, when I run the command:

gcloud ml-engine predict --model $MODEL_NAME --version $VERSION_NAME --json-instances $INPUT_FILE

I receive the error:

{ "error": "Prediction failed: Bad output type returned.The predict function should return either a numpy ndarray or a list." }

This leads me to believe that the problem is OneVsRestClassifier's predict method returning a sparse matrix when it should return a numpy array. How can I convert its output to a dense array in my Pipeline?

The pipeline's architecture looks like this:

Pipeline([('tfidf', tfidf), ('clf', OneVsRestClassifier(XGBClassifier()))])
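A minimal sketch for checking this locally before deploying. It assumes the labels were fit as a SciPy sparse indicator matrix (for example, the output of MultiLabelBinarizer(sparse_output=True)); as far as I can tell, OneVsRestClassifier only returns a sparse matrix from predict when it was trained on sparse labels. LogisticRegression stands in for XGBClassifier, and the toy documents, labels, and the `is_accepted_output` helper (which just mimics the "ndarray or list" wording of the error) are hypothetical.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline

# Toy multilabel data (hypothetical): 4 documents, 2 labels.
docs = [
    "cheap flights to paris",
    "paris hotel deals",
    "python machine learning",
    "learning python for data",
]
# Labels as a sparse indicator matrix; each column contains both 0s and 1s.
y = sp.csr_matrix(np.array([[1, 0], [1, 1], [0, 1], [0, 0]]))

# LogisticRegression stands in for XGBClassifier to keep the example light.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", OneVsRestClassifier(LogisticRegression())),
])
pipeline.fit(docs, y)

raw_pred = pipeline.predict(["cheap paris hotel"])
print(type(raw_pred))  # a SciPy sparse matrix when the labels were sparse


def is_accepted_output(output):
    """Hypothetical local check mirroring the error: ndarray or list only."""
    return isinstance(output, (np.ndarray, list))


print(is_accepted_output(raw_pred))    # sparse output fails the check
dense_pred = raw_pred.toarray()        # dense ndarray, shape (n_samples, n_labels)
print(is_accepted_output(dense_pred))  # dense output passes
```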

Thanks!

I've tried the method from "Google Cloud ML-engine scikit-learn prediction probability 'predict_proba()'" to overwrite OneVsRestClassifier's predict method with its predict_proba method. However, this results in the following error when I try to pickle the new pipeline:

PicklingError: Can't pickle <function OneVsRestClassifier.predict_proba at 0x10a8f9d08>: it's not the same object as sklearn.multiclass.OneVsRestClassifier.predict_proba

1 Answer

AI Platform (formerly known as Cloud Machine Learning Engine) serves your model and expects both the input and the output to be JSON-serializable. If your model returns a sparse matrix, you need to convert it to a dense matrix (see this for more information).
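A minimal sketch of that conversion as a plain helper applied to whatever predict returns. The helper name `to_json_serializable` is hypothetical. A scikit-learn Pipeline returns its final step's predict output unchanged, so the conversion cannot be done by adding another transformer step; it has to happen after predict is called.

```python
import json

import numpy as np
import scipy.sparse as sp


def to_json_serializable(prediction):
    """Convert a prediction (sparse matrix, ndarray, or list) to nested lists."""
    if sp.issparse(prediction):
        # Densify the sparse matrix before serializing.
        prediction = prediction.toarray()
    # ndarray.tolist() also turns NumPy scalars into plain Python ints/floats.
    return np.asarray(prediction).tolist()


# Example: a 2x3 sparse label-indicator matrix, like a multilabel predict output.
sparse_pred = sp.csr_matrix(np.array([[1, 0, 1], [0, 1, 0]]))
dense_list = to_json_serializable(sparse_pred)
print(json.dumps(dense_list))  # prints [[1, 0, 1], [0, 1, 0]]
```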

If you choose to overwrite predict with predict_proba, then you are deploying your model with custom code (the code that overwrites the function). You will then need to package that custom code and pass it alongside your model when you deploy. For more information on deploying models with custom code, see Custom prediction routines on AI Platform.
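A minimal sketch of such custom code, assuming the predictor interface described in the Custom prediction routines documentation: a class with `predict(self, instances, **kwargs)` and a classmethod `from_path(cls, model_dir)`. The class name `DenseOutputPredictor`, the file name `model.pkl`, and the assumption that instances are raw text strings are all hypothetical; check the documentation for how to package and deploy the module.

```python
import os
import pickle

import numpy as np
import scipy.sparse as sp


class DenseOutputPredictor(object):
    """Hypothetical predictor wrapping an unmodified, pickled sklearn Pipeline."""

    def __init__(self, model):
        self._model = model

    def predict(self, instances, **kwargs):
        """Return predictions as nested Python lists (JSON-serializable).

        `instances` is assumed to be a list of raw text strings.
        """
        outputs = self._model.predict(instances)
        if sp.issparse(outputs):
            # OneVsRestClassifier can return a sparse label-indicator matrix.
            outputs = outputs.toarray()
        return np.asarray(outputs).tolist()

    @classmethod
    def from_path(cls, model_dir):
        """Load the pickled pipeline; 'model.pkl' is an assumed file name."""
        model_path = os.path.join(model_dir, "model.pkl")
        with open(model_path, "rb") as f:
            model = pickle.load(f)
        return cls(model)
```

Because the pipeline is pickled unchanged and the densifying happens in the predictor, no sklearn method has to be patched before pickling. If you want probabilities instead of labels, `predict` could call `self._model.predict_proba(instances)`, which returns a dense ndarray for OneVsRestClassifier.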