Problem getting predictions with Scikit-learn on Google AI Platform: 'numpy.ndarray' object has no attribute 'lower'

I'm fairly new to machine learning in general, and I want to host my model in the cloud so that I can make online predictions.

I successfully trained a Scikit-learn logistic regression model with a TF-IDF vectorizer (for sentiment analysis), both locally in a Jupyter Notebook and on Google AI Platform using its Training Job feature.

I should mention that I listed bs4, nltk and lxml as required PyPI packages in my training package's setup.py file.

My training process goes like this:

  1. Import a CSV file of input strings and their labels (outputs) as a pandas DataFrame (the model has one input variable: the string).

  2. Preprocess the input strings with bs4 and nltk to remove unnecessary characters and stopwords and to convert all characters to lowercase (to reproduce this, simply use lowercase, alphabet-only strings).

  3. Create a pipeline:

     from sklearn.feature_extraction.text import TfidfVectorizer
     from sklearn.linear_model import LogisticRegression
     from sklearn.pipeline import Pipeline

     tvec = TfidfVectorizer()
     # liblinear accepts both the 'l1' and 'l2' penalties searched in step 4
     lclf = LogisticRegression(fit_intercept=False, random_state=255, max_iter=1000,
                               solver='liblinear')
     model_1 = Pipeline([('vect', tvec), ('clf', lclf)])

  4. Run cross-validation with GridSearchCV:

     from sklearn.model_selection import GridSearchCV

     param_grid = [{'vect__ngram_range': [(1, 1)],
                    'clf__penalty': ['l1', 'l2'],
                    'clf__C': [1.0, 10.0, 100.0]},
                   {'vect__ngram_range': [(1, 1)],
                    'clf__penalty': ['l1', 'l2'],
                    'clf__C': [1.0, 10.0, 100.0],
                    'vect__use_idf': [False],
                    'vect__norm': [None]}]  # norm expects 'l1', 'l2' or None

     gs_lr_tfidf = GridSearchCV(model_1, param_grid, scoring='accuracy',
                                cv=5, verbose=1, n_jobs=-1)
     gs_lr_tfidf.fit(X_train, y_train)

  5. Take the best estimator found by the grid search. This is the model saved in the model.joblib file on Google.

     clf = gs_lr_tfidf.best_estimator_ 
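For readers who want to run the five steps end to end, here is a minimal sketch. It is not the author's script: the toy sentences, the regex-based `preprocess` helper and its tiny stopword set are hypothetical stand-ins for the CSV data and the bs4/nltk cleanup, `cv` is reduced to 3 because the toy dataset is tiny, and `solver="liblinear"` is set so that both penalties in the grid are accepted by recent scikit-learn versions. The best estimator is dumped with joblib under the post's file name, `model.joblib`, and reloaded to check that it still predicts.

```python
import os
import re
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical stand-in for the bs4/nltk cleanup (step 2): strip HTML-like tags,
# keep letters only, lowercase, and drop a tiny illustrative stopword set.
STOPWORDS = {"a", "an", "the", "is", "it", "this", "was"}

def preprocess(text):
    text = re.sub(r"<[^>]+>", " ", text)
    words = re.sub(r"[^a-zA-Z]+", " ", text).lower().split()
    return " ".join(w for w in words if w not in STOPWORDS)

# Hypothetical toy data (step 1 would read this from a CSV with pandas).
raw_good = ["A <b>great</b> movie!", "I love this film", "Great acting, great story",
            "Wonderful and fun", "Loved it, really good", "Good plot and great cast"]
raw_bad = ["An <i>awful</i> movie.", "I hate this film", "Boring and bad",
           "Terrible acting, bad story", "Really bad, hated it", "Awful plot and poor cast"]
X_train = [preprocess(t) for t in raw_good + raw_bad]
y_train = ["good"] * len(raw_good) + ["bad"] * len(raw_bad)

# Step 3: the pipeline.
tvec = TfidfVectorizer()
lclf = LogisticRegression(fit_intercept=False, random_state=255, max_iter=1000,
                          solver="liblinear")
model_1 = Pipeline([("vect", tvec), ("clf", lclf)])

# Step 4: grid search (cv=3 instead of 5 only because the toy data is so small).
param_grid = [
    {"vect__ngram_range": [(1, 1)], "clf__penalty": ["l1", "l2"],
     "clf__C": [1.0, 10.0, 100.0]},
    {"vect__ngram_range": [(1, 1)], "clf__penalty": ["l1", "l2"],
     "clf__C": [1.0, 10.0, 100.0], "vect__use_idf": [False], "vect__norm": [None]},
]
gs_lr_tfidf = GridSearchCV(model_1, param_grid, scoring="accuracy", cv=3)
gs_lr_tfidf.fit(X_train, y_train)

# Step 5: keep the best pipeline and round-trip it through model.joblib.
clf = gs_lr_tfidf.best_estimator_
with tempfile.TemporaryDirectory() as model_dir:
    path = os.path.join(model_dir, "model.joblib")
    joblib.dump(clf, path)
    reloaded = joblib.load(path)

# predict() takes a list of strings, one per document.
predicted = reloaded.predict([preprocess("What a great film")])
print(predicted)
```

In this sketch the preprocessing runs outside the pipeline, so only the vectorizer and the classifier end up in the saved file.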

I can get a simple prediction in my Jupyter Notebook using:

predicted = clf.predict(["INPUT STRING"])
print(predicted)

It prints the predicted label for my input string, such as ['good'] or ['bad'].

However, although the model was trained and deployed to AI Platform successfully, when I request a prediction with input such as the following (in the required JSON format):

["the quick brown fox jumps over the lazy dog"]
["hi what is up"]

The shell returns this error:

{
  "error": "Prediction failed: Exception during sklearn prediction: 
  'numpy.ndarray' object has no attribute 'lower'"
}
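The message itself can be reproduced locally. This sketch assumes, without confirmation from the post, that the prediction service stacks the JSON instances into a NumPy array before calling `predict`. Bracketed instances such as `["hi what is up"]` then form a 2-D array of shape (2, 1) whose rows are `numpy.ndarray` objects rather than strings, and `TfidfVectorizer` fails when it lowercases each document. The tiny training set is hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy model; any fitted TF-IDF pipeline behaves the same way here.
model = Pipeline([
    ("vect", TfidfVectorizer()),
    ("clf", LogisticRegression(solver="liblinear")),
])
model.fit(["good great fun", "love it", "bad awful boring", "hate it"],
          ["good", "good", "bad", "bad"])

# Assumed conversion of the bracketed JSON lines: a 2-D array of shape (2, 1).
bracketed = np.array([["the quick brown fox jumps over the lazy dog"], ["hi what is up"]])
try:
    model.predict(bracketed)
    error_message = None
except AttributeError as exc:
    # Each "document" is a row (a numpy.ndarray), and the vectorizer calls .lower() on it.
    error_message = str(exc)
print(error_message)

# A flat array of strings, shape (2,), is what the vectorizer expects.
flat = np.array(["the quick brown fox jumps over the lazy dog", "hi what is up"])
flat_predictions = model.predict(flat)
print(flat_predictions)
```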

What could have gone wrong here?

Could this be a dependency problem? Do I also need to install bs4, lxml and nltk for my Google-hosted model?

Or is my input JSON incorrectly formatted?

Thanks for your help.

Alright, I found out that the JSON input was indeed incorrectly formatted (answered at https://stackoverflow.com/a/51693619/10570541).

As the official documentation states, the JSON instances format uses newlines to separate instances and square brackets to enclose each instance's values, for example:

[6.8,  2.8,  4.8,  1.4]
[6.0,  3.4,  4.5,  1.6]

This applies when the model has more than one input variable.

With only one input variable, simply put each instance on its own line, without the brackets:

"the quick brown fox jumps over the lazy dog"
"alright it works"
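A minimal sketch for producing both layouts with Python's `json` module: `json.dumps` on a bare string yields exactly the single-input lines above, while wrapping each string in a list yields the bracketed lines from the question. The shape check again assumes that the service converts the parsed instances into a NumPy array, and the `instances.json` file name is only an example.

```python
import json

import numpy as np

texts = ["the quick brown fox jumps over the lazy dog", "alright it works"]

# Single-input model: one bare JSON string per line.
single_input_lines = [json.dumps(text) for text in texts]
# Bracketed layout, intended for models with several input variables.
bracketed_lines = [json.dumps([text]) for text in texts]
# The multi-feature example from the documentation.
numeric_lines = ["[6.8, 2.8, 4.8, 1.4]", "[6.0, 3.4, 4.5, 1.6]"]

# Contents for a newline-delimited instances file (e.g. instances.json).
instances_file_text = "\n".join(single_input_lines) + "\n"
print(instances_file_text)

# Assumption: the parsed instances are stacked into one NumPy array.
single_shape = np.array([json.loads(line) for line in single_input_lines]).shape
bracketed_shape = np.array([json.loads(line) for line in bracketed_lines]).shape
numeric_shape = np.array([json.loads(line) for line in numeric_lines]).shape
print(single_shape, bracketed_shape, numeric_shape)
```

The text can be saved to a file and sent with, for example, `gcloud ai-platform predict --model MODEL --version VERSION --json-instances instances.json`, where `MODEL` and `VERSION` are placeholders.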
