AttributeError: 'numpy.ndarray' object has no attribute 'lower'

I am trying to predict using SVM but I receive the error

AttributeError: 'numpy.ndarray' object has no attribute 'lower'

when executing the line text_clf.fit(X_train,y_train) of my code. How can I fix this and get the probability that my prediction is correct using SVM?

I am predicting the first column (gold) of my input file based on the values of the remaining columns. My input file dataExtended.txt has the following form:


Here is my full reproducible code:

# Predict the 'gold' column with a linear SVM
from sklearn.model_selection import train_test_split  # formerly sklearn.cross_validation (removed in scikit-learn 0.20)
from sklearn import metrics
import pandas as pd
import numpy as np
import seaborn as sns; sns.set()
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn import svm
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

data = pd.read_csv('dataExtended.txt', sep=',')
row_count, column_count = data.shape

# Print the dataset shape
print("Dataset Length: ", len(data))
print("Dataset Shape: ", data.shape)
print("Number of columns ", column_count)

# Print the first dataset observations
print("Dataset: ", data.head())

# Encode the categorical columns as integer codes
data['gold'] = data['gold'].astype('category').cat.codes
data['Program'] = data['Program'].astype('category').cat.codes

# Separate the features from the target variable (first column)
X = data.values[:, 1:column_count]
Y = data.values[:, 0]

# Split the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, random_state=100)

# Create and fit a linear SVM classifier
svclassifier = svm.LinearSVC()

print('Before fitting')
svclassifier.fit(X_train, y_train)
predicted = svclassifier.predict(X_test)

text_clf = Pipeline([('tfidf', TfidfVectorizer()), ('clf', LinearSVC())])
text_clf.fit(X_train, y_train)  # the call that raises the AttributeError

Traceback leading to the error:

Traceback (most recent call last):

  File "<ipython-input-9-8e85a0a9f81c>", line 1, in <module>
    runfile('C:/Users/mouna/ownCloud/Mouna Hammoudi/dumps/Python/Paper4SVM.py', wdir='C:/Users/mouna/ownCloud/Mouna Hammoudi/dumps/Python')

  File "C:\Users\mouna\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 668, in runfile
    execfile(filename, namespace)

  File "C:\Users\mouna\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/mouna/ownCloud/Mouna Hammoudi/dumps/Python/Paper4SVM.py", line 53, in <module>

  File "C:\Users\mouna\Anaconda3\lib\site-packages\sklearn\pipeline.py", line 248, in fit
    Xt, fit_params = self._fit(X, y, **fit_params)

  File "C:\Users\mouna\Anaconda3\lib\site-packages\sklearn\pipeline.py", line 213, in _fit

  File "C:\Users\mouna\Anaconda3\lib\site-packages\sklearn\externals\joblib\memory.py", line 362, in __call__
    return self.func(*args, **kwargs)

  File "C:\Users\mouna\Anaconda3\lib\site-packages\sklearn\pipeline.py", line 581, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)

  File "C:\Users\mouna\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 1381, in fit_transform
    X = super(TfidfVectorizer, self).fit_transform(raw_documents)

  File "C:\Users\mouna\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 869, in fit_transform

  File "C:\Users\mouna\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 792, in _count_vocab
    for feature in analyze(doc):

  File "C:\Users\mouna\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 266, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)

  File "C:\Users\mouna\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 232, in <lambda>
    return lambda x: strip_accents(x.lower())

You cannot use TF-IDF-related methods for numeric data; they are meant exclusively for text data, so they call string methods such as .lower(), which a NumPy array does not have, hence the error. This is already apparent from the documentation:

fit(self, raw_documents, y=None)

Learn vocabulary and idf from training set.

raw_documents: iterable

An iterable which yields either str, unicode or file objects.
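
A minimal sketch of the difference, assuming a reasonably recent scikit-learn and NumPy; the toy documents and the numeric array are made up for illustration and are not taken from dataExtended.txt. The vectorizer accepts an iterable of strings, whereas iterating over a 2-D numeric array yields ndarray rows, which have no .lower() method:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy inputs (assumptions, not the asker's data)
text_docs = ["the cat sat", "the dog barked", "a cat and a dog"]
numeric_rows = np.array([[1.0, 2.0], [3.0, 4.0]])

# Text input: one string per document, so vectorizing succeeds
tfidf_matrix = TfidfVectorizer().fit_transform(text_docs)
print(tfidf_matrix.shape)

# Numeric input: each "document" is an ndarray row, so lowercasing fails
try:
    TfidfVectorizer().fit_transform(numeric_rows)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)
```

The second call should fail in the same way as the pipeline in the question, because X_train there is a numeric array rather than a collection of text documents.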

I am afraid that your rationale, as explained in the comments:

I'm just trying to get the probability that each prediction is correct and TF-IDF seems to be the only way to do so when using SVM

is extremely weak. For starters, there is no such thing as "the probability that each prediction is correct"; I take it that you mean probabilistic predictions, in contrast to hard class predictions (see "Predict classes or class probabilities?").

To get to the point of your actual requirement: in contrast to LinearSVC, which you are using here, SVC does indeed provide a predict_proba method when constructed with probability=True, which should do the job (see the docs and the instructions therein). Notice that LinearSVC is not actually an SVM; see the answer in "Under what parameters are SVC and LinearSVC in scikit-learn equivalent?" for details.

In short, forget about TF-IDF and switch to SVC instead of LinearSVC.
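
A hedged sketch of that switch, using synthetic numeric data from make_classification as a stand-in for the encoded columns of dataExtended.txt (the dataset, its size, the linear kernel and the random seeds are all assumptions). It also assumes a scikit-learn release in which SVC accepts probability=True, which is what enables predict_proba:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the numeric features and the 'gold' target (assumption)
X, y = make_classification(n_samples=200, n_features=5, random_state=100)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=100)

# probability=True is required for predict_proba to be available
clf = SVC(kernel="linear", probability=True, random_state=100)
clf.fit(X_train, y_train)

hard_predictions = clf.predict(X_test)      # one class label per sample
probabilities = clf.predict_proba(X_test)   # one row per sample, one column per class

print(clf.classes_)
print(probabilities[:5])
```

Each row of probabilities sums to 1, and its columns follow the order of clf.classes_. No TfidfVectorizer is involved, since the features are already numeric.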

