
SVM ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

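Please help me solve this problem! I don't know why this error occurs when I try to enter some text to classify.

Here is the code I use to train on my data. How can I fix it?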

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(X_train)

from sklearn.feature_extraction.text import TfidfTransformer
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)

from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer()
X_train_tfidf = vectorizer.fit_transform(X_train)

from sklearn.svm import LinearSVC
clf = LinearSVC()
clf.fit(X_train_tfidf,y_train)

if request.method == 'POST':
    message = request.form['message']
    data = [message]
    vect = vectorizer.transform(data).toarray()
    my_prediction = clf.predict(vect)

return render_template('result.html',prediction = my_prediction)

  1. Check whether your data contains NaN values with your_data.isnull().any(). If it does, remove them with your_data = your_data.dropna()

  2. Check whether your numeric data contains inf with np.isfinite(your_data). If there are inf values, you can use your_data = your_data.replace([np.inf, -np.inf], np.nan) followed by your_data = your_data.dropna() to remove them.

    Change your_data to the name of whatever DataFrame you are using, e.g. X, y, X_train_tfidf
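
The two checks above can be sketched as follows. This is only a minimal illustration, assuming your_data is a numeric pandas DataFrame; the column names and values are made up. Note that np.isfinite is False for NaN as well as for inf, so its negation flags both.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; your_data stands in for your own DataFrame.
your_data = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [0.5, 2.0, np.inf, 1.5],
})

# Step 1: which columns contain NaN? (inf does not count as null)
has_nan = your_data.isnull().any()

# Step 2: which columns contain a non-finite value (NaN or +/-inf)?
# np.isfinite only works on numeric columns.
has_non_finite = (~np.isfinite(your_data)).any()

# Turn inf into NaN, then drop every row that contains a NaN.
cleaned = your_data.replace([np.inf, -np.inf], np.nan).dropna()

print(has_nan.to_dict())
print(has_non_finite.to_dict())
print(cleaned)
```

Here rows 1 and 2 are dropped, leaving only the rows where every value is finite.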

Also, check this answer, as well as the one flagged as a possible duplicate in the comments on the post.


Edit: adding a sample as requested. Applying this to X and y is the most obvious place to do it.

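import numpy as np
from sklearn.model_selection import train_test_split

# Add these lines (assumes X and y are pandas Series sharing the same index)
X = X.replace([np.inf, -np.inf], np.nan)
y = y.replace([np.inf, -np.inf], np.nan)
# Drop incomplete rows from X and y together so they stay aligned
mask = X.notna() & y.notna()
X, y = X[mask], y[mask]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)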

from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(X_train)

from sklearn.feature_extraction.text import TfidfTransformer
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)

from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer()
X_train_tfidf = vectorizer.fit_transform(X_train)

from sklearn.svm import LinearSVC
clf = LinearSVC()
clf.fit(X_train_tfidf,y_train)

if request.method == 'POST':
    message = request.form['message']
    data = [message]
    vect = vectorizer.transform(data).toarray()
    my_prediction = clf.predict(vect)

return render_template('result.html',prediction = my_prediction)
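
One caveat when cleaning X and y: a row removed from one must also be removed from the other, otherwise texts can end up paired with the wrong labels, or the two can end up with different lengths. A minimal sketch of the difference, assuming X is a pandas Series of messages and y a Series of labels with the same index (the sample values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical messages and labels; one message and one label are missing.
X = pd.Series(["free prize now", None, "see you at lunch", "call me later"])
y = pd.Series(["spam", "spam", "ham", np.nan])

# Dropping separately removes different rows: X loses row 1, y loses row 3.
misaligned = (X.dropna().index.tolist(), y.dropna().index.tolist())

# Keep only the rows where both the text and the label are present.
mask = X.notna() & y.notna()
X_clean, y_clean = X[mask], y[mask]

print(misaligned)                                       # ([0, 2, 3], [0, 1, 2])
print(X_clean.index.tolist(), y_clean.index.tolist())   # [0, 2] [0, 2]
```

With the shared mask, X_clean and y_clean keep the same rows and can be passed to train_test_split together.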

