ValueError: Input contains NaN, infinity or a value too large for dtype('float64') while preprocessing Data
SVM ValueError: Input contains NaN, infinity or a value too large for dtype('float64')
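Please help me solve this problem! When I enter some text to be classified, I get this error and I don't know why.
Here is the code I use to train on my data. How can I fix it?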
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(X_train)
from sklearn.feature_extraction.text import TfidfTransformer
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer()
X_train_tfidf = vectorizer.fit_transform(X_train)
from sklearn.svm import LinearSVC
clf = LinearSVC()
clf.fit(X_train_tfidf, y_train)
# The lines below run inside the Flask view function that handles the form
if request.method == 'POST':
    message = request.form['message']
    data = [message]
    vect = vectorizer.transform(data).toarray()
    my_prediction = clf.predict(vect)
return render_template('result.html', prediction=my_prediction)
Use your_data.isnull().any() to check whether your data contains NaN values. If it does, use your_data = your_data.dropna() to remove them.
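As a minimal sketch of the NaN check described above: the DataFrame contents and the column names `text` and `label` are invented for illustration and are not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for your_data
your_data = pd.DataFrame({
    "text": ["good movie", None, "bad movie"],
    "label": [1.0, 0.0, np.nan],
})

# One boolean per column: True where the column holds at least one missing value
has_nan = your_data.isnull().any()
print(has_nan)

# Drop every row that has a missing value in any column
cleaned = your_data.dropna()
print(len(your_data), "rows before,", len(cleaned), "rows after")
```

Because dropna() here removes whole rows, the text and its label are dropped together and the remaining rows stay paired.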
Use np.isfinite(your_data) to check whether your data contains inf values. If it does, you can use your_data = your_data.replace([np.inf, -np.inf], np.nan) followed by your_data = your_data.dropna() to remove them.
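The inf check might look like the following sketch. The column names and values are made up, and restricting np.isfinite to numeric columns is an assumption added here, since np.isfinite is not defined for text columns.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data standing in for your_data
your_data = pd.DataFrame({
    "score": [0.5, np.inf, -np.inf, 2.0],
    "count": [1.0, 2.0, 3.0, np.nan],
})

# np.isfinite is False for NaN, inf and -inf; apply it to numeric columns only
numeric = your_data.select_dtypes(include=[np.number])
finite_mask = np.isfinite(numeric)
all_finite = bool(finite_mask.all().all())
print("all values finite:", all_finite)

# Turn inf into NaN, then drop every row that contains a NaN
cleaned = your_data.replace([np.inf, -np.inf], np.nan).dropna()
print(cleaned)
```

Note that replace() returns a new object, so the result has to be assigned (or chained, as here) for the change to take effect.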
Replace your_data with the name of whatever DataFrame you are using, e.g. X, y, or X_train_tfidf.
Edit: added a sample as requested. Doing this on X and y is the most obvious thing to do.
import numpy as np
from sklearn.model_selection import train_test_split
# Add these lines (assumes X and y are pandas Series that share the same index)
X = X.replace([np.inf, -np.inf], np.nan)
y = y.replace([np.inf, -np.inf], np.nan)
# Drop a row if either its text or its label is missing, so X and y stay aligned
keep = X.notna() & y.notna()
X = X[keep]
y = y[keep]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(X_train)
from sklearn.feature_extraction.text import TfidfTransformer
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer()
X_train_tfidf = vectorizer.fit_transform(X_train)
from sklearn.svm import LinearSVC
clf = LinearSVC()
clf.fit(X_train_tfidf, y_train)
# The lines below run inside the Flask view function that handles the form
if request.method == 'POST':
    message = request.form['message']
    data = [message]
    vect = vectorizer.transform(data).toarray()
    my_prediction = clf.predict(vect)
return render_template('result.html', prediction=my_prediction)
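For reference, here is a self-contained sketch of the same idea on toy data: clean the text and labels together, then fit and predict. The helper name clean_text_and_labels, the sample texts, and the labels are all hypothetical, and the Flask part is left out so the snippet can run on its own.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC


def clean_text_and_labels(X, y):
    """Drop rows whose text or label is missing or infinite, keeping X and y aligned."""
    frame = pd.DataFrame({
        "text": pd.Series(X).reset_index(drop=True),
        "label": pd.Series(y, dtype="float64").reset_index(drop=True),
    })
    # Treat infinite labels as missing, then drop incomplete rows in one step
    frame["label"] = frame["label"].replace([np.inf, -np.inf], np.nan)
    frame = frame.dropna()
    return frame["text"].astype(str), frame["label"].astype(int)


# Invented toy data: one missing text and one infinite label
texts = ["great film", "terrible film", None, "great acting", "terrible plot", "great fun"]
labels = [1.0, 0.0, 1.0, 1.0, 0.0, np.inf]

X_clean, y_clean = clean_text_and_labels(texts, labels)

vectorizer = TfidfVectorizer()
X_tfidf = vectorizer.fit_transform(X_clean)

clf = LinearSVC()
clf.fit(X_tfidf, y_clean)

# transform() returns a sparse matrix that predict() accepts directly
prediction = clf.predict(vectorizer.transform(["great story"]))
print(prediction)
```

Putting both columns in one DataFrame before calling dropna() means a bad row is removed from the text and the labels at the same time, so the two can never end up with different lengths.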