
100% error rate on test set with one-class SVM

I am trying to detect outlier images, but I'm getting bizarre results from the model.

I've read in the images with cv2, flattened them into 1d arrays, turned them into a pandas DataFrame, and then fed that into the SVM.

import numpy as np
import cv2
import glob
import pandas as pd
import sys, os
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn import *
import seaborn as sns

Load the labels and files:

labels_wt = np.loadtxt("labels_wt.txt", delimiter="\t", dtype="str")
files_wt = np.loadtxt("files_wt.txt", delimiter="\t", dtype="str")

Load and flatten the images:

wt_images_tmp = [cv2.imread(file) for file in files_wt]
wt_images = [image.flatten() for image in wt_images_tmp]
tmp3 = np.array(wt_images)
# files_mut (paths to the mutant/outlier images) is assumed to be loaded the same way as files_wt
mutant_images_tmp = [cv2.imread(file) for file in files_mut]
mutant_images = [image.flatten() for image in mutant_images_tmp]
tmp4 = np.array(mutant_images)


X = pd.DataFrame(tmp3)  # the wild-type (inlier) images
y = pd.Series(labels_wt)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_outliers = pd.DataFrame(tmp4)  # the mutant images, treated as outliers
clf = svm.OneClassSVM(nu=0.15, kernel="rbf", gamma=0.0001)
clf.fit(X_train)  # the one-class SVM is fit on the wild-type training images only

Then I evaluate the results following the sklearn tutorial on one-class SVM.

y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)
y_pred_outliers = clf.predict(X_outliers)
# predict() returns +1 for inliers and -1 for outliers
n_error_train = y_pred_train[y_pred_train == -1].size
n_error_test = y_pred_test[y_pred_test == -1].size
n_error_outliers = y_pred_outliers[y_pred_outliers == 1].size

print(n_error_train / len(y_pred_train))
print(n_error_test / len(y_pred_test))
print(n_error_outliers / len(y_pred_outliers))

My error rates on the training set have been variable (10-30%), but on the test set they have never gone below 100%. Am I doing this wrong?

My guess is that you are setting random_state=42, which biases your train_test_split to always produce the same splitting pattern. You can read more about it in this answer. Don't specify any state and run the code again, so:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

This will show different results. Once you are sure this works, make sure you then do cross-validation, possibly using k-fold validation. Let us know if this helps.
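
For reference, a minimal sketch of what k-fold validation could look like here, reusing the X (wild-type) and X_outliers frames built in the question; the number of folds and the SVM parameters are illustrative assumptions only:

# Minimal k-fold sketch for the one-class SVM error rates described above.
# Assumes X (inlier images) and X_outliers already exist as DataFrames.
from sklearn.model_selection import KFold
from sklearn.svm import OneClassSVM

kf = KFold(n_splits=5, shuffle=True)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    clf = OneClassSVM(nu=0.15, kernel="rbf", gamma=0.0001)
    clf.fit(X.iloc[train_idx])
    # fraction of inliers wrongly flagged as outliers (-1), and of outliers accepted as inliers (+1)
    err_train = (clf.predict(X.iloc[train_idx]) == -1).mean()
    err_test = (clf.predict(X.iloc[test_idx]) == -1).mean()
    err_out = (clf.predict(X_outliers) == 1).mean()
    print(f"fold {fold}: train {err_train:.2f}, test {err_test:.2f}, outliers {err_out:.2f}")

If the held-out error stays near 100% across all folds, the problem is unlikely to be the split itself.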
