简体   繁体   中英

One-class Classification

I have more than 2500 samples on which static analysis has been performed, with more than 300 features extracted per sample.

Among these samples, I have discriminated more than 10 APT class and my aim is to build, for each class, a one-class classifier.

I'm using python scikit library for machine-learning, and in particular i'm facing with One-class SVM.

First question: There exist some other good one-class classifier for this approach?

Second question: I have to come up with some metrics that can define a sort of "accuracy" of the classifier. Now I know that for one-class SVM the accuracy concept is not so well-define. I report my code and my concept:

import numpy as np
import pandas as pd
from sklearn import svm
from sklearn.model_selection import train_test_split


df = pd.read_csv('features_labeled_apt17.csv')

X = df.ix[:,1:341].values



X_train, X_test = train_test_split(X,test_size = 0.3,random_state = 42)



clf = svm.OneClassSVM(nu=0.1,kernel = "linear", gamma =0.1)
y_score = clf.fit(X_train)

pred = clf.predict(X_test)


print(pred)

These represents the output of the code:

[ 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 
1  1   1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 
1  1 -1   1  1  1  1  1  1  1  1  1  1  1  1  1 -1  1  1  1  1  1  1  1  1  1 -1  1   1  1  1  1  1  1  1 -1  1  1  1  1  1  1  1  1 -1  1  1  1
1 1  1  1  1   1  1  1  1  1 -1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  
1  1  1  1  1   1  1  1  1  1  1]

The 1 represent of course the well-labeled sample, while the -1 represent the wrong one.

First: do you think this can be a good approach? Second: For metrics, if I divide the total element in the testing set by the wrong labeled?

In my understanding in machine learning algorithms, your use case is not a good one to apply oneclass-SVM classifier.

Normally, oneclass-svm is used for Unsupervised Outlier Detection problems. Refer this page to see the implementation of oneclass-svm to detect outliers.

Just display your data-frame, I will find any new approach to solve your problem.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM