I have more than 2500 samples on which static analysis has been performed, with more than 300 features extracted per sample.
Among these samples, I have identified more than 10 APT classes, and my aim is to build a one-class classifier for each class.
I'm using the Python scikit-learn library for machine learning, and in particular I'm working with One-Class SVM.
First question: are there other good one-class classifiers for this approach?
Second question: I have to come up with some metric that defines a sort of "accuracy" for the classifier. I know that for a one-class SVM the concept of accuracy is not well defined. Here are my code and my idea:
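import pandas as pd
from sklearn import svm
from sklearn.model_selection import train_test_split

# Load the feature table for one APT class
df = pd.read_csv('features_labeled_apt17.csv')
# Columns 1 through 340 hold the features (.ix was removed from pandas, so use .iloc)
X = df.iloc[:, 1:341].values
X_train, X_test = train_test_split(X, test_size=0.3, random_state=42)
# gamma is ignored by the linear kernel, so it is left out
clf = svm.OneClassSVM(nu=0.1, kernel="linear")
clf.fit(X_train)  # fit() returns the estimator itself, not a score
pred = clf.predict(X_test)  # +1 = inlier, -1 = outlier
print(pred)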
This is the output of the code:
[ 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 -1 1 1 1 1 1 1 1 1 1 1 1 1 1 -1 1 1 1 1 1 1 1 1 1 -1 1 1 1 1 1 1 1 1 -1 1 1 1 1 1 1 1 1 -1 1 1 1
1 1 1 1 1 1 1 1 1 1 -1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1]
A 1 of course marks a sample accepted as belonging to the class (correctly labeled), while a -1 marks one rejected as an outlier (wrongly labeled).
First: do you think this can be a good approach? Second: as a metric, could I use the ratio between the number of wrongly labeled samples and the total number of elements in the test set?
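To make the proposed ratio concrete, here is a minimal sketch on synthetic data. The arrays, their sizes, and the RBF kernel settings are assumptions for illustration only and are not taken from the APT feature set. It also computes a second ratio on samples drawn from a different distribution, since a test set containing only the target class can only show how often genuine members are rejected.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(42)

# Synthetic stand-ins (assumption): one "target" class and one "other" class
X_train = rng.normal(0.0, 1.0, size=(200, 5))       # target class, used for fitting
X_test_same = rng.normal(0.0, 1.0, size=(50, 5))    # held-out target-class samples
X_test_other = rng.normal(6.0, 1.0, size=(50, 5))   # samples from a different class

clf = OneClassSVM(nu=0.1, kernel="rbf", gamma="scale").fit(X_train)

pred_same = clf.predict(X_test_same)    # ideally all +1
pred_other = clf.predict(X_test_other)  # ideally all -1

# Fraction of genuine class members that were accepted
inlier_rate = np.mean(pred_same == 1)
# The ratio from the question: wrongly rejected members / test-set size
error_rate = np.mean(pred_same == -1)
# Fraction of foreign samples that were correctly rejected
rejection_rate = np.mean(pred_other == -1)

print(f"accepted members: {inlier_rate:.2f}")
print(f"rejected members: {error_rate:.2f}")
print(f"rejected foreign samples: {rejection_rate:.2f}")
```

With more than 10 labeled APT classes available, the samples of the other classes could play the role of `X_test_other` for each per-class model.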
As far as I understand machine learning algorithms, your use case is not a good fit for a one-class SVM classifier.
Normally, a one-class SVM is used for unsupervised outlier detection problems. Refer to this page to see an implementation of one-class SVM for detecting outliers.
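For reference, unsupervised outlier detection with a one-class SVM can be sketched roughly as follows. The data is synthetic and the parameter values (`nu`, `gamma`) are assumptions chosen for illustration. The model is fitted on unlabeled data that already contains a few anomalies, and it then flags the points it considers outliers.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)

# Unlabeled data (assumption): mostly one dense cluster plus a few scattered points
inliers = rng.normal(0.0, 1.0, size=(190, 2))
outliers = rng.uniform(-8.0, 8.0, size=(10, 2))
X = np.vstack([inliers, outliers])

# nu is an upper bound on the fraction of training points treated as outliers
detector = OneClassSVM(nu=0.05, kernel="rbf", gamma="scale").fit(X)

labels = detector.predict(X)            # +1 = inlier, -1 = outlier
scores = detector.decision_function(X)  # lower score = more anomalous

n_flagged = int(np.sum(labels == -1))
print(f"{n_flagged} of {len(X)} points flagged as outliers")
```

No labels are used at any point, which is the main difference from the per-class setup in the question.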
If you display your data frame, I can try to find another approach to solve your problem.